jmlr jmlr2010 jmlr2010-80 knowledge-graph by maker-knowledge-mining

80 jmlr-2010-On-Line Sequential Bin Packing


Source: pdf

Author: András György, Gábor Lugosi, György Ottucsàk

Abstract: We consider a sequential version of the classical bin packing problem in which items are received one by one. Before the size of the next item is revealed, the decision maker needs to decide whether the next item is packed in the currently open bin or the bin is closed and a new bin is opened. If the new item does not fit, it is lost. If a bin is closed, the remaining free space in the bin accounts for a loss. The goal of the decision maker is to minimize the loss accumulated over n periods. We present an algorithm that has a cumulative loss not much larger than any strategy in a finite class of reference strategies for any sequence of items. Special attention is paid to reference strategies that use a fixed threshold at each step to decide whether a new bin is opened. Some positive and negative results are presented for this case. Keywords: bin packing, on-line learning, prediction with expert advice

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Before the size of the next item is revealed, the decision maker needs to decide whether the next item is packed in the currently open bin or the bin is closed and a new bin is opened. [sent-8, score-2.113]

2 If a bin is closed, the remaining free space in the bin accounts for a loss. [sent-10, score-1.021]

3 The goal of the decision maker is to minimize the loss accumulated over n periods. [sent-11, score-0.19]

4 We present an algorithm that has a cumulative loss not much larger than any strategy in a finite class of reference strategies for any sequence of items. [sent-12, score-0.269]

5 Special attention is paid to reference strategies that use a fixed threshold at each step to decide whether a new bin is opened. [sent-13, score-0.565]

6 Keywords: bin packing, on-line learning, prediction with expert advice 1. [sent-15, score-0.763]

7 Introduction In the classical off-line bin packing problem, an algorithm receives items (also called pieces) of size x1 , x2 , . [sent-16, score-1.224]

8 We have an infinite number of bins, each with capacity 1, and every item is to be assigned to a bin. [sent-20, score-0.301]

9 Further, the sum of the sizes of the items (also denoted by xt ) assigned to any bin cannot exceed its capacity. [sent-21, score-0.932]

10 A bin is empty if no item is assigned to it, otherwise, it is used. [sent-22, score-0.816]

11 Another well-studied version of the problem is the so-called on-line bin packing problem. [sent-26, score-1.034]

12 Here items arrive one by one and each item xt must be assigned to a bin (with free space at least xt ). [sent-27, score-1.473]
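
To contrast this classical on-line setting with the sequential model studied in this paper, here is a sketch of the standard first-fit heuristic, in which every opened bin remains available; this is textbook background, not part of the authors' algorithm, and all names are illustrative (Python).

    def first_fit(items):
        # Classical on-line bin packing: every used bin stays open,
        # and each arriving item goes into the first bin it fits in.
        bins = []  # remaining free space of each used bin
        for x in items:
            for i, free in enumerate(bins):
                if free >= x:
                    bins[i] = free - x
                    break
            else:
                bins.append(1.0 - x)  # open a new bin for this item
        return len(bins)  # number of bins used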

13 In this paper we abandon this assumption and introduce a more restricted version that we call sequential bin packing. [sent-31, score-0.655]

14 The decision maker chooses a sequence of decisions i1 , . . . , in , where it = 0 means that the next item is assigned to the open bin and it = 1 means that a new bin is opened and the next item is assigned to that bin. [sent-36, score-1.618]

15 Of course, if it = 0, then it may happen that the item xt does not fit in the open bin. [sent-37, score-0.489]

16 If the decision is it = 1 then the remaining empty space in the last closed bin is counted as a loss. [sent-39, score-0.580]

17 The measure of performance we use is the total sum of all lost items and wasted empty space. [sent-40, score-0.279]

18 Just as in the original bin packing problem, we may distinguish off-line and on-line versions of the sequential bin packing problem. [sent-41, score-2.229]

19 In the off-line sequential bin packing problem the entire sequence x1 , . [sent-42, score-1.239]

20 Note that unlike in the classical bin packing problem, the order of the items is relevant. [sent-46, score-1.209]

21 In Section 3 we present a simple algorithm with running time of O(n2 ) that minimizes the total loss in the off-line sequential bin packing problem. [sent-48, score-1.273]

22 Much more interesting is the on-line variant of the sequential bin packing problem. [sent-49, score-1.195]

23 Here the items xt are revealed one by one, after the corresponding decision it has been made. [sent-50, score-0.454]

24 However, unlike in standard formulations of on-line prediction, here the loss the predictor suffers depends not only on the outcome xt and decision it but also on the “state” defined by the fullness of the open bin. [sent-53, score-0.363]

25 Our goal is to extend the usual bin packing problems to situations in which one can handle only one bin at a time, and items must be processed immediately so they cannot wait for bin changes. [sent-54, score-2.212]

26 To motivate the on-line sequential model, one may imagine a simple revenue management problem in which a decision maker has a unit storage capacity at his disposal. [sent-55, score-0.322]

27 Arguably the most natural comparison class contains all algorithms that use a fixed threshold to decide whether a new bin is opened. [sent-67, score-0.511]

28 An expert with parameter p simply decides to open a new bin whenever the remaining free space in the open bin is less than p. [sent-69, score-1.265]
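
The constant-threshold rule is simple enough to state directly in code. The sketch below (illustrative Python; the function name is ours) returns the expert's action from its own remaining free space.

    def threshold_expert_decision(p, free_space):
        # Expert with parameter p: open a new bin (action 1) whenever the
        # remaining free space of the open bin is less than p, otherwise
        # keep packing into the open bin (action 0).
        return 1 if free_space < p else 0

    # Example: with p = 0.3 the expert keeps the bin open at free space 0.5
    # but opens a new bin at free space 0.2.
    assert threshold_expert_decision(0.3, 0.5) == 0
    assert threshold_expert_decision(0.3, 0.2) == 1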

29 We also offer some data-dependent bounds for an algorithm designed to compete with the best of all constant-threshold strategies, and show that if item sizes are jittered with certain noise, then a uniform regret bound of the order of n2/3 ln1/3 n may be achieved. [sent-73, score-0.337]

30 The principal difficulty of the problem lies in the fact that each action of the decision maker takes the problem in a new “state” (determined by the remaining empty space in the open bin) which has an effect on future losses. [sent-74, score-0.282]

31 Moreover, the state of the algorithm is typically different from the state of the experts which makes comparison difficult. [sent-75, score-0.228]

32 (2002) considered a similar setup in which the loss function has a “memory,” that is, the loss of a predictor depends on the loss of past actions. [sent-77, score-0.189]

33 However, the special properties of the bin packing problem make it possible to design a prediction strategy with small regret. [sent-87, score-1.052]

34 We also mention here the similar on-line bin packing with rejection problem where the algorithm has an opportunity to reject some items and the loss function is the sum of the number of the used bins and the “costs” of the rejected items, see He and Dósa (2005). [sent-94, score-1.434]

35 Then the cumulative loss of the optimal off-line bin packing is 0 and it is 0. [sent-106, score-1.19]

36 4 in the case of sequential off-line bin packing (see Figure 1). [sent-107, score-1.195]

37 In sequential bin packing we assume that the cost of the items coincides with their size. [sent-111, score-1.37]

38 Figure 1: The difference between the optimal solutions for the off-line and sequential off-line problems. [sent-128, score-0.322]

39 In Section 3 the complexity of the off-line sequential bin packing problem is analyzed. [sent-131, score-1.195]

40 Setup We use terminology borrowed from the theory of on-line prediction with expert advice. [sent-134, score-0.191]

41 Thus, we call the sequential decisions of the on-line algorithm predictions and we use forecaster as a synonym for algorithm. [sent-135, score-0.354]

42 We denote by It ∈ {0, 1} the action of the forecaster at time t (i. [sent-136, score-0.21]

43 Action 0 means that the next item will be assigned to the open bin and action 1 represents the fact that a new bin is opened and the next item is assigned to the next empty bin. [sent-139, score-1.744]

44 Denote by st ∈ [0, 1) the free space in the open (last) bin at time t ≥ 1, that is, after having placed the items x1 , x2 , . [sent-142, score-0.966]

45 More precisely, the state of the forecaster is defined, recursively, as follows: As at the beginning we have an empty bin, s0 = 1. [sent-147, score-0.226]

46 , n, • st = 1 − xt , when the algorithm assigns the item to the next empty bin (i. [sent-151, score-1.237]

47 , It = 1); • st = st−1 , when the assigned item does not fit in the open bin (i. [sent-153, score-1.051]

48 , It = 0 and st−1 < xt ); • st = st−1 − xt , when the assigned item fits in the open bin (i. [sent-155, score-1.483]

49 This may be written in a more compact form: st = st (It , xt , st−1 ) = It (1 − xt ) + (1 − It )(st−1 − I{st−1 ≥xt } xt ) where I{·} denotes the indicator function of the event in brackets, that is, it equals 1 if the event is true and 0 otherwise. [sent-158, score-1.122]
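
As a sanity check, the state recursion can be transcribed directly into code. The following sketch (illustrative Python; names are ours) mirrors the three cases and the compact formula above.

    def next_state(action, item, prev_free):
        # action = 1: a new bin is opened and the item is placed in it.
        if action == 1:
            return 1.0 - item
        # action = 0 and the item fits: it is packed into the open bin.
        if prev_free >= item:
            return prev_free - item
        # action = 0 but the item does not fit: the item is lost and the
        # state of the open bin is unchanged.
        return prev_free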

50 The loss suffered by the forecaster at round t is ℓ(It , xt | st−1 ), where the loss function ℓ is defined by ℓ(0, x | s) = 0 if s ≥ x, and ℓ(0, x | s) = x otherwise (1); and by ℓ(1, x | s) = s (2). [sent-159, score-0.524]
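
The two branches of the loss translate in the same way (again an illustrative sketch): choosing action 0 charges the size of a lost item, choosing action 1 charges the free space wasted in the closed bin.

    def loss(action, item, prev_free):
        if action == 0:
            # Equation (1): the item is lost only if it does not fit.
            return 0.0 if prev_free >= item else item
        # Equation (2): closing the bin wastes its remaining free space.
        return prev_free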

51 The goal of the forecaster is to minimize its cumulative loss, defined by Lt = LIt ,t = ∑_{s=1}^{t} ℓ(Is , xs | ss−1 ) . [sent-160, score-0.301]

52 In the on-line version of the problem the forecaster does not know the size of the next items, and the sequence of items can be completely arbitrary. [sent-165, score-0.349]

53 Since we allow the forecaster to randomize, it is important to clarify that the entire sequence of items is determined before the forecaster starts making decisions, that is, x1 , . [sent-173, score-0.479]

54 The performance of a sequential on-line algorithm is measured by its cumulative loss. [sent-178, score-0.269]

55 (This is in contrast with the non-sequential counterpart of the bin packing problem in which there exist on-line algorithms for which the number of used bins is within a constant factor of that of the optimal solution, see Seiden 2002.) [sent-181, score-1.118]

56 So in order to measure the performance of a sequential on-line algorithm in a meaningful way, we adopt an approach extensively used in on-line prediction (the so-called “experts” framework). [sent-182, score-0.194]

57 The performance of the algorithm is evaluated relative to this set of experts, and the goal is to perform asymptotically as well as the best expert from the reference class. [sent-184, score-0.24]

58 Formally, let fE,t ∈ {0, 1} be the decision of an expert E at round t, where E ∈ E and E is the set of the experts. [sent-185, score-0.265]

59 Similarly, we denote the state of expert E with sE,t after the t-th item has been revealed. [sent-187, score-0.452]

60 Then the loss of expert E at round t is ℓ( fE,t , xt | sE,t−1 ) and the cumulative loss of expert E is LE,n = ∑_{t=1}^{n} ℓ( fE,t , xt | sE,t−1 ). [sent-188, score-1.049]

61 SEQUENTIAL ON-LINE BIN PACKING PROBLEM WITH EXPERT ADVICE. Parameters: set E of experts, state space S = [0, 1), action space A = {0, 1}, nonnegative loss function ℓ : (A × (0, 1] | S ) → [0, 1), number n of items. [sent-189, score-0.193]

62 For each round t = 1, . . . , n: (a) each expert forms its action fE,t ∈ A ; (b) the forecaster observes the actions of the experts and forms its own decision It ∈ A ; (c) the next item xt ∈ (0, 1] is revealed; (d) the algorithm incurs loss ℓ(It , xt | st−1 ) and each expert E ∈ E incurs loss ℓ( fE,t , xt | sE,t−1 ). [sent-194, score-1.865]

63 Figure 2: Sequential on-line bin packing problem with expert advice. [sent-196, score-1.207]
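
To make the order of events in Figure 2 concrete, here is a minimal simulation loop. It assumes the next_state and loss helpers sketched above, experts given as functions mapping their own free space to an action, and a forecaster mapping the experts' actions to its own action; it only illustrates the protocol, not the authors' algorithm.

    def run_protocol(items, experts, forecaster):
        n_experts = len(experts)
        s_alg = 1.0                    # forecaster's open-bin free space
        s_exp = [1.0] * n_experts      # each expert's own free space
        loss_alg = 0.0
        loss_exp = [0.0] * n_experts
        for x in items:
            advice = [experts[i](s_exp[i]) for i in range(n_experts)]   # (a)
            action = forecaster(advice)                                 # (b)
            # (c) the item x is revealed; (d) losses are incurred.
            loss_alg += loss(action, x, s_alg)
            s_alg = next_state(action, x, s_alg)
            for i in range(n_experts):
                loss_exp[i] += loss(advice[i], x, s_exp[i])
                s_exp[i] = next_state(advice[i], x, s_exp[i])
        return loss_alg, loss_exp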

64 The goal of the algorithm is to perform almost as well as the best expert from the reference class E (determined in hindsight). [sent-197, score-0.24]

65 The model of sequential on-line bin packing with expert advice is given in Figure 2. [sent-200, score-1.446]

66 In Sections 4 and 5 we design sequential on-line bin packing algorithms. [sent-201, score-1.195]

67 However, we show that if the item sizes are jittered with certain noise, the regret of the algorithm vanishes uniformly regardless of the original sequence of items. [sent-208, score-0.381]

68 Sequential Off-line Bin Packing As is well known, most variants of the bin packing problem are NP-hard, including bin packing with rejection, see He and Dósa (2005), and maximum resource bin packing, see Boyar et al. [sent-211, score-2.58]

69 In this section we show that the sequential bin packing problem is significantly easier. [sent-213, score-1.195]

70 The key property is that after the t-th item has been received, the 2t possible sequences of decisions cannot lead to more than t different states. [sent-215, score-0.277]

71 Lemma 1 For any fixed sequence of items x1 , x2 , . [sent-216, score-0.219]

72 , xn and for every 1 ≤ t ≤ n, |St | ≤ t, where St = {s : s = sIt ,t , It ∈ {0, 1}t } and sIt ,t is the state reached after receiving items x1 , . [sent-219, score-0.242]

73 At time t, the state of every sequence of decisions with It = 0 belongs to the set St′ = {s′ : s′ = s − I{s≥xt } xt , s ∈ St−1 } and the state of those with It = 1 becomes 1 − xt . [sent-226, score-0.624]
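
Lemma 1 can be checked mechanically: the sketch below (illustrative Python) maintains the reachable state sets with exactly this two-case update, and the size of the set after t items never exceeds t.

    def reachable_states(items):
        states = {1.0}          # S_0: only the empty bin
        history = []
        for x in items:
            kept = {s - x if s >= x else s for s in states}   # decisions with i_t = 0
            opened = {1.0 - x}                                # decisions with i_t = 1
            states = kept | opened
            history.append(states)                            # |history[t-1]| <= t
        return history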

74 To describe a computationally efficient algorithm to compute I∗n , we set up a graph with the set of possible states as a vertex set (there are O(n2 ) of them by Lemma 1) and we show that the shortest path on this graph yields the optimal solution of the sequential off-line bin packing problem. [sent-228, score-1.287]

75 Each vertex vk = v(sk ,tk ) of the graph is defined by a time index tk and a state sk ∈ Stk and corresponds to state sk reachable after tk steps. [sent-236, score-0.26]

76 Two vertices (vi , v j ) are connected by an edge if and only if vi ∈ St−1 , v j ∈ St and state v j is reachable from state vi . [sent-238, score-0.201]

77 That is, by choosing either action 0 or action 1 in state vi , the new state becomes v j after item xt has been placed. [sent-239, score-0.734]

78 Thus, if we find a path with minimal total weight from v1 to v|V | , we also find the optimal sequence of actions for the off-line bin packing problem. [sent-248, score-1.145]
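
Because every edge leads from a time-(t−1) state to a time-t state, the shortest path can be computed by a forward dynamic-programming sweep. The sketch below (our illustration, reusing the next_state and loss helpers above) keeps, for each reachable state, the cheapest cumulative loss so far; charging the free space of the last open bin at the end is one natural convention and an assumption on our part. With at most t states after t items (Lemma 1), the sweep costs O(n2) overall, matching the running time mentioned for Section 3.

    def offline_optimum(items):
        # best[s] = smallest loss of any decision sequence ending in state s
        best = {1.0: 0.0}
        for x in items:
            new_best = {}
            for s, c in best.items():
                for a in (0, 1):
                    s2 = next_state(a, x, s)
                    c2 = c + loss(a, x, s)
                    if s2 not in new_best or c2 < new_best[s2]:
                        new_best[s2] = c2
            best = new_best
        # Assumed convention: the last open bin is closed at the end,
        # so its remaining free space is added to the loss.
        return min(c + s for s, c in best.items())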

79 Figure 3: The graph corresponding to the off-line sequential bin packing problem. [sent-265, score-1.195]

80 Sequential On-line Bin Packing In this section we study the sequential on-line bin packing problem with expert advice, as described in Section 2. [sent-267, score-1.368]

81 We construct a randomized algorithm that, with large probability, achieves a cumulative loss not larger than that of the best expert plus O(n2/3 ln1/3 N) where N = |E | is the number of experts. [sent-272, score-0.371]

82 It shows that in sequential on-line bin packing the cumulative loss is not sensitive to the initial states in the sense that the cumulative loss depends on the initial state in a minor way. [sent-274, score-1.576]

83 Then ∑_{t=1}^{m} ℓ(it , xt | s′t−1 ) − ∑_{t=1}^{m} ℓ(it , xt | st−1 ) ≤ s′0 + s0 ≤ 2 . [sent-296, score-0.432]

84 Therefore, we have ∑_{t=1}^{m} ℓ(it , xt | s′t−1 ) − ∑_{t=1}^{m} ℓ(it , xt | st−1 ) = ∑_{t=1}^{m′} ℓ(it , xt | s′t−1 ) − ∑_{t=1}^{m′} ℓ(it , xt | st−1 ) = ∑_{t=1}^{m′−1} ℓ(0, xt | s′t−1 ) − ∑_{t=1}^{m′−1} ℓ(0, xt | st−1 ) + ℓ(1, xm′ | s′m′−1 ) − ℓ(1, xm′ | sm′−1 ) . [sent-299, score-1.322]

85 Now using the definition of the loss (see Equations 1 and 2), we write ∑_{t=1}^{m} ℓ(it , xt | s′t−1 ) − ∑_{t=1}^{m} ℓ(it , xt | st−1 ) = ∑_{t=1}^{m′−1} xt (I{s′t−1 < xt } − I{st−1 < xt }) + s′m′−1 − sm′−1 . Consider the case s′0 < s0 , and m′ = 2. [sent-300, score-0.711]

86 Then ∑_{t=1}^{m} ℓ(it , xt | s′t−1 ) − ∑_{t=1}^{m} ℓ(it , xt | st−1 ) = (ℓ(0, x1 | s′0 ) + ℓ(1, x2 | s′1 )) − (ℓ(0, x1 | s0 ) + ℓ(1, x2 | s1 )) = (ℓ(0, s0 | s′0 ) + ℓ(1, x2 | s′0 )) − (ℓ(0, s0 | s0 ) + ℓ(1, x2 | 0)) = s0 + s′0 − (0 + 0) . [sent-301, score-0.432]

87 Now we consider the on-line sequential bin packing problem when the goal of the algorithm is to keep its cumulative loss close to the best in a finite set of experts. [sent-302, score-1.381]

88 At the beginning of each segment, the algorithm selects an expert randomly, according to an exponentially weighted average distribution. [sent-310, score-0.188]
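
A sketch of that selection step (illustrative Python; the paper's algorithm uses carefully defined hypothetical losses, a tuned learning rate and a tuned segment length, none of which are reproduced here): at the start of a segment an expert index is drawn from the exponentially weighted average distribution built from the losses accumulated so far, and that expert is then followed for the whole segment.

    import math
    import random

    def pick_expert(cumulative_losses, eta):
        # Exponentially weighted average distribution over experts.
        weights = [math.exp(-eta * L) for L in cumulative_losses]
        total = sum(weights)
        r = random.random() * total
        acc = 0.0
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                return i
        return len(weights) - 1

Following one expert for an entire segment keeps the number of state mismatches small; by the lemma above, starting a segment from a state different from the chosen expert's state adds at most 2 to the loss.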

89 Recall that each expert E ∈ E recommends an action fE,t ∈ {0, 1} at every time instance t = 1, . [sent-316, score-0.253]

90 Recall that Ln denotes the cumulative loss of the algorithm while Li,n is that of expert i. [sent-334, score-0.344]

91 , xn ∈ (0, 1] of items, the cumulative loss Ln of the randomized strategy defined in Figure 4 satisfies for all i = 1, . [sent-339, score-0.2]

92 Proof: We introduce an auxiliary quantity, the so-called hypothetical loss, defined as the loss the algorithm would suffer if it had been in the same state as the selected expert. [sent-350, score-0.235]

93 More precisely, the true loss of the algorithm at time instance t is ℓ(It , xt | st ) and its hypothetical loss is ℓ(It , xt | sJt ,t ). [sent-352, score-0.81]

94 Introducing the notation ℓi,t = ℓ( fi,t , xt | si,t ) , the hypothetical loss of the algorithm is just ℓ(It , xt | sJt ,t ) = ℓ( fJt ,t , xt | sJt ,t ) = ℓJt ,t . [sent-353, score-0.771]

95 2 in Cesa-Bianchi and Lugosi, 2006) that the hypothetical loss of the sequential on-line bin packing algorithm satisfies simultaneously for all i = 1, . [sent-358, score-1.318]

96 Now we may decompose the regret relative to expert i as follows: Ln − Li,n = (Ln − ∑_{t=1}^{n} ℓJt ,t ) + (∑_{t=1}^{n} ℓJt ,t − Li,n ) . [sent-362, score-0.24]

97 (c) The size of the next item xt ∈ (0, 1] is revealed. [sent-382, score-0.445]

98 (d) The algorithm incurs loss ℓ(It , xt | st−1 ) and each expert i incurs loss ℓ( fi,t , xt | si,t−1 ). [sent-383, score-0.836]

99 Constant-threshold Experts In this section we address the sequential on-line bin packing problem when the goal is to perform almost as well as the best in the class of all constant-threshold strategies. [sent-393, score-1.21]

100 Recall that a constant-threshold strategy is parameterized by a number p ∈ (0, 1] and it opens a new bin if and only if the remaining empty space in the bin is less than p. [sent-394, score-1.034]
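
One concrete way to instantiate the expert framework for this class is to discretize the thresholds into a finite grid and to treat each grid point as an expert. The sketch below (our illustration, reusing the loss and next_state helpers above; the grid and its resolution are our own choice, not taken from the paper) also shows how to evaluate the best fixed threshold in hindsight.

    def threshold_expert(p):
        # Constant-threshold strategy: open a new bin iff free space < p.
        return lambda free_space: 1 if free_space < p else 0

    def best_threshold_in_hindsight(items, grid):
        results = []
        for p in grid:
            s, total = 1.0, 0.0
            for x in items:
                a = 1 if s < p else 0
                total += loss(a, x, s)
                s = next_state(a, x, s)
            results.append((total, p))
        return min(results)   # (loss of best threshold, best threshold)

    # Example grid of 100 equally spaced thresholds in (0, 1].
    grid = [k / 100.0 for k in range(1, 101)]
    experts = [threshold_expert(p) for p in grid]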


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('packing', 0.54), ('bin', 0.494), ('st', 0.237), ('item', 0.229), ('xt', 0.216), ('items', 0.175), ('expert', 0.173), ('sequential', 0.161), ('forecaster', 0.13), ('ottucs', 0.116), ('experts', 0.113), ('orgy', 0.101), ('ugosi', 0.101), ('cumulative', 0.093), ('equential', 0.091), ('bins', 0.084), ('action', 0.08), ('advice', 0.078), ('maker', 0.072), ('regret', 0.067), ('loss', 0.063), ('ln', 0.062), ('jt', 0.06), ('rgy', 0.058), ('round', 0.052), ('merhav', 0.051), ('shipped', 0.051), ('sjt', 0.051), ('state', 0.05), ('decisions', 0.048), ('actions', 0.048), ('assigned', 0.047), ('empty', 0.046), ('incurs', 0.045), ('hypothetical', 0.045), ('lugosi', 0.044), ('sequence', 0.044), ('open', 0.044), ('segment', 0.041), ('decision', 0.04), ('reward', 0.04), ('ak', 0.04), ('vertex', 0.039), ('reference', 0.037), ('arrival', 0.034), ('fjt', 0.034), ('opened', 0.034), ('randomize', 0.034), ('seiden', 0.034), ('xsm', 0.034), ('gy', 0.034), ('budapest', 0.032), ('vi', 0.029), ('sit', 0.029), ('wasted', 0.029), ('stk', 0.029), ('rejection', 0.029), ('lost', 0.029), ('sensor', 0.029), ('randomized', 0.027), ('sk', 0.026), ('ism', 0.026), ('jittered', 0.026), ('hungary', 0.026), ('hannan', 0.026), ('xm', 0.026), ('capacity', 0.025), ('storage', 0.024), ('gmail', 0.024), ('reachable', 0.024), ('adversarial', 0.024), ('revealed', 0.023), ('im', 0.023), ('bor', 0.022), ('sm', 0.022), ('andr', 0.022), ('hindsight', 0.022), ('stored', 0.021), ('mdp', 0.021), ('vertices', 0.019), ('path', 0.019), ('lemma', 0.019), ('states', 0.019), ('sa', 0.018), ('prediction', 0.018), ('packages', 0.017), ('decide', 0.017), ('strategies', 0.017), ('xn', 0.017), ('accounts', 0.017), ('received', 0.016), ('free', 0.016), ('economics', 0.016), ('yu', 0.016), ('segments', 0.016), ('reject', 0.016), ('goal', 0.015), ('algorithm', 0.015), ('tk', 0.015), ('vk', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.000001 80 jmlr-2010-On-Line Sequential Bin Packing

Author: András György, Gábor Lugosi, György Ottucsàk

Abstract: We consider a sequential version of the classical bin packing problem in which items are received one by one. Before the size of the next item is revealed, the decision maker needs to decide whether the next item is packed in the currently open bin or the bin is closed and a new bin is opened. If the new item does not fit, it is lost. If a bin is closed, the remaining free space in the bin accounts for a loss. The goal of the decision maker is to minimize the loss accumulated over n periods. We present an algorithm that has a cumulative loss not much larger than any strategy in a finite class of reference strategies for any sequence of items. Special attention is paid to reference strategies that use a fixed threshold at each step to decide whether a new bin is opened. Some positive and negative results are presented for this case. Keywords: bin packing, on-line learning, prediction with expert advice

2 0.12009002 76 jmlr-2010-Message-passing for Graph-structured Linear Programs: Proximal Methods and Rounding Schemes

Author: Pradeep Ravikumar, Alekh Agarwal, Martin J. Wainwright

Abstract: The problem of computing a maximum a posteriori (MAP) configuration is a central computational challenge associated with Markov random fields. There has been some focus on “tree-based” linear programming (LP) relaxations for the MAP problem. This paper develops a family of super-linearly convergent algorithms for solving these LPs, based on proximal minimization schemes using Bregman divergences. As with standard message-passing on graphs, the algorithms are distributed and exploit the underlying graphical structure, and so scale well to large problems. Our algorithms have a double-loop character, with the outer loop corresponding to the proximal sequence, and an inner loop of cyclic Bregman projections used to compute each proximal update. We establish convergence guarantees for our algorithms, and illustrate their performance via some simulations. We also develop two classes of rounding schemes, deterministic and randomized, for obtaining integral configurations from the LP solutions. Our deterministic rounding schemes use a “re-parameterization” property of our algorithms so that when the LP solution is integral, the MAP solution can be obtained even before the LP-solver converges to the optimum. We also propose graph-structured randomized rounding schemes applicable to iterative LP-solving algorithms in general. We analyze the performance of and report simulations comparing these rounding schemes. Keywords: graphical models, MAP Estimation, LP relaxation, proximal minimization, rounding schemes

3 0.11218743 79 jmlr-2010-Near-optimal Regret Bounds for Reinforcement Learning

Author: Thomas Jaksch, Ronald Ortner, Peter Auer

Abstract: For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s′ there is a policy which moves from s to s′ in at most D steps (on average). √ ˜ We present a reinforcement learning algorithm with total regret O(DS AT ) after T steps for any unknown MDP with S states, A actions per state, and diameter D. A corresponding lower bound of √ Ω( DSAT ) on the total regret of any learning algorithm is given as well. These results are complemented by a sample complexity bound on the number of suboptimal steps taken by our algorithm. This bound can be used to achieve a (gap-dependent) regret bound that is logarithmic in T . Finally, we also consider a setting where the MDP is allowed to change a fixed number of ℓ times. We present a modification of our algorithm that is able to deal with this setting and show a √ ˜ regret bound of O(ℓ1/3 T 2/3 DS A). Keywords: undiscounted reinforcement learning, Markov decision process, regret, online learning, sample complexity

4 0.098807044 10 jmlr-2010-An Exponential Model for Infinite Rankings

Author: Marina Meilă, Le Bao

Abstract: This paper presents a statistical model for expressing preferences through rankings, when the number of alternatives (items to rank) is large. A human ranker will then typically rank only the most preferred items, and may not even examine the whole set of items, or know how many they are. Similarly, a user presented with the ranked output of a search engine, will only consider the highest ranked items. We model such situations by introducing a stagewise ranking model that operates with finite ordered lists called top-t orderings over an infinite space of items. We give algorithms to estimate this model from data, and demonstrate that it has sufficient statistics, being thus an exponential family model with continuous and discrete parameters. We describe its conjugate prior and other statistical properties. Then, we extend the estimation problem to multimodal data by introducing an Exponential-Blurring-Mean-Shift nonparametric clustering algorithm. The experiments highlight the properties of our model and demonstrate that infinite models over permutations can be simple, elegant and practical. Keywords: permutations, partial orderings, Mallows model, distance based ranking model, exponential family, non-parametric clustering, branch-and-bound

5 0.083458818 97 jmlr-2010-Regret Bounds and Minimax Policies under Partial Monitoring

Author: Jean-Yves Audibert, Sébastien Bubeck

Abstract: This work deals with four classical prediction settings, namely full information, bandit, label efficient and bandit label efficient as well as four different notions of regret: pseudo-regret, expected regret, high probability regret and tracking the best expert regret. We introduce a new forecaster, INF (Implicitly Normalized Forecaster) based on an arbitrary function ψ for which we propose a unified γ analysis of its pseudo-regret in the four games we consider. In particular, for ψ(x) = exp(ηx) + K , INF reduces to the classical exponentially weighted average forecaster and our analysis of the pseudo-regret recovers known results while for the expected regret we slightly tighten the bounds. γ η q On the other hand with ψ(x) = −x + K , which defines a new forecaster, we are able to remove the extraneous logarithmic factor in the pseudo-regret bounds for bandits games, and thus fill in a long open gap in the characterization of the minimax rate for the pseudo-regret in the bandit game. We also provide high probability bounds depending on the cumulative reward of the optimal action. Finally, we consider the stochastic bandit game, and prove that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002a) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays. Keywords: Bandits (adversarial and stochastic), regret bound, minimax rate, label efficient, upper confidence bound (UCB) policy, online learning, prediction with limited feedback.

6 0.072751984 85 jmlr-2010-On the Foundations of Noise-free Selective Classification

7 0.055826019 7 jmlr-2010-A Streaming Parallel Decision Tree Algorithm

8 0.045388158 98 jmlr-2010-Regularized Discriminant Analysis, Ridge Regression and Beyond

9 0.044201486 31 jmlr-2010-Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization

10 0.039881237 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning

11 0.037354745 66 jmlr-2010-Linear Algorithms for Online Multitask Classification

12 0.036718342 18 jmlr-2010-Bundle Methods for Regularized Risk Minimization

13 0.036571819 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond

14 0.035059836 4 jmlr-2010-A Generalized Path Integral Control Approach to Reinforcement Learning

15 0.033917412 63 jmlr-2010-Learning Instance-Specific Predictive Models

16 0.033381041 90 jmlr-2010-Permutation Tests for Studying Classifier Performance

17 0.032909863 25 jmlr-2010-Composite Binary Losses

18 0.032070216 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks

19 0.031376071 21 jmlr-2010-Classification Methods with Reject Option Based on Convex Risk Minimization

20 0.028133607 81 jmlr-2010-On Finding Predictors for Arbitrary Families of Processes


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.134), (1, -0.067), (2, 0.051), (3, -0.089), (4, 0.063), (5, -0.028), (6, -0.042), (7, 0.129), (8, 0.051), (9, 0.228), (10, 0.237), (11, 0.121), (12, 0.052), (13, -0.077), (14, -0.171), (15, 0.025), (16, 0.038), (17, 0.135), (18, 0.058), (19, 0.059), (20, -0.164), (21, 0.206), (22, 0.108), (23, 0.136), (24, 0.102), (25, 0.02), (26, 0.025), (27, -0.031), (28, 0.054), (29, 0.083), (30, 0.057), (31, -0.156), (32, 0.082), (33, -0.017), (34, -0.121), (35, 0.011), (36, 0.184), (37, 0.077), (38, 0.011), (39, 0.068), (40, -0.074), (41, -0.09), (42, -0.077), (43, -0.034), (44, 0.126), (45, 0.05), (46, 0.052), (47, -0.105), (48, 0.086), (49, -0.129)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98218369 80 jmlr-2010-On-Line Sequential Bin Packing

Author: András György, Gábor Lugosi, György Ottucsàk

Abstract: We consider a sequential version of the classical bin packing problem in which items are received one by one. Before the size of the next item is revealed, the decision maker needs to decide whether the next item is packed in the currently open bin or the bin is closed and a new bin is opened. If the new item does not fit, it is lost. If a bin is closed, the remaining free space in the bin accounts for a loss. The goal of the decision maker is to minimize the loss accumulated over n periods. We present an algorithm that has a cumulative loss not much larger than any strategy in a finite class of reference strategies for any sequence of items. Special attention is payed to reference strategies that use a fixed threshold at each step to decide whether a new bin is opened. Some positive and negative results are presented for this case. Keywords: bin packing, on-line learning, prediction with expert advice

2 0.47980991 76 jmlr-2010-Message-passing for Graph-structured Linear Programs: Proximal Methods and Rounding Schemes

Author: Pradeep Ravikumar, Alekh Agarwal, Martin J. Wainwright

Abstract: The problem of computing a maximum a posteriori (MAP) configuration is a central computational challenge associated with Markov random fields. There has been some focus on “tree-based” linear programming (LP) relaxations for the MAP problem. This paper develops a family of super-linearly convergent algorithms for solving these LPs, based on proximal minimization schemes using Bregman divergences. As with standard message-passing on graphs, the algorithms are distributed and exploit the underlying graphical structure, and so scale well to large problems. Our algorithms have a double-loop character, with the outer loop corresponding to the proximal sequence, and an inner loop of cyclic Bregman projections used to compute each proximal update. We establish convergence guarantees for our algorithms, and illustrate their performance via some simulations. We also develop two classes of rounding schemes, deterministic and randomized, for obtaining integral configurations from the LP solutions. Our deterministic rounding schemes use a “re-parameterization” property of our algorithms so that when the LP solution is integral, the MAP solution can be obtained even before the LP-solver converges to the optimum. We also propose graph-structured randomized rounding schemes applicable to iterative LP-solving algorithms in general. We analyze the performance of and report simulations comparing these rounding schemes. Keywords: graphical models, MAP Estimation, LP relaxation, proximal minimization, rounding schemes

3 0.45194262 10 jmlr-2010-An Exponential Model for Infinite Rankings

Author: Marina Meilă, Le Bao

Abstract: This paper presents a statistical model for expressing preferences through rankings, when the number of alternatives (items to rank) is large. A human ranker will then typically rank only the most preferred items, and may not even examine the whole set of items, or know how many they are. Similarly, a user presented with the ranked output of a search engine, will only consider the highest ranked items. We model such situations by introducing a stagewise ranking model that operates with finite ordered lists called top-t orderings over an infinite space of items. We give algorithms to estimate this model from data, and demonstrate that it has sufficient statistics, being thus an exponential family model with continuous and discrete parameters. We describe its conjugate prior and other statistical properties. Then, we extend the estimation problem to multimodal data by introducing an Exponential-Blurring-Mean-Shift nonparametric clustering algorithm. The experiments highlight the properties of our model and demonstrate that infinite models over permutations can be simple, elegant and practical. Keywords: permutations, partial orderings, Mallows model, distance based ranking model, exponential family, non-parametric clustering, branch-and-bound

4 0.45127124 79 jmlr-2010-Near-optimal Regret Bounds for Reinforcement Learning

Author: Thomas Jaksch, Ronald Ortner, Peter Auer

Abstract: For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s′ there is a policy which moves from s to s′ in at most D steps (on average). √ ˜ We present a reinforcement learning algorithm with total regret O(DS AT ) after T steps for any unknown MDP with S states, A actions per state, and diameter D. A corresponding lower bound of √ Ω( DSAT ) on the total regret of any learning algorithm is given as well. These results are complemented by a sample complexity bound on the number of suboptimal steps taken by our algorithm. This bound can be used to achieve a (gap-dependent) regret bound that is logarithmic in T . Finally, we also consider a setting where the MDP is allowed to change a fixed number of ℓ times. We present a modification of our algorithm that is able to deal with this setting and show a √ ˜ regret bound of O(ℓ1/3 T 2/3 DS A). Keywords: undiscounted reinforcement learning, Markov decision process, regret, online learning, sample complexity

5 0.35923058 85 jmlr-2010-On the Foundations of Noise-free Selective Classification

Author: Ran El-Yaniv, Yair Wiener

Abstract: We consider selective classification, a term we adopt here to refer to ‘classification with a reject option.’ The essence in selective classification is to trade-off classifier coverage for higher accuracy. We term this trade-off the risk-coverage (RC) trade-off. Our main objective is to characterize this trade-off and to construct algorithms that can optimally or near optimally achieve the best possible trade-offs in a controlled manner. For noise-free models we present in this paper a thorough analysis of selective classification including characterizations of RC trade-offs in various interesting settings. Keywords: classification with a reject option, selective classification, perfect learning, high performance classification, risk-coverage trade-off

6 0.33008748 98 jmlr-2010-Regularized Discriminant Analysis, Ridge Regression and Beyond

7 0.25656724 97 jmlr-2010-Regret Bounds and Minimax Policies under Partial Monitoring

8 0.23063272 66 jmlr-2010-Linear Algorithms for Online Multitask Classification

9 0.20612825 7 jmlr-2010-A Streaming Parallel Decision Tree Algorithm

10 0.20346655 4 jmlr-2010-A Generalized Path Integral Control Approach to Reinforcement Learning

11 0.19673848 61 jmlr-2010-Learning From Crowds

12 0.17780641 63 jmlr-2010-Learning Instance-Specific Predictive Models

13 0.17444988 21 jmlr-2010-Classification Methods with Reject Option Based on Convex Risk Minimization

14 0.16824178 25 jmlr-2010-Composite Binary Losses

15 0.1667866 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning

16 0.16467912 34 jmlr-2010-Erratum: SGDQN is Less Careful than Expected

17 0.1571082 31 jmlr-2010-Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization

18 0.14556481 77 jmlr-2010-Model-based Boosting 2.0

19 0.14109156 19 jmlr-2010-Characterization, Stability and Convergence of Hierarchical Clustering Methods

20 0.13173638 90 jmlr-2010-Permutation Tests for Studying Classifier Performance


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.015), (8, 0.016), (15, 0.013), (21, 0.017), (32, 0.074), (36, 0.033), (37, 0.03), (38, 0.031), (75, 0.125), (81, 0.011), (82, 0.452), (85, 0.065)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.69030041 80 jmlr-2010-On-Line Sequential Bin Packing

Author: András György, Gábor Lugosi, György Ottucsàk

Abstract: We consider a sequential version of the classical bin packing problem in which items are received one by one. Before the size of the next item is revealed, the decision maker needs to decide whether the next item is packed in the currently open bin or the bin is closed and a new bin is opened. If the new item does not fit, it is lost. If a bin is closed, the remaining free space in the bin accounts for a loss. The goal of the decision maker is to minimize the loss accumulated over n periods. We present an algorithm that has a cumulative loss not much larger than any strategy in a finite class of reference strategies for any sequence of items. Special attention is payed to reference strategies that use a fixed threshold at each step to decide whether a new bin is opened. Some positive and negative results are presented for this case. Keywords: bin packing, on-line learning, prediction with expert advice

2 0.54601425 5 jmlr-2010-A Quasi-Newton Approach to Nonsmooth Convex Optimization Problems in Machine Learning

Author: Jin Yu, S.V.N. Vishwanathan, Simon Güunter, Nicol N. Schraudolph

Abstract: We extend the well-known BFGS quasi-Newton method and its memory-limited variant LBFGS to the optimization of nonsmooth convex objectives. This is done in a rigorous fashion by generalizing three components of BFGS to subdifferentials: the local quadratic model, the identification of a descent direction, and the Wolfe line search conditions. We prove that under some technical conditions, the resulting subBFGS algorithm is globally convergent in objective function value. We apply its memory-limited variant (subLBFGS) to L2 -regularized risk minimization with the binary hinge loss. To extend our algorithm to the multiclass and multilabel settings, we develop a new, efficient, exact line search algorithm. We prove its worst-case time complexity bounds, and show that our line search can also be used to extend a recently developed bundle method to the multiclass and multilabel settings. We also apply the direction-finding component of our algorithm to L1 -regularized risk minimization with logistic loss. In all these contexts our methods perform comparable to or better than specialized state-of-the-art solvers on a number of publicly available data sets. An open source implementation of our algorithms is freely available. Keywords: BFGS, variable metric methods, Wolfe conditions, subgradient, risk minimization, hinge loss, multiclass, multilabel, bundle methods, BMRM, OCAS, OWL-QN

3 0.33391711 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels

Author: Pinar Donmez, Guy Lebanon, Krishnakumar Balasubramanian

Abstract: Estimating the error rates of classifiers or regression models is a fundamental task in machine learning which has thus far been studied exclusively using supervised learning techniques. We propose a novel unsupervised framework for estimating these error rates using only unlabeled data and mild assumptions. We prove consistency results for the framework and demonstrate its practical applicability on both synthetic and real world data. Keywords: classification and regression, maximum likelihood, latent variable models

4 0.33278048 74 jmlr-2010-Maximum Relative Margin and Data-Dependent Regularization

Author: Pannagadatta K. Shivaswamy, Tony Jebara

Abstract: Leading classification methods such as support vector machines (SVMs) and their counterparts achieve strong generalization performance by maximizing the margin of separation between data classes. While the maximum margin approach has achieved promising performance, this article identifies its sensitivity to affine transformations of the data and to directions with large data spread. Maximum margin solutions may be misled by the spread of data and preferentially separate classes along large spread directions. This article corrects these weaknesses by measuring margin not in the absolute sense but rather only relative to the spread of data in any projection direction. Maximum relative margin corresponds to a data-dependent regularization on the classification function while maximum absolute margin corresponds to an ℓ2 norm constraint on the classification function. Interestingly, the proposed improvements only require simple extensions to existing maximum margin formulations and preserve the computational efficiency of SVMs. Through the maximization of relative margin, surprising performance gains are achieved on real-world problems such as digit, text classification and on several other benchmark data sets. In addition, risk bounds are derived for the new formulation based on Rademacher averages. Keywords: support vector machines, kernel methods, large margin, Rademacher complexity

5 0.3295114 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond

Author: Yevgeny Seldin, Naftali Tishby

Abstract: We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering.1 We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish among two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices and estimation of the joint probability distribution of row and column variables in co-occurrence matrices. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that were absent in the previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a trade-off between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this trade-off for discriminative prediction tasks. This algorithm achieved stateof-the-art performance in the MovieLens collaborative filtering task. Our co-clustering model can also be seen as matrix tri-factorization and the results provide generalization bounds, regularization terms, and new algorithms for this form of matrix factorization. The analysis of co-clustering is extended to tree-shaped graphical models, which can be used to analyze high dimensional tensors. According to the bounds, the generalization abilities of treeshaped graphical models depend on a trade-off between their empirical data fit and the mutual information that is propagated up the tree levels. We also formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. The analysis of co-clustering easily

6 0.32839078 92 jmlr-2010-Practical Approaches to Principal Component Analysis in the Presence of Missing Values

7 0.32695723 103 jmlr-2010-Sparse Semi-supervised Learning Using Conjugate Functions

8 0.32553986 102 jmlr-2010-Semi-Supervised Novelty Detection

9 0.32484284 9 jmlr-2010-An Efficient Explanation of Individual Classifications using Game Theory

10 0.32465559 63 jmlr-2010-Learning Instance-Specific Predictive Models

11 0.32461679 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking

12 0.32406896 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data

13 0.32385913 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes

14 0.3232826 105 jmlr-2010-Spectral Regularization Algorithms for Learning Large Incomplete Matrices

15 0.32284081 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming

16 0.32279447 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding

17 0.32235593 43 jmlr-2010-Generalized Power Method for Sparse Principal Component Analysis

18 0.32229704 109 jmlr-2010-Stochastic Composite Likelihood

19 0.32214329 66 jmlr-2010-Linear Algorithms for Online Multitask Classification

20 0.32203883 95 jmlr-2010-Rademacher Complexities and Bounding the Excess Risk in Active Learning