nips nips2013 nips2013-230 knowledge-graph by maker-knowledge-mining

230 nips-2013-Online Learning with Costly Features and Labels


Source: pdf

Author: Navid Zolghadr, Gabor Bartok, Russell Greiner, András György, Csaba Szepesvari

Abstract: This paper introduces the online probing problem: In each round, the learner is able to purchase the values of a subset of feature values. After the learner uses this information to come up with a prediction for the given round, he then has the option of paying to see the loss function that he is evaluated against. Either way, the learner pays for both the errors of his predictions and also whatever he chooses to observe, including the cost of observing the loss function for the given round and the cost of the observed features. We consider two variations of this problem, depending on whether the learner can observe the label for free or not. We provide algorithms and upper and lower bounds on the regret for both variants. We show that a positive cost for observing the label significantly increases the regret of the problem. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract This paper introduces the online probing problem: In each round, the learner is able to purchase the values of a subset of feature values. [sent-5, score-0.832]

2 After the learner uses this information to come up with a prediction for the given round, he then has the option of paying to see the loss function that he is evaluated against. [sent-6, score-0.631]

3 Either way, the learner pays for both the errors of his predictions and also whatever he chooses to observe, including the cost of observing the loss function for the given round and the cost of the observed features. [sent-7, score-0.869]

4 We consider two variations of this problem, depending on whether the learner can observe the label for free or not. [sent-8, score-0.434]

5 We provide algorithms and upper and lower bounds on the regret for both variants. [sent-9, score-0.31]

6 We show that a positive cost for observing the label significantly increases the regret of the problem. [sent-10, score-0.578]

7 1 Introduction In this paper, we study a variant of online learning, called online probing, which is motivated by practical problems where there is a cost to observing the features that may help one’s predictions. [sent-11, score-0.552]

8 Online probing is a class of online learning problems. [sent-12, score-0.371]

9 In each time step t, the learner produces his prediction based on the values of some features x_t = (x_{t,1}, ..., x_{t,d}). [sent-14, score-0.624]

10 However, unlike in the standard online learning settings, if the learner wants to use the value of feature i to produce a prediction, he has to purchase the value at some fixed, a priori known cost, c_i ≥ 0. [sent-18, score-0.628]

11 Features whose value is not purchased in a given round remain unobserved by the learner. [sent-19, score-0.241]

12 Once a prediction ŷ_t ∈ Y is produced, it is evaluated against a loss function ℓ_t : Y → R. [sent-20, score-0.359]

13 At the end of a round, the learner has the option of purchasing the full loss function, again at a fixed prespecified cost c_{d+1} ≥ 0 (by default, the loss function is not revealed to the learner). [sent-22, score-0.691]
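
To make the protocol just described concrete, here is a minimal sketch of one interaction loop in Python; the toy environment, the fixed feature mask, and the naive prediction rule are assumptions of this example rather than anything specified in the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    d, T = 3, 5
    feature_costs = np.array([0.1, 0.2, 0.3])   # c_1, ..., c_d, known a priori
    label_cost = 0.5                            # c_{d+1}, price of seeing the loss function

    total_paid = 0.0
    for t in range(T):
        x_t = rng.uniform(-1.0, 1.0, size=d)    # hidden from the learner until purchased
        y_t = 1.0 if x_t[0] > 0 else -1.0       # hidden target defining the round's loss

        s_t = np.array([1, 0, 1])               # learner decides which feature values to buy
        observed = np.where(s_t == 1, x_t, np.nan)
        total_paid += float(s_t @ feature_costs)

        y_hat = float(np.nansum(observed))      # any prediction rule over the observed values
        prediction_loss = (y_hat - y_t) ** 2
        total_paid += prediction_loss           # errors are always charged

        buy_loss = True                         # optionally pay c_{d+1} to see the loss/label
        if buy_loss:
            total_paid += label_cost
            feedback = prediction_loss          # would be fed to the learning algorithm here

    print("total cost paid over", T, "rounds:", total_paid)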

14 The learner’s performance is measured by his regret as he competes against some prespecified set of predictors. [sent-23, score-0.31]

15 Just like the learner, a competing predictor also needs to purchase the feature values needed in the prediction. [sent-24, score-0.26]

16 Note that when defining the best competitor in hindsight, we did not include the cost of observing the loss function. [sent-54, score-0.323]

17 Thus, we prefer the current regret definition as it promotes the study of regret when there is a price attached to observing the loss functions. [sent-56, score-0.851]
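
As an illustration of the regret definition discussed in the two sentences above, the following sketch compares the learner's cumulative cost-augmented loss with that of the best fixed predictor in hindsight, where competitors pay for the features they use but not for observing the loss; the tiny predictor class, the data, and all constants are made up for this example.

    import numpy as np

    rng = np.random.default_rng(1)
    d, T = 3, 100
    c = np.array([0.05, 0.05, 0.2])             # per-feature costs c_1..c_d
    c_label = 0.1                               # c_{d+1}, charged to the learner only
    X = rng.uniform(-1.0, 1.0, size=(T, d))
    y = X[:, 0] + 0.1 * rng.standard_normal(T)

    # Competitor class: each predictor is a (weight vector, feature mask) pair.
    predictors = [
        (np.array([1.0, 0.0, 0.0]), np.array([1, 0, 0])),
        (np.array([0.5, 0.5, 0.0]), np.array([1, 1, 0])),
    ]

    def cost_augmented_loss(w, s, x, target):
        # prediction loss plus the price of the features this predictor observes
        return (np.dot(w, s * x) - target) ** 2 + float(s @ c)

    # A deliberately naive learner that always plays the first predictor and buys every loss.
    learner_total = sum(cost_augmented_loss(*predictors[0], X[t], y[t]) for t in range(T))
    learner_total += T * c_label                # label-observation cost hits the learner...
    best_in_hindsight = min(                    # ...but not the hindsight competitor
        sum(cost_augmented_loss(w, s, X[t], y[t]) for t in range(T)) for (w, s) in predictors
    )
    print("regret:", learner_total - best_in_hindsight)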

18 Several works in the literature show that delays usually increase the regret in a moderate fashion (Mesterharm, 2005; Weinberger and Ordentlich, 2006; Agarwal and Duchi, 2011; Joulani et al. [sent-65, score-0.343]

19 Finally, although we pose the task as an online learning problem, it is easy to show that the procedures we develop can also be used to attack the batch learning problem, when the goal is to learn a predictor that will be cost-efficient on future data given a database of examples. [sent-79, score-0.31]

20 Obviously, when observing the loss is costly, the problem is related to active learning. [sent-80, score-0.264]

21 However, to our best knowledge, the case when observing the features is costly has not been studied before in the online learning literature. [sent-81, score-0.413]

22 In the first version, free-label online probing, there is no cost to seeing the loss function, that is, c_{d+1} = 0. [sent-85, score-0.309]

23 (The loss function often compares the predicted value with some label in a known way, in which case learning the value of the label for the round means that the whole loss function becomes known; hence the choice of the name. [sent-86, score-0.58]

24 ) Thus, the learner will naturally choose to see the loss function after he provides his prediction; this provides feedback that the learner can use to improve the predictor he produces. [sent-87, score-0.946]

25 In the second version, non-free-label online probing, the cost of seeing the loss function is positive: c_{d+1} > 0. [sent-88, score-0.309]

26 We give an algorithm that enjoys a regret of O(√(2^d L T ln N_T(1/(TL)))) when the losses are L-equi-Lipschitz (Theorem 2. [sent-90, score-0.63]

27 This leads to an O(√(2^d L T)) regret bound for typical function classes, such as the class of linear predictors with bounded weights and bounded inputs. [sent-92, score-0.475]

28 For the special case of linear prediction with quadratic loss, we give an algorithm whose regret scales only as Õ(√(dT)), a vast improvement in the dependence on d. [sent-94, score-0.424]

29 The case of non-free-label online probing is treated in Section 3. [sent-95, score-0.371]

30 Here, in contrast to the free-label case, we prove that the minimax growth rate of the regret is of the order Θ̃(T^{2/3}). [sent-96, score-0.31]

31 1 Related Work This paper analyzes online learning when features (and perhaps labels) have to be purchased. [sent-105, score-0.275]

32 The standard “batch learning” framework has a pure explore phase, which gives the learner a set of labeled, completely specified examples, followed by a pure exploit phase, where the learned predictor is asked to predict the label for novel instances. [sent-106, score-0.607]

33 Notice the learner is not required (nor even allowed) to decide which information to gather. [sent-107, score-0.344]

34 By contrast, “active (batch) learning” requires the learner to identify that information (Settles, 2009). [sent-108, score-0.344]

35 Our model, however, requires the learner to purchase feature values as well. [sent-110, score-0.461]

36 This is extended in Kapoor and Greiner (2005) to a version that requires the eventual predictor (as well as the learner) to pay to see feature values as well. [sent-113, score-0.254]

37 However, these are still in the batch framework: after gathering the information, the learner produces a predictor, which is not changed afterwards. [sent-114, score-0.417]

38 Our problem is an online problem over multiple rounds, where at each round the learner is required to predict the label for the current example. [sent-115, score-0.766]

39 (2005) provided upper and lower bounds on the regret where the learner is given all the features for each example, but must pay for any labels he requests. [sent-118, score-0.87]

40 In our problem, the learner must pay to see the values of the features of each example as well as the cost to obtain its true label at each round. [sent-119, score-0.682]

41 (2011) provided an algorithm for this task in the online settings, with optimal O(√T) regret where T is the number of rounds. [sent-127, score-0.442]

42 Our model differs from this model as in our case the learner has the option to obtain the values of only the subset of the features that he selects. [sent-128, score-0.509]

43 2 Free-Label Probing In this section we consider the case when the cost of observing the loss function is zero. [sent-129, score-0.293]

44 Thus, we can assume without loss of generality that the learner receives the loss function at the end of each round (i. [sent-130, score-0.744]

45 We will first consider the general setting where the only restriction is that the losses are equi-Lipschitz and the function set F has a finite empirical worst-case covering number. [sent-133, score-0.243]

46 Then we consider the special case where the set of competitors are the linear predictors and the losses are quadratic. [sent-134, score-0.31]

47 1 The Case of Lipschitz losses In this section we assume that the loss functions, `t , are Lipschitz with a known, common Lipschitz constant L over Y w. [sent-136, score-0.301]

48 Clearly, the problem is an instance of prediction with expert advice under partial information feedback (Auer et al. [sent-140, score-0.23]

49 Note that, if the learner chooses to observe the values of some features, then he will also be able to evaluate the losses of all the predictors f ∈ F that use only these selected features. [sent-142, score-0.654]

50 Then, the learner can compute the loss of any predictor f ∈ F such that s(f) ≤ s_t, where ≤ denotes the conjunction of the component-wise comparison. [sent-144, score-0.66]
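
A small sketch of this observability structure, assuming each predictor is identified with the binary mask s(f) of the features it uses; s(f) ≤ s_t coordinate-wise means its loss can be evaluated from the purchased values. The enumeration below is purely illustrative.

    import numpy as np
    from itertools import product

    d = 3
    # One candidate predictor per feature subset, represented just by its mask s(f).
    masks = [np.array(bits) for bits in product([0, 1], repeat=d)]

    s_t = np.array([1, 1, 0])                   # features actually purchased in round t

    def loss_observable(mask, purchased):
        # the predictor's loss is computable iff it uses only purchased features
        return bool(np.all(mask <= purchased))

    for mask in masks:
        print(mask, "observable" if loss_observable(mask, s_t) else "hidden")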

51 However, for some loss functions, it may be possible to estimate the losses of other predictors, too. [sent-145, score-0.301]

52 The regret will then depend on how the covering numbers of the space F behave. [sent-151, score-0.367]

53 Mannor and Shamir (2011) studied problems similar to this in a general framework, where in addition to the loss of the selected predictor (expert), the losses of some other predictors are also communicated to the learner in every round. [sent-152, score-0.912]

54 It is assumed that the graph of any round t, G_t = (F, E_t), becomes known to the learner at the beginning of the round. [sent-156, score-0.514]

55 The regret of ELP is analyzed in the following theorem. [sent-160, score-0.31]

56 Consider a prediction with expert advice problem over F where in round t, G_t = (F, E_t) is the directed graph that encodes which losses become available to the learner. [sent-163, score-0.553]

57 Let B be a bound on the non-negative losses ℓ_t: max_{t ≥ 1, f ∈ F} ℓ_t(f(x_t)) ≤ B. [sent-165, score-0.263]

58 Then, there exists a constant C_ELP > 0 such that for any T > 0, the regret of Algorithm 2 (shown in the Appendix) when competing against the best predictor using ELP satisfies E[R_T] ≤ C_ELP · B · √((ln |F|) · Σ_{t=1}^T χ̄(G_t)) (2). [sent-166, score-0.453]

59 The algorithm's computational cost in any given round is poly(|F|). [sent-167, score-0.232]
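
The sketch below conveys the flavor of this approach, assuming a finite predictor set: run exponential weights over all (weight, mask) predictors and, in each round, update every predictor whose loss is observable given the purchased features. It is a simplified stand-in for ELP (the actual algorithm of Mannor and Shamir additionally computes an exploration distribution and uses properly importance-weighted loss estimates), and the toy data, predictors, and learning rate are assumptions of the example.

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(2)
    d, T, eta = 2, 200, 0.3
    c = np.array([0.05, 0.1])                             # feature costs

    # Finite predictor set: here simply w = mask, one predictor per non-empty feature subset.
    masks = [np.array(b) for b in product([0, 1], repeat=d) if any(b)]
    predictors = [(m.astype(float), m) for m in masks]
    weights = np.ones(len(predictors))

    for t in range(T):
        x_t = rng.uniform(-1.0, 1.0, size=d)
        y_t = x_t[0] + 0.1 * rng.standard_normal()

        p = weights / weights.sum()
        k = rng.choice(len(predictors), p=p)              # play a predictor drawn from p
        s_t = predictors[k][1]                            # ...and purchase its features

        # Side observations: every predictor covered by s_t has a computable loss this round.
        for j, (w, m) in enumerate(predictors):
            if np.all(m <= s_t):
                loss_j = (np.dot(w, m * x_t) - y_t) ** 2 + float(m @ c)
                weights[j] *= np.exp(-eta * loss_j)       # exponential-weights update

    print("final distribution over predictors:", weights / weights.sum())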

60 We are going to apply the ELP algorithm to F′ and apply (3) to obtain a regret bound. [sent-182, score-0.31]

61 If f′ uses more features than f, then the cost-penalized distance between f′ and f is bounded from below by the cost of observing the extra features. [sent-183, score-0.288]

62 Then, there exists an algorithm such that for any T > 0, knowing T, the regret satisfies E[R_T] ≤ C_ELP (C_1 + ℓ_max) √(2^d T ln N_T(F, α)) + T L α. [sent-190, score-0.444]

63 Then, there exists an algorithm such that for any T > 0, the regret of the algorithm satisfies E[R_T] ≤ C_ELP (C_1 + ℓ_max) √(d 2^d T ln(1 + 2 T L W X)) + 1. [sent-212, score-0.31]
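
The bounds above come from running a finite-expert algorithm on a discretized version of the predictor class, so the key quantity is the covering number N_T(F, α). The sketch below builds such a finite α-cover for bounded-weight linear predictors as a uniform grid over the weight box; the sup-norm choice, the grid spacing, and the parameter names are assumptions of this example, not the paper's exact construction.

    import numpy as np
    from itertools import product

    def weight_grid(d, w_max, alpha):
        """Finite grid covering {w : max_i |w_i| <= w_max} to within alpha per coordinate."""
        axis = np.arange(-w_max, w_max + 1e-12, 2 * alpha)
        return [np.array(point) for point in product(axis, repeat=d)]

    cover = weight_grid(d=2, w_max=1.0, alpha=0.25)
    # The cover size grows roughly like (w_max / alpha)^d, so ln N ~ d ln(1/alpha),
    # which is what turns the covering-number bound into the d 2^d T ln(...) form above.
    print(len(cover), "grid predictors")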

64 Note that if one is given an a priori bound p on the maximum number of features that can be used in a single round (allowing the algorithm to use fewer than p, but not more, features) then the 2^d in the above bound could be replaced by Σ_{1 ≤ i ≤ p} (d choose i) ≈ d^p, where the approximation assumes that p < d/2. [sent-213, score-0.397]

65 Such a bound on the number of features available per round may arise from strict budgetary considerations. [sent-214, score-0.321]

66 In the next theorem, however, we show that the worst-case exponential dependence of the regret on the number of features cannot be improved (while keeping the root-T dependence on the horizon). [sent-218, score-0.42]

67 There exists an instance of free-label online probing such that the minimax regret of any algorithm is Ω(√((d choose d/2) · T)). [sent-222, score-0.681]

68 2 Linear Prediction with Quadratic Losses In this section, we study the problem under the assumption that the predictors have a linear form and the loss functions are quadratic. [sent-224, score-0.239]

69 That is, F ⊂ Lin(X, W) where W = {w ∈ R^d | ‖w‖_* ≤ w_lim} and X = {x ∈ R^d | ‖x‖ ≤ x_lim} for some given constants w_lim, x_lim > 0, while ℓ_t(y) = (y − y_t)^2, where |y_t| ≤ x_lim w_lim. [sent-225, score-1.415]

70 Thus, choosing a predictor is akin to selecting a weight vector w_t ∈ W, as well as a binary vector s_t ∈ G ⊂ {0, 1}^d that encodes the features to be used in round t. [sent-226, score-0.629]

71 The prediction for round t is then ŷ_t = ⟨w_t, s_t ⊙ x_t⟩, where ⊙ denotes the coordinate-wise product, while the loss suffered is (ŷ_t − y_t)^2. [sent-227, score-1.055]

72 In this section we show that in this case a regret bound of size Õ(√(poly(d) · T)) is possible. [sent-229, score-0.351]

73 That the construction of such unbiased estimates is possible, even though some feature values are unobserved, is due to the special algebraic structure of the prediction and loss functions. [sent-231, score-0.263]

74 Expanding the loss of the prediction ŷ_t = ⟨w, x_t⟩, we get that the loss of using w ∈ W is as follows. [sent-238, score-0.636]

75 ℓ_t(w) = ℓ_t(⟨w, x_t⟩) = w^⊤ X_t w − 2 w^⊤ x_t y_t + y_t^2, where with a slight abuse of notation we have introduced the loss function ℓ_t : W → R. [sent-239, score-0.761]

76 For 1 ≤ i ≤ d, let q_t(i) = Σ_{w ∈ W′ : i ∈ s(w)} p_t(w) be the probability that s(W_t) will contain i, while for 1 ≤ i, j ≤ d, let q_t(i, j) = Σ_{w ∈ W′ : i, j ∈ s(w)} p_t(w) be the probability that both i, j ∈ s(W_t). [sent-246, score-0.354]

77 It can be readily verified that the estimates in (5), built from q_t(i) and q_t(i, j), satisfy E[x̃_t | p_t] = x_t and E[X̃_t | p_t] = X_t. [sent-250, score-0.516]

78 Further, notice that both x̃_t and X̃_t can be computed based on the information available at the end of round t, i.e. [sent-251, score-0.332]

79 Now, define the estimate of the prediction loss ℓ̃_t(w) = w^⊤ X̃_t w − 2 w^⊤ x̃_t y_t + y_t^2. [sent-254, score-0.359]

80 Note that y_t can be readily computed from ℓ_t(·), which is available to the algorithm (equivalently, we may assume that the algorithm observed y_t). [sent-255, score-0.322]

81 Hence, by adding a feature-cost term we get ℓ̃_t(w) + ⟨s(w), c⟩ as an estimate of the loss that the learner would have suffered at round t had he chosen the weight vector w. [sent-258, score-0.795]
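
Here is a sketch of the estimate construction described in the last few sentences, under the stated quadratic-loss structure: after a (weight, mask) pair is sampled from p_t, the observed entries of x_t and of X_t = x_t x_t^⊤ are divided by q_t(i) and q_t(i, j) respectively, which makes the estimates unbiased. The finite sampling set, the distribution p_t, and the data below are assumptions of this example, not the paper's exact algorithm.

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(3)
    d = 3
    x_t = rng.uniform(-1.0, 1.0, size=d)
    X_t = np.outer(x_t, x_t)
    y_t = 0.5

    # Finite set of (weight, mask) pairs and the current sampling distribution p_t over it.
    masks = [np.array(b) for b in product([0, 1], repeat=d) if any(b)]
    candidates = [(m.astype(float), m) for m in masks]
    p_t = np.full(len(candidates), 1.0 / len(candidates))

    # q_t(i): probability feature i is observed; q_t(i, j): probability both i and j are.
    q1 = sum(prob * m for prob, (_, m) in zip(p_t, candidates))
    q2 = sum(prob * np.outer(m, m) for prob, (_, m) in zip(p_t, candidates))

    # Draw this round's mask, observe only those coordinates, and build the estimates.
    k = rng.choice(len(candidates), p=p_t)
    s_t = candidates[k][1]
    x_tilde = np.where(s_t == 1, x_t / q1, 0.0)
    X_tilde = np.where(np.outer(s_t, s_t) == 1, X_t / q2, 0.0)

    def loss_estimate(w, cost_vector):
        # estimate of w^T X_t w - 2 w^T x_t y_t + y_t^2, plus the cost of the features w uses
        quad = w @ X_tilde @ w - 2.0 * (w @ x_tilde) * y_t + y_t ** 2
        return quad + float(np.dot((w != 0).astype(float), cost_vector))

    print("loss estimate for w = (1,1,1):", loss_estimate(np.ones(d), np.full(d, 0.1)))

    # Unbiasedness check of the x_t estimate, averaging over the sampling distribution:
    E_x = sum(prob * np.where(m == 1, x_t / q1, 0.0) for prob, (_, m) in zip(p_t, candidates))
    print("E[x_tilde] == x_t ?", np.allclose(E_x, x_t))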

82 In the name of the algorithm, LQ stands for linear prediction with quadratic losses and D denotes discretization. [sent-273, score-0.3]

83 Using the notation y_lim = w_lim x_lim and E_G = max_{s ∈ G} sup_{w ∈ W : ‖w‖_* = 1} ‖w ⊙ s‖_*, we can state the following regret bound on the algorithm. Theorem 2. [sent-275, score-0.818]

84 Let w_lim, x_lim > 0 and c ∈ [0, ∞)^d be given, W ⊂ B_{‖·‖_*}(0, w_lim) convex, X ⊂ B_{‖·‖}(0, x_lim), and fix T ≥ 1. [sent-277, score-0.836]

85 Then, E[R_T] ≤ C √(T d (4 y_lim^2 + ‖c‖_1)(w_lim^2 x_lim^2 + 2 y_lim w_lim x_lim + 4 y_lim^2 + ‖c‖_1) ln(E_G T)), where C > 0 is a universal constant (i. [sent-279, score-0.418]

86 The resulting algorithm enjoys essentially the same regret bound, and can be implemented efficiently whenever efficient sampling is possible from the resulting distribution. [sent-285, score-0.31]

87 Let d > 0, and consider the online free-label probing problem with linear predictors, where W = {w ∈ R^d | ‖w‖_1 ≤ w_lim} and X = {x ∈ R^d | ‖x‖_∞ ≤ 1}. [sent-291, score-0.682]

88 Assume, for all t ≥ 1, that the loss functions are of the form ℓ_t(w) = (w^⊤ x_t − y_t)^2 + ⟨s(w), c⟩, where |y_t| ≤ 1 and c = (1/2, ..., 1/2) ∈ R^d. [sent-292, score-0.438]

89 Then, for any prediction algorithm and for any T ≥ 8 ln(4/3), there exists a sequence ((x_t, y_t))_{1 ≤ t ≤ T} ∈ (X × [−1, 1])^T such that the regret of the algorithm can be bounded from below as E[R_T] ≥ ((√2 − 1) / (32 √(ln(4/3)))) · √(Td). [sent-293, score-0.554]

90 3 Non-Free-Label Probing If c_{d+1} > 0, the learner has to pay for observing the true label. [sent-294, score-0.536]

91 This scenario is very similar to the well-known label-efficient prediction case in online learning (Cesa-Bianchi et al. [sent-295, score-0.248]

92 In fact, the latter problem is a special case of this problem, immediately giving us that the regret of any algorithm is at least of order T^{2/3}. [sent-297, score-0.31]

93 It turns out that if one observes the (costly) label in a given round, then it does not affect the regret rate if one also observes all the features at the same time. [sent-298, score-0.68]

94 The resulting “revealing action algorithm”, given in Algorithm 3 in the Appendix, achieves the following regret bound for finite expert classes: Lemma 3. [sent-299, score-0.434]

95 Given any non-free-label online probing with finitely many experts, Algorithm 3 with appropriately set parameters achieves E[R_T] ≤ C max( T^{2/3} (ℓ_max^2 ‖c‖_1 ln |F|)^{1/3}, ℓ_max √(T ln |F|) ) for some constant C > 0. [sent-301, score-0.703]
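
The following sketch illustrates the revealing-action idea behind this kind of bound, assuming a finite expert set: with a small probability eps in each round the algorithm buys all features and the label (paying roughly ‖c‖_1 + c_{d+1}), and feeds importance-weighted losses from those revealing rounds into exponential weights; choosing eps of order T^(-1/3) balances the purchase cost against the estimation error and yields the T^(2/3) rate. Every concrete choice below (data, experts, constants) is an assumption of this example, not the paper's Algorithm 3.

    import numpy as np

    rng = np.random.default_rng(4)
    d, T = 2, 300
    c = np.array([0.1, 0.1])                          # feature costs
    c_label = 0.5                                     # cost of observing the label
    eps = min(1.0, 4.0 * T ** (-1.0 / 3.0))           # revealing probability, of order T^(-1/3)
    eta = np.sqrt(np.log(2) / (T * eps))              # a typical learning-rate choice

    experts = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # two simple linear predictors
    weights = np.ones(len(experts))
    total_cost = 0.0

    for t in range(T):
        x_t = rng.uniform(-1.0, 1.0, size=d)
        y_t = x_t[0] + 0.05 * rng.standard_normal()

        p = weights / weights.sum()
        k = rng.choice(len(experts), p=p)
        total_cost += (experts[k] @ x_t - y_t) ** 2   # prediction loss is always paid

        if rng.random() < eps:                        # revealing round: buy all features + label
            total_cost += float(c.sum()) + c_label
            for j, w in enumerate(experts):
                est_loss = ((w @ x_t - y_t) ** 2) / eps   # importance-weighted loss estimate
                weights[j] *= np.exp(-eta * est_loss)
        else:
            # otherwise pay only for the features the chosen expert needs, and get no feedback
            total_cost += float(np.dot((experts[k] != 0).astype(float), c))

    print("posterior over experts:", weights / weights.sum(), " total cost:", round(total_cost, 2))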

96 There exists a constant C such that, for any non-free-label probing with linear predictors, quadratic loss, and c_j > (1/d) Σ_{i=1}^d c_i − 1/(2d) for every j = 1, … [sent-307, score-0.27]

97 …, d, the expected regret of any algorithm can be lower bounded by E[R_T] ≥ C (c_{d+1} d)^{1/3} T^{2/3}. [sent-310, score-0.31]

98 In this problem, the learner has the option of choosing the subset of features he wants to observe as well as the option of observing the true label, but has to pay for this information. [sent-312, score-0.756]

99 We showed that when the labels are free, it is possible to devise algorithms with optimal regret rate Θ(√T) (up to logarithmic factors), while in the non-free-label case we showed that only Θ(T^{2/3}) is achievable. [sent-314, score-0.34]

100 We gave algorithms that achieve the optimal regret rate (up to logarithmic factors) when the number of experts is finite or in the case of linear prediction. [sent-315, score-0.371]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('learner', 0.344), ('regret', 0.31), ('probing', 0.239), ('wlim', 0.221), ('xlim', 0.197), ('losses', 0.186), ('lqde', 0.172), ('round', 0.17), ('xt', 0.162), ('yt', 0.161), ('celp', 0.147), ('predictor', 0.143), ('ln', 0.134), ('online', 0.132), ('predictors', 0.124), ('observing', 0.116), ('loss', 0.115), ('wt', 0.112), ('features', 0.11), ('rt', 0.109), ('elp', 0.108), ('shamir', 0.098), ('cd', 0.091), ('label', 0.09), ('qt', 0.089), ('pt', 0.088), ('xp', 0.087), ('greiner', 0.087), ('lw', 0.087), ('mannor', 0.083), ('expert', 0.083), ('bk', 0.083), ('prediction', 0.083), ('purchase', 0.082), ('nt', 0.081), ('pay', 0.076), ('gt', 0.075), ('purchased', 0.071), ('dy', 0.071), ('patient', 0.068), ('rd', 0.066), ('cost', 0.062), ('experts', 0.061), ('rostamizadeh', 0.06), ('delayed', 0.06), ('st', 0.058), ('covering', 0.057), ('discretization', 0.055), ('tests', 0.055), ('costly', 0.055), ('option', 0.055), ('diagnostic', 0.051), ('alberta', 0.051), ('costeffective', 0.049), ('joulani', 0.049), ('mesterharm', 0.049), ('ylim', 0.049), ('zolghadr', 0.049), ('xd', 0.049), ('hindsight', 0.046), ('lin', 0.044), ('horizon', 0.043), ('ut', 0.042), ('bound', 0.041), ('lipschitz', 0.04), ('lizotte', 0.04), ('kwk', 0.04), ('lugosi', 0.039), ('kapoor', 0.038), ('prespeci', 0.038), ('gathering', 0.038), ('horizons', 0.038), ('ordentlich', 0.038), ('lt', 0.037), ('weight', 0.036), ('rgy', 0.036), ('maxt', 0.036), ('feature', 0.035), ('batch', 0.035), ('priori', 0.035), ('weinberger', 0.034), ('paying', 0.034), ('stoltz', 0.034), ('et', 0.033), ('analyzes', 0.033), ('suffered', 0.033), ('active', 0.033), ('settles', 0.032), ('max', 0.032), ('quadratic', 0.031), ('poly', 0.031), ('advice', 0.031), ('eg', 0.031), ('bart', 0.031), ('unbiased', 0.03), ('competitor', 0.03), ('gy', 0.03), ('predict', 0.03), ('labels', 0.03), ('agarwal', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 230 nips-2013-Online Learning with Costly Features and Labels

Author: Navid Zolghadr, Gabor Bartok, Russell Greiner, András György, Csaba Szepesvari

Abstract: This paper introduces the online probing problem: In each round, the learner is able to purchase the values of a subset of feature values. After the learner uses this information to come up with a prediction for the given round, he then has the option of paying to see the loss function that he is evaluated against. Either way, the learner pays for both the errors of his predictions and also whatever he chooses to observe, including the cost of observing the loss function for the given round and the cost of the observed features. We consider two variations of this problem, depending on whether the learner can observe the label for free or not. We provide algorithms and upper and lower bounds on the regret for both variants. We show that a positive cost for observing the label significantly increases the regret of the problem. 1

2 0.29054117 325 nips-2013-The Pareto Regret Frontier

Author: Wouter M. Koolen

Abstract: Performance guarantees for online learning algorithms typically take the form of regret bounds, which express that the cumulative loss overhead compared to the best expert in hindsight is small. In the common case of large but structured expert sets we typically wish to keep the regret especially small compared to simple experts, at the cost of modest additional overhead compared to more complex others. We study which such regret trade-offs can be achieved, and how. We analyse regret w.r.t. each individual expert as a multi-objective criterion in the simple but fundamental case of absolute loss. We characterise the achievable and Pareto optimal trade-offs, and the corresponding optimal strategies for each sample size both exactly for each finite horizon and asymptotically. 1

3 0.29019037 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions

Author: Yasin Abbasi, Peter Bartlett, Varun Kanade, Yevgeny Seldin, Csaba Szepesvari

Abstract: We study the problem of online learning Markov Decision Processes (MDPs) when both the transition distributions and loss functions are chosen by an adversary. We present an algorithm that, under a mixing assumption, achieves O(√(T log |Π|) + log |Π|) regret with respect to a comparison set of policies Π. The regret is independent of the size of the state and action spaces. When expectations over sample paths can be computed efficiently and the comparison set Π has polynomial size, this algorithm is efficient. We also consider the episodic adversarial online shortest path problem. Here, in each episode an adversary may choose a weighted directed acyclic graph with an identified start and finish node. The goal of the learning algorithm is to choose a path that minimizes the loss while traversing from the start to finish node. At the end of each episode the loss function (given by weights on the edges) is revealed to the learning algorithm. The goal is to minimize regret with respect to a fixed policy for selecting paths. This problem is a special case of the online MDP problem. It was shown that for randomly chosen graphs and adversarial losses, the problem can be efficiently solved. We show that it also can be efficiently solved for adversarial graphs and randomly chosen losses. When both graphs and losses are adversarially chosen, we show that designing efficient algorithms for the adversarial online shortest path problem (and hence for the adversarial MDP problem) is as hard as learning parity with noise, a notoriously difficult problem that has been used to design efficient cryptographic schemes. Finally, we present an efficient algorithm whose regret scales linearly with the number of distinct graphs. 1

4 0.24139625 235 nips-2013-Online learning in episodic Markovian decision processes by relative entropy policy search

Author: Alexander Zimin, Gergely Neu

Abstract: We study the problem of online learning in finite episodic Markov decision processes (MDPs) where the loss function is allowed to change between episodes. The natural performance measure in this learning problem is the regret defined as the difference between the total loss of the best stationary policy and the total loss suffered by the learner. We assume that the learner is given access to a finite action space A and the state space X has a layered structure with L layers, so that state transitions are only possible between consecutive layers. We describe a variant of the recently proposed Relative Entropy Policy Search algorithm and show that its regret after T episodes is 2 L|X ||A|T log(|X ||A|/L) in the bandit setting and 2L T log(|X ||A|/L) in the full information setting, given that the learner has perfect knowledge of the transition probabilities of the underlying MDP. These guarantees largely improve previously known results under much milder assumptions and cannot be significantly improved under general assumptions. 1

5 0.23696758 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence

Author: Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour

Abstract: We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir [14]. Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph (which must be accessible before selecting an action). In the undirected case, we show that the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner. 1

6 0.23325515 89 nips-2013-Dimension-Free Exponentiated Gradient

7 0.19856584 231 nips-2013-Online Learning with Switching Costs and Other Adaptive Adversaries

8 0.19628417 191 nips-2013-Minimax Optimal Algorithms for Unconstrained Linear Optimization

9 0.18811771 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings

10 0.1667693 137 nips-2013-High-Dimensional Gaussian Process Bandits

11 0.16408999 240 nips-2013-Optimization, Learning, and Games with Predictable Sequences

12 0.1555744 311 nips-2013-Stochastic Convex Optimization with Multiple Objectives

13 0.14264834 26 nips-2013-Adaptive Market Making via Online Learning

14 0.13800612 266 nips-2013-Recurrent linear models of simultaneously-recorded neural populations

15 0.13339035 273 nips-2013-Reinforcement Learning in Robust Markov Decision Processes

16 0.12937176 242 nips-2013-PAC-Bayes-Empirical-Bernstein Inequality

17 0.1258965 171 nips-2013-Learning with Noisy Labels

18 0.12165404 253 nips-2013-Prior-free and prior-dependent regret bounds for Thompson Sampling

19 0.11927202 275 nips-2013-Reservoir Boosting : Between Online and Offline Ensemble Learning

20 0.11865278 338 nips-2013-Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.275), (1, -0.174), (2, 0.266), (3, -0.255), (4, 0.002), (5, -0.077), (6, -0.004), (7, 0.057), (8, 0.034), (9, -0.056), (10, -0.077), (11, -0.124), (12, 0.044), (13, 0.003), (14, 0.062), (15, -0.155), (16, -0.014), (17, -0.006), (18, 0.051), (19, -0.011), (20, -0.117), (21, -0.02), (22, 0.026), (23, -0.004), (24, 0.006), (25, 0.031), (26, -0.094), (27, -0.002), (28, 0.012), (29, -0.001), (30, 0.008), (31, 0.063), (32, 0.017), (33, 0.028), (34, -0.127), (35, 0.012), (36, -0.042), (37, -0.105), (38, 0.039), (39, -0.03), (40, -0.013), (41, 0.02), (42, 0.008), (43, -0.055), (44, 0.084), (45, 0.031), (46, 0.008), (47, -0.058), (48, -0.012), (49, -0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96395135 230 nips-2013-Online Learning with Costly Features and Labels

Author: Navid Zolghadr, Gabor Bartok, Russell Greiner, András György, Csaba Szepesvari

Abstract: This paper introduces the online probing problem: In each round, the learner is able to purchase the values of a subset of feature values. After the learner uses this information to come up with a prediction for the given round, he then has the option of paying to see the loss function that he is evaluated against. Either way, the learner pays for both the errors of his predictions and also whatever he chooses to observe, including the cost of observing the loss function for the given round and the cost of the observed features. We consider two variations of this problem, depending on whether the learner can observe the label for free or not. We provide algorithms and upper and lower bounds on the regret for both variants. We show that a positive cost for observing the label significantly increases the regret of the problem. 1

2 0.84396356 325 nips-2013-The Pareto Regret Frontier

Author: Wouter M. Koolen

Abstract: Performance guarantees for online learning algorithms typically take the form of regret bounds, which express that the cumulative loss overhead compared to the best expert in hindsight is small. In the common case of large but structured expert sets we typically wish to keep the regret especially small compared to simple experts, at the cost of modest additional overhead compared to more complex others. We study which such regret trade-offs can be achieved, and how. We analyse regret w.r.t. each individual expert as a multi-objective criterion in the simple but fundamental case of absolute loss. We characterise the achievable and Pareto optimal trade-offs, and the corresponding optimal strategies for each sample size both exactly for each finite horizon and asymptotically. 1

3 0.75359142 231 nips-2013-Online Learning with Switching Costs and Other Adaptive Adversaries

Author: Nicolò Cesa-Bianchi, Ofer Dekel, Ohad Shamir

Abstract: We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player's performance using a new notion of regret, also known as policy regret, which better captures the adversary's adaptiveness to the player's behavior. In a setting where losses are allowed to drift, we characterize, in a nearly complete manner, the power of adaptive adversaries with bounded memories and switching costs. In particular, we show that with switching costs, the attainable rate with bandit feedback is Θ̃(T^{2/3}). Interestingly, this rate is significantly worse than the Θ(√T) rate attainable with switching costs in the full-information case. Via a novel reduction from experts to bandits, we also show that a bounded memory adversary can force Θ̃(T^{2/3}) regret even in the full information case, proving that switching costs are easier to control than bounded memory adversaries. Our lower bounds rely on a new stochastic adversary strategy that generates loss processes with strong dependencies. 1

4 0.7442596 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions

Author: Yasin Abbasi, Peter Bartlett, Varun Kanade, Yevgeny Seldin, Csaba Szepesvari

Abstract: We study the problem of online learning Markov Decision Processes (MDPs) when both the transition distributions and loss functions are chosen by an adversary. We present an algorithm that, under a mixing assumption, achieves O(√(T log |Π|) + log |Π|) regret with respect to a comparison set of policies Π. The regret is independent of the size of the state and action spaces. When expectations over sample paths can be computed efficiently and the comparison set Π has polynomial size, this algorithm is efficient. We also consider the episodic adversarial online shortest path problem. Here, in each episode an adversary may choose a weighted directed acyclic graph with an identified start and finish node. The goal of the learning algorithm is to choose a path that minimizes the loss while traversing from the start to finish node. At the end of each episode the loss function (given by weights on the edges) is revealed to the learning algorithm. The goal is to minimize regret with respect to a fixed policy for selecting paths. This problem is a special case of the online MDP problem. It was shown that for randomly chosen graphs and adversarial losses, the problem can be efficiently solved. We show that it also can be efficiently solved for adversarial graphs and randomly chosen losses. When both graphs and losses are adversarially chosen, we show that designing efficient algorithms for the adversarial online shortest path problem (and hence for the adversarial MDP problem) is as hard as learning parity with noise, a notoriously difficult problem that has been used to design efficient cryptographic schemes. Finally, we present an efficient algorithm whose regret scales linearly with the number of distinct graphs. 1

5 0.73472565 235 nips-2013-Online learning in episodic Markovian decision processes by relative entropy policy search

Author: Alexander Zimin, Gergely Neu

Abstract: We study the problem of online learning in finite episodic Markov decision processes (MDPs) where the loss function is allowed to change between episodes. The natural performance measure in this learning problem is the regret defined as the difference between the total loss of the best stationary policy and the total loss suffered by the learner. We assume that the learner is given access to a finite action space A and the state space X has a layered structure with L layers, so that state transitions are only possible between consecutive layers. We describe a variant of the recently proposed Relative Entropy Policy Search algorithm and show that its regret after T episodes is 2 L|X ||A|T log(|X ||A|/L) in the bandit setting and 2L T log(|X ||A|/L) in the full information setting, given that the learner has perfect knowledge of the transition probabilities of the underlying MDP. These guarantees largely improve previously known results under much milder assumptions and cannot be significantly improved under general assumptions. 1

6 0.71353334 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence

7 0.70412737 89 nips-2013-Dimension-Free Exponentiated Gradient

8 0.67175186 159 nips-2013-Learning Prices for Repeated Auctions with Strategic Buyers

9 0.64452773 26 nips-2013-Adaptive Market Making via Online Learning

10 0.64425439 191 nips-2013-Minimax Optimal Algorithms for Unconstrained Linear Optimization

11 0.58002919 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings

12 0.57186586 240 nips-2013-Optimization, Learning, and Games with Predictable Sequences

13 0.55479896 181 nips-2013-Machine Teaching for Bayesian Learners in the Exponential Family

14 0.55279827 275 nips-2013-Reservoir Boosting : Between Online and Offline Ensemble Learning

15 0.54717189 223 nips-2013-On the Relationship Between Binary Classification, Bipartite Ranking, and Binary Class Probability Estimation

16 0.5470323 270 nips-2013-Regret based Robust Solutions for Uncertain Markov Decision Processes

17 0.54349637 103 nips-2013-Efficient Exploration and Value Function Generalization in Deterministic Systems

18 0.51634085 137 nips-2013-High-Dimensional Gaussian Process Bandits

19 0.50127351 273 nips-2013-Reinforcement Learning in Robust Markov Decision Processes

20 0.49789488 72 nips-2013-Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.012), (2, 0.046), (16, 0.038), (33, 0.14), (34, 0.077), (36, 0.011), (41, 0.024), (49, 0.051), (56, 0.176), (70, 0.035), (78, 0.182), (85, 0.051), (89, 0.04), (93, 0.034), (95, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.8635568 230 nips-2013-Online Learning with Costly Features and Labels

Author: Navid Zolghadr, Gabor Bartok, Russell Greiner, András György, Csaba Szepesvari

Abstract: This paper introduces the online probing problem: In each round, the learner is able to purchase the values of a subset of feature values. After the learner uses this information to come up with a prediction for the given round, he then has the option of paying to see the loss function that he is evaluated against. Either way, the learner pays for both the errors of his predictions and also whatever he chooses to observe, including the cost of observing the loss function for the given round and the cost of the observed features. We consider two variations of this problem, depending on whether the learner can observe the label for free or not. We provide algorithms and upper and lower bounds on the regret for both variants. We show that a positive cost for observing the label significantly increases the regret of the problem. 1

2 0.85351133 63 nips-2013-Cluster Trees on Manifolds

Author: Sivaraman Balakrishnan, Srivatsan Narayanan, Alessandro Rinaldo, Aarti Singh, Larry Wasserman

Abstract: unkown-abstract

3 0.831725 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach

Author: Qichao Que, Mikhail Belkin

Abstract: We address the problem of estimating the ratio q/p where p is a density function and q is another density, or, more generally, an arbitrary function. Knowing or approximating this ratio is needed in various problems of inference and integration often referred to as importance sampling in statistical inference. It is also closely related to the problem of covariate shift in transfer learning. Our approach is based on reformulating the problem of estimating the ratio as an inverse problem in terms of an integral operator corresponding to a kernel, known as the Fredholm problem of the first kind. This formulation, combined with the techniques of regularization, leads to a principled framework for constructing algorithms and for analyzing them theoretically. The resulting family of algorithms (FIRE, for Fredholm Inverse Regularized Estimator) is flexible, simple and easy to implement. We provide detailed theoretical analysis including concentration bounds and convergence rates for the Gaussian kernel for densities defined on R^d and smooth d-dimensional sub-manifolds of the Euclidean space. Model selection for unsupervised or semi-supervised inference is generally a difficult problem. It turns out that in the density ratio estimation setting, when samples from both distributions are available, simple completely unsupervised model selection methods are available. We call this mechanism CD-CV for Cross-Density Cross-Validation. We show encouraging experimental results including applications to classification within the covariate shift framework. 1

4 0.80074215 66 nips-2013-Computing the Stationary Distribution Locally

Author: Christina E. Lee, Asuman Ozdaglar, Devavrat Shah

Abstract: Computing the stationary distribution of a large finite or countably infinite state space Markov Chain (MC) has become central in many problems such as statistical inference and network analysis. Standard methods involve large matrix multiplications as in power iteration, or simulations of long random walks, as in Markov Chain Monte Carlo (MCMC). Power iteration is costly, as it involves computation at every state. For MCMC, it is difficult to determine whether the random walks are long enough to guarantee convergence. In this paper, we provide a novel algorithm that answers whether a chosen state in a MC has stationary probability larger than some ∆ ∈ (0, 1), and outputs an estimate of the stationary probability. Our algorithm is constant time, using information from a local neighborhood of the state on the graph induced by the MC, which has constant size relative to the state space. The multiplicative error of the estimate is upper bounded by a function of the mixing properties of the MC. Simulation results show MCs for which this method gives tight estimates. 1

5 0.79877049 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks

Author: Shahin Shahrampour, Sasha Rakhlin, Ali Jadbabaie

Abstract: This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. We establish a tight bound on the rate of change of the underlying state, under which individuals can track the parameter with a bounded variance. Then, we characterize explicit expressions for the steady state mean-square deviation(MSD) of the estimates from the truth, per individual. We observe that only one of the estimators recovers the optimal MSD, which underscores the impact of the objective function decomposition on the learning quality. Finally, we provide an upper bound on the regret of the proposed methods, measured as an average of errors in estimating the parameter in a finite time. 1

6 0.7985217 240 nips-2013-Optimization, Learning, and Games with Predictable Sequences

7 0.79820603 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence

8 0.79455483 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints

9 0.79223049 249 nips-2013-Polar Operators for Structured Sparse Estimation

10 0.790959 252 nips-2013-Predictive PAC Learning and Process Decompositions

11 0.7904551 355 nips-2013-Which Space Partitioning Tree to Use for Search?

12 0.78788459 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings

13 0.7872023 273 nips-2013-Reinforcement Learning in Robust Markov Decision Processes

14 0.78639638 224 nips-2013-On the Sample Complexity of Subspace Learning

15 0.78505814 231 nips-2013-Online Learning with Switching Costs and Other Adaptive Adversaries

16 0.78488523 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

17 0.78478664 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions

18 0.78456986 29 nips-2013-Adaptive Submodular Maximization in Bandit Setting

19 0.78313899 95 nips-2013-Distributed Exploration in Multi-Armed Bandits

20 0.78308934 177 nips-2013-Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation