nips nips2009 nips2009-14 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kamalika Chaudhuri, Yoav Freund, Daniel J. Hsu
Abstract: We study the problem of decision-theoretic online learning (DTOL). Motivated by practical applications, we focus on DTOL when the number of actions is very large. Previous algorithms for learning in this framework have a tunable learning rate parameter, and a barrier to using online-learning in practical applications is that it is not understood how to set this parameter optimally, particularly when the number of actions is large. In this paper, we offer a clean solution by proposing a novel and completely parameter-free algorithm for DTOL. We introduce a new notion of regret, which is more natural for applications with a large number of actions. We show that our algorithm achieves good performance with respect to this new notion of regret; in addition, it also achieves performance close to that of the best bounds achieved by previous algorithms with optimally-tuned parameters, according to previous notions of regret. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Motivated by practical applications, we focus on DTOL when the number of actions is very large. [sent-7, score-0.462]
2 Previous algorithms for learning in this framework have a tunable learning rate parameter, and a barrier to using online-learning in practical applications is that it is not understood how to set this parameter optimally, particularly when the number of actions is large. [sent-8, score-0.561]
3 In this problem, a learner must assign probabilities to a fixed set of actions in a sequence of rounds. [sent-14, score-0.603]
4 After each assignment, each action incurs a loss (a value in [0, 1]); the learner incurs a loss equal to the expected loss of actions for that round, where the expectation is computed according to the learner’s current probability assignment. [sent-15, score-1.301]
5 The regret (of the learner) to an action is the difference between the learner’s cumulative loss and the cumulative loss of that action. [sent-16, score-1.165]
6 The goal of the learner is to achieve, on any sequence of losses, low regret to the action with the lowest cumulative loss (the best action). [sent-17, score-1.144]
7 To look at tracking in a DTOL framework, we set each action to be a path (sequence of states) over the state space. [sent-20, score-0.377]
8 The loss of an action at time t is the distance between the observation at time t and the state of the action at time t, and the goal of the learner is to predict a path which has loss close to that of the action with the lowest cumulative loss. [sent-21, score-1.386]
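As a concrete illustration of this reduction, the following sketch (ours, with made-up paths and observations, not taken from the paper) builds the DTOL loss matrix for path-actions in a one-dimensional tracking problem, taking the loss of a path at time t to be its distance to the observation at time t:

import numpy as np

def tracking_losses(paths, observations):
    # Cast tracking as DTOL: each row of paths is one action (a sequence of states),
    # and its loss at time t is the distance to the observation at time t.
    paths = np.asarray(paths, dtype=float)                # shape (N, T)
    observations = np.asarray(observations, dtype=float)  # shape (T,)
    losses = np.abs(paths - observations[None, :])        # shape (N, T)
    return losses / max(losses.max(), 1.0)                # rescale so losses lie in [0, 1]

# Hypothetical example: three candidate paths over four time steps.
paths = [[0.0, 0.1, 0.2, 0.3],
         [0.0, 0.5, 1.0, 1.5],
         [1.0, 1.0, 1.0, 1.0]]
obs = [0.0, 0.2, 0.3, 0.4]
print(tracking_losses(paths, obs))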
9 In Hedge, each action is assigned a probability, which depends on the cumulative loss of this action and a parameter η, also called the learning rate. [sent-23, score-0.846]
10 By appropriately setting the learning rate as a function of the iteration [6, 7] and the number of actions, Hedge can achieve a regret upper-bounded by $O(\sqrt{T \ln N})$ for each iteration T, where N is the number of actions. [sent-24, score-0.626]
11 This bound on the regret is optimal, as there is an $\Omega(\sqrt{T \ln N})$ lower bound [5]. [sent-25, score-0.65]
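For contrast with the parameter-free algorithm proposed in this paper, here is a minimal sketch of the Hedge baseline just described (our own illustration, not the authors' code); the tuning eta = sqrt(8 ln(N) / T) is one standard choice and assumes the horizon T is known in advance:

import numpy as np

def hedge_regret(losses):
    # Minimal Hedge sketch: losses is a (T, N) array with entries in [0, 1].
    T, N = losses.shape
    eta = np.sqrt(8.0 * np.log(N) / T)    # the tunable learning rate that NormalHedge dispenses with
    cum_loss = np.zeros(N)                # cumulative loss of each action
    learner_loss = 0.0
    for t in range(T):
        w = np.exp(-eta * cum_loss)       # exponential weights
        p = w / w.sum()
        learner_loss += p @ losses[t]     # learner's expected loss this round
        cum_loss += losses[t]
    return learner_loss - cum_loss.min()  # regret to the best action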
12 In this paper, motivated by practical applications such as tracking, we consider DTOL in the regime where the number of actions N is very large. [sent-26, score-0.462]
13 Suppose each action is a point on a line, and the total losses are as given in the plot. [sent-29, score-0.412]
14 The regret to the top ϵ-quantile is the difference between the learner’s total loss and the total loss of the worst action in the indicated interval of measure ϵ. [sent-30, score-1.0]
15 setting η as a fixed function of the number of actions N . [sent-31, score-0.44]
16 A second issue with using online-learning in problems such as tracking, where N is very large, is that the regret to the best action is not an effective measure of performance. [sent-39, score-0.783]
17 For problems such as tracking, one expects to have a lot of actions that are close to the action with the lowest loss. [sent-40, score-0.762]
18 As these actions also have low loss, measuring performance with respect to a small group of actions that perform well is extremely reasonable – see, for example, Figure 1. [sent-41, score-0.895]
19 We order the cumulative losses of all actions from lowest to highest and define the regret of the learner to the top ϵ-quantile to be the difference between the cumulative loss of the learner and the ⌊ϵN⌋-th element in the sorted list. [sent-43, score-1.638]
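In symbols (the order-statistic notation below is ours, not the paper's): writing $L_{A,t}$ for the learner's cumulative loss after t rounds and $L_{(1),t} \le L_{(2),t} \le \cdots \le L_{(N),t}$ for the actions' cumulative losses sorted from lowest to highest, the regret to the top $\epsilon$-quantile is
$$R^{\epsilon}_t \;=\; L_{A,t} \;-\; L_{(\lfloor \epsilon N \rfloor),t},$$
and taking $\epsilon = 1/N$ recovers the usual regret to the best action.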
20 We prove that for NormalHedge, the regret to the top ϵ-quantile of actions is at most $O\!\left(\sqrt{T \ln(1/\epsilon)} + \ln^2 N\right)$, which holds simultaneously for all T and ϵ. [sent-44, score-1.076]
21 If we set ϵ = 1/N, we get that the regret to the best action is upper-bounded by $O\!\left(\sqrt{T \ln N} + \ln^2 N\right)$, which is only slightly worse than the bound achieved by Hedge with optimally-tuned parameters. [sent-45, score-0.976]
22 Notice that in our regret bound, the term involving T has no dependence on N . [sent-46, score-0.43]
23 Actions which have higher cumulative loss than the algorithm are assigned potential 1. [sent-50, score-0.325]
24 The weight assigned to an action in each round is then proportional to the derivative of its potential. [sent-51, score-0.491]
25 One can also interpret Hedge as a potential-based algorithm, and under this interpretation, the potential assigned by Hedge to action i is proportional to exp(ηRi,t ). [sent-52, score-0.395]
26 Find $c_t > 0$ satisfying $\frac{1}{N}\sum_{i=1}^{N} \exp\!\left(\frac{([R_{i,t}]_+)^2}{2c_t}\right) = e$. [sent-65, score-0.266]
27 Another useful property of NormalHedge, which Hedge does not possess, is that it assigns zero weight to any action whose cumulative loss is larger than the cumulative loss of the algorithm itself. [sent-71, score-0.786]
28 In other words, non-zero weights are assigned only to actions which perform better than the algorithm. [sent-72, score-0.54]
29 In most applications, we expect a small set of the actions to perform significantly better than most of the actions. [sent-73, score-0.455]
30 The regret of the algorithm is guaranteed to be small, which means that the algorithm will perform better than most of the actions and thus assign them zero probability. [sent-74, score-0.954]
31 [9, 10] have proposed more recent solutions to DTOL in which the regret of Hedge to the best action is upper bounded by a function of L, the loss of the best action, or by a function of the variations in the losses. [sent-75, score-0.893]
32 We therefore leave open the question of finding an adaptive algorithm for DTOL which has regret upper-bounded by a function that depends on the loss of the best action. [sent-78, score-0.616]
33 In round t, the learner chooses a weight distribution $p_t = (p_{1,t}, \ldots, p_{N,t})$ over the N actions. [sent-88, score-0.277]
34 Each action i incurs a loss $\ell_{i,t}$, and the learner incurs the expected loss under this distribution: $\ell_{A,t} = \sum_{i=1}^{N} p_{i,t}\,\ell_{i,t}$. [sent-95, score-0.757]
35 The learner’s instantaneous regret to an action i in round t is $r_{i,t} = \ell_{A,t} - \ell_{i,t}$, and its (cumulative) regret to an action i in the first t rounds is $R_{i,t} = \sum_{\tau=1}^{t} r_{i,\tau}$. [sent-96, score-1.621]
36 The goal of the learner is to minimize this cumulative regret Ri,t to any action i (in particular, the best action), for any value of t. [sent-100, score-1.011]
37 In addition to tracking the cumulative regrets Ri,t to each action i after each round t, the algorithm also maintains a scale parameter ct . [sent-104, score-0.903]
38 This is chosen so that the average of the potential, over all actions i, evaluated at $R_{i,t}$ and $c_t$, remains constant at e: $\frac{1}{N}\sum_{i=1}^{N} \exp\!\left(\frac{([R_{i,t}]_+)^2}{2c_t}\right) = e$. (2) [sent-105, score-0.706]
39 We observe that since φ(x, c) is convex in c > 0, we can determine $c_t$ with a line search. [sent-106, score-0.212]
40 The weight assigned to i in round t is set proportional to the first derivative of the potential, evaluated at $R_{i,t-1}$ and $c_{t-1}$: $p_{i,t} \propto \left.\frac{\partial}{\partial x}\phi(x, c)\right|_{x=R_{i,t-1},\,c=c_{t-1}} = \frac{[R_{i,t-1}]_+}{c_{t-1}} \exp\!\left(\frac{([R_{i,t-1}]_+)^2}{2c_{t-1}}\right)$. [sent-107, score-0.252]
41 Notice that the actions for which Ri,t−1 ≤ 0 receive zero weight in round t. [sent-108, score-0.579]
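To make the update concrete, here is a minimal Python sketch of one way to implement NormalHedge as described above (our own rendering, not the authors' reference code; the bracketing interval and bisection depth are arbitrary choices made for the sketch):

import numpy as np

def normal_hedge(losses):
    # losses: (T, N) array of per-round losses in [0, 1].
    T, N = losses.shape
    R = np.zeros(N)        # cumulative regrets R_{i,t}
    c = 1.0                # scale parameter c_t
    for t in range(T):
        Rp = np.maximum(R, 0.0)
        if Rp.sum() == 0.0:                        # e.g. the first round: no action is ahead yet
            p = np.full(N, 1.0 / N)
        else:                                      # weight = first derivative of the potential
            w = (Rp / c) * np.exp(Rp ** 2 / (2.0 * c))
            p = w / w.sum()                        # actions with R_{i,t-1} <= 0 get zero weight
        ell = losses[t]
        ell_A = p @ ell                            # learner's expected loss this round
        R += ell_A - ell                           # add the instantaneous regrets r_{i,t}
        if R.max() <= 0.0:
            continue                               # no action beats the learner; keep the old scale
        # Line search for c_t: the average potential over all actions must equal e.
        avg_pot = lambda cc: np.mean(np.exp(np.maximum(R, 0.0) ** 2 / (2.0 * cc)))
        lo, hi = 1e-12, 1.0
        while avg_pot(hi) > np.e:                  # grow the bracket until the average falls below e
            hi *= 2.0
        for _ in range(60):                        # bisection: the average potential is decreasing in c
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if avg_pot(mid) > np.e else (lo, mid)
        c = hi
    return R

The search for c_t relies only on the average potential being monotone in c, so any one-dimensional root finder would do in place of the bisection used here.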
42 The main feature of our example is that the effective number of actions n (i.e., the number of distinct actions) is smaller than the total number of actions N. [sent-113, score-0.467] [sent-115, score-0.461]
44 Notice that without prior knowledge of the actions and their loss-sequences, one cannot determine the effective number of actions in advance; as a result, there is no direct method by which Hedge and Polynomial Weights could set their parameters as a function of n. [sent-116, score-0.907]
45 Our example attempts to model a practical scenario where one often finds multiple actions with loss-sequences which are almost identical. [sent-117, score-0.462]
46 Our example has four parameters: N, the total number of actions; n, the effective number of actions (the number of distinct actions); k, the (effective) number of good actions; and ε, which indicates how much better the good actions are compared to the rest. [sent-120, score-0.928]
47 The instantaneous losses of the N actions are represented by an N × T matrix $B_N^{\varepsilon,k}$; the loss of action i in round t is the (i, t)-th entry in the matrix. [sent-122, score-1.066]
48 If the rows of $A_n$ give the losses for n actions over time, then it is clear that, on average, no action is better than any other. [sent-144, score-0.846]
49 Therefore for large enough T , for these losses, a typical algorithm will eventually assign all actions the same weight. [sent-145, score-0.504]
50 Now, when losses are given by $A_n^{\varepsilon,k}$, the first k actions (the good actions) perform better than the remaining n − k; so, for large enough T, a typical algorithm will eventually recognize this and assign the first k actions equal weights (giving little or no weight to the remaining n − k). [sent-167, score-1.132]
51 Finally, we artificially replicate each action (each row) N/n times to yield the final loss matrix $B_N^{\varepsilon,k}$ for N actions: $B_N^{\varepsilon,k}$ is the matrix $A_n^{\varepsilon,k}$ stacked vertically N/n times. [sent-168, score-0.397]
52 The replication of actions significantly affects the behavior of algorithms that set parameters with respect to the number of actions N, which is inflated compared to the effective number of actions n. [sent-172, score-1.53]
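A sketch of this replication step (ours; A_n stands for whichever n-by-T base loss matrix the example prescribes):

import numpy as np

def replicate_losses(A_n, N):
    # Stack the n x T loss matrix A_n vertically N/n times to obtain B_N:
    # N nominal actions, but only n distinct loss sequences.
    n, T = A_n.shape
    assert N % n == 0, "N must be a multiple of the effective number of actions n"
    return np.tile(A_n, (N // n, 1))   # shape (N, T)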
53 NormalHedge, having no such parameters, is completely unaffected by the replication of actions. [sent-173, score-0.203]
54 Exp is a time/variation-adaptive version of Hedge (exponential weights) due to [7] (roughly, $\eta_t = O(\sqrt{(\log N)/\mathrm{Var}_t})$, where $\mathrm{Var}_t$ is the cumulative loss variance). [sent-175, score-0.221]
55 Poly is polynomial weights [12, 11], which has a parameter p that is typically set as a function of the number of actions; we set p = 2 ln N as is recommended to guarantee a regret bound comparable to that of Hedge. [sent-176, score-0.729]
56 Figure 3 shows the regrets to the best action versus the replication factor N/n, where the effective number of actions n is held fixed. [sent-177, score-1.04]
57 Recall that Exp and Poly have parameters set with respect to the number of actions N . [sent-178, score-0.44]
58 We see from the figures that NormalHedge is completely unaffected by the replication of actions; no matter how many times the actions may be replicated, the performance of NormalHedge stays exactly the same. [sent-179, score-0.66]
59 In contrast, increasing the replication factor affects the performance of Exp and Poly: Exp and Poly become more sensitive to the changes in the total losses of the actions (e.g., the base of the exponent in the weights assigned by Exp increases with N); so when there are multiple good actions (i.e., k > 1), their performance degrades. [sent-180, score-0.743] [sent-182, score-0.525]
61 When k = 1, Exp and Poly actually perform better using the inflated value N (as opposed to n), as this causes the slight advantage of the single best action to be magnified. [sent-185, score-0.341]
62 We note that if the parameters of Exp and Poly were set to be a function of n, instead of N, then their performance would also not depend on the replication factor (the performance would be the same as in the N/n = 1 case). [sent-187, score-0.185]
63 [Figure 3: Regrets to the best action after T = 32768 rounds, versus replication factor N/n; one panel each for k = 1, 2, 8, and 32, comparing NormalHedge (Normal) with Exp and Poly.] [sent-189, score-0.652] [sent-197, score-0.517]
65 Related work. There has been a large amount of literature on various aspects of DTOL. [sent-201, score-0.46]
66 The standard measure of regret in most of these works is the regret to the best action. [sent-203, score-0.893]
67 The original Hedge algorithm has a regret bound of $O(\sqrt{T \log N})$. [sent-204, score-0.493]
68 As a result, its regret bound also holds only for a fixed T . [sent-206, score-0.471]
69 The algorithm of [13] guarantees a regret bound of $O(\sqrt{T \log N})$ to the best action uniformly for all T by using a doubling trick. [sent-207, score-0.841]
70 Time-varying learning rates for exponential weights algorithms were considered in [6]; there, they show that if $\eta_t = \sqrt{8\ln(N)/t}$, then using exponential weights with $\eta = \eta_t$ in round t guarantees a regret bound of $\sqrt{2T\ln N} + O(\ln N)$ for any T. [sent-208, score-0.918]
71 This bound on the regret to the best action is better than ours. [sent-209, score-0.797]
72 Though not explicitly considered in previous works, the exponential weights algorithms can be partly analyzed with respect to the regret to the top ϵ-quantile. [sent-212, score-0.547]
73 For any fixed ϵ, Hedge can be modified by setting η as a function of this ϵ such that the regret to the top ϵ-quantile is at most $O(\sqrt{T \log(1/\epsilon)})$. [sent-213, score-0.457]
74 However, this procedure adds an additive $O(\sqrt{T \log\log N})$ factor to the regret to the ϵ-quantile of actions, for any ϵ. [sent-218, score-0.454]
75 In contrast, our solution NormalHedge is clean and simple, and we guarantee a regret bound for all values of ϵ uniformly, without any extra overhead. [sent-220, score-0.495]
76 More recent work in [14, 7, 10] provides algorithms with significantly improved bounds when the total loss of the best action is small, or when the total variation in the losses is small. [sent-221, score-0.653]
77 Besides exponential weights, another important class of online learning algorithms is the polynomial weights family studied in [12, 11, 8]. [sent-224, score-0.188]
78 These algorithms too require a parameter; this parameter does not depend on the number of rounds T , but depends crucially on the number of actions N . [sent-225, score-0.525]
79 The weight assigned to action i in round t is proportional to $([R_{i,t-1}]_+)^{p-1}$ for some p > 1; setting $p = 2\ln N$ yields regret bounds of the form $\sqrt{2eT(\ln N - 0.5)}$. [sent-226, score-1.142]
80 Our algorithm and polynomial weights share the feature that zero weight is given to actions that are performing worse than the algorithm, although the degree of this weight sparsity is tied to the performance of the algorithm. [sent-228, score-0.599]
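For comparison, a small sketch of this polynomial-weights update (our own illustration; the fallback to the uniform distribution when no action is ahead of the algorithm is our choice, made only to keep the sketch well-defined):

import numpy as np

def poly_weights_distribution(R, N):
    # One round of polynomial weights: R holds the cumulative regrets R_{i,t-1}.
    p = 2.0 * np.log(N)                       # the recommended setting p = 2 ln N
    w = np.maximum(R, 0.0) ** (p - 1.0)
    if w.sum() == 0.0:
        return np.full(len(R), 1.0 / len(R))  # no action is ahead of the algorithm
    return w / w.sum()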
81 If Normal-Hedge has access to N actions, then for all loss sequences, for all t, for all 0 < ϵ ≤ 1 and for all 0 < δ ≤ 1/2, the regret of the algorithm to the top ϵ-quantile of the actions is at most $\sqrt{3(1+50\delta)\,t\,(1+\ln(1/\epsilon))}$ plus an additive term of order $\ln^2 N$ (up to factors of $1/\delta$). [sent-233, score-1.041]
82 In particular, with ϵ = 1/N, the regret to the best action is at most $\sqrt{3(1+50\delta)\,t\,(1+\ln N)}$ plus the same additive term of order $\ln^2 N$. [sent-235, score-0.935]
83 If Normal-Hedge has access to N actions, then, as t → ∞, the regret of NormalHedge to the top ϵ-quantile of actions approaches an upper bound of $\sqrt{3t(1+\ln(1/\epsilon)) + o(t)}$. [sent-240, score-0.956]
84 In particular, the regret of Normal-Hedge to the best action approaches an upper bound of $\sqrt{3t(1+\ln N) + o(t)}$. [sent-241, score-0.976]
85 Regret bounds from the potential equation. The following lemma relates the performance of the algorithm at time t to the scale $c_t$. [sent-244, score-0.374]
86 At any time t, the regret to the best action can be bounded as $\max_i R_{i,t} \le \sqrt{2c_t(\ln N + 1)}$. [sent-246, score-0.756]
87 Moreover, for any 0 ≤ ϵ ≤ 1 and any t, the regret to the top ϵ-quantile of actions is at most $\sqrt{2c_t(\ln(1/\epsilon) + 1)}$. [sent-247, score-0.897]
88 We use Et to denote the actions that have non-zero weight on iteration t. [sent-249, score-0.469]
89 The first part of the lemma follows from the fact that, for any action $i \in E_t$, $\exp\!\left(\frac{(R_{i,t})^2}{2c_t}\right) = \exp\!\left(\frac{([R_{i,t}]_+)^2}{2c_t}\right) \le \sum_{i'=1}^{N} \exp\!\left(\frac{([R_{i',t}]_+)^2}{2c_t}\right) \le Ne$, which implies $R_{i,t} \le \sqrt{2c_t(\ln N + 1)}$. [sent-250, score-0.510]
90 For the second part of the lemma, let $R_{i,t}$ denote the regret of our algorithm to the action with the ϵN-th highest regret. [sent-251, score-0.745]
91 Then, the total potential of the actions with regrets greater than or equal to $R_{i,t}$ is at least $\epsilon N \exp\!\left(\frac{([R_{i,t}]_+)^2}{2c_t}\right)$, and since the total potential is at most $Ne$, we get $\epsilon N \exp\!\left(\frac{([R_{i,t}]_+)^2}{2c_t}\right) \le Ne$, from which the second part of the lemma follows. [sent-252, score-0.693]
92 Bounds on the scale $c_t$ and the proof of Theorem 1. In Lemmas 4 and 5, we bound the growth of the scale $c_t$ as a function of the time t. [sent-254, score-0.512]
93 As $c_t$ increases monotonically with t, we can divide the rounds t into two phases, $t < t_0$ and $t \ge t_0$, where $t_0$ is the first time such that $c_{t_0} \ge \frac{4\ln^2 N}{\delta} + \frac{16\ln N}{\delta^3}$, for some fixed δ ∈ (0, 1/2). [sent-256, score-0.452]
94 We then show bounds on the growth of ct for each phase separately. [sent-257, score-0.305]
95 Lemma 4 shows that ct is not too large at the end of the first phase, while Lemma 5 bounds the per-round growth of ct in the second phase. [sent-258, score-0.495]
96 Suppose that at some time $t_0$, $c_{t_0} \ge \frac{4\ln^2 N}{\delta} + \frac{16\ln N}{\delta^3}$, where $0 \le \delta \le 1/2$. Then, for any time $t \ge t_0$, $c_{t+1} - c_t \le \tfrac{3}{2}(1 + 49.\ldots\,\delta)$. [sent-263, score-0.266]
97 Let $t_0$ be the first time at which $c_{t_0} \ge \frac{4\ln^2 N}{\delta} + \frac{16\ln N}{\delta^3}$. [sent-268, score-0.179]
98 Then, from Lemma 4, $c_{t_0} \le 2c_{t_0-1}(1 + \ln N) + 3$, which is at most of order $\frac{\ln^3 N}{\delta} + \frac{\ln^2 N}{\delta^3}$. [sent-269, score-0.179]
99 By Lemma 5, we have that for any $t \ge t_0$, $c_t \le c_{t_0} + \tfrac{3}{2}(1 + 49.\ldots\,\delta)\,(t - t_0)$. [sent-271, score-0.212]
100 Combining these last two inequalities yields $c_t \le \tfrac{3}{2}(1 + 49.\ldots\,\delta)\,t$ plus additive terms of order $\frac{\ln^3 N}{\delta}$ and $\frac{\ln^2 N}{\delta^3}$. [sent-273, score-0.212]
wordName wordTfidf (topN-words)
[('actions', 0.44), ('regret', 0.43), ('hedge', 0.402), ('action', 0.293), ('normalhedge', 0.238), ('ct', 0.212), ('dtol', 0.201), ('ln', 0.179), ('replication', 0.143), ('learner', 0.138), ('cumulative', 0.117), ('round', 0.11), ('loss', 0.104), ('poly', 0.103), ('losses', 0.098), ('regrets', 0.08), ('tracking', 0.069), ('incurs', 0.059), ('freund', 0.056), ('lemma', 0.055), ('exp', 0.054), ('expert', 0.053), ('weights', 0.046), ('littlestone', 0.044), ('rounds', 0.044), ('potential', 0.043), ('online', 0.042), ('bounds', 0.042), ('unaffected', 0.041), ('bound', 0.041), ('assigned', 0.039), ('diego', 0.033), ('notion', 0.033), ('best', 0.033), ('polynomial', 0.033), ('lemmas', 0.032), ('chaudhuri', 0.032), ('kamalika', 0.032), ('suboptimality', 0.032), ('bn', 0.03), ('ated', 0.029), ('copy', 0.029), ('cse', 0.029), ('degradation', 0.029), ('lowest', 0.029), ('growth', 0.029), ('weight', 0.029), ('adaptive', 0.027), ('effective', 0.027), ('uc', 0.027), ('top', 0.027), ('barrier', 0.026), ('assign', 0.025), ('factor', 0.024), ('clean', 0.024), ('sharper', 0.024), ('algorithms', 0.023), ('game', 0.022), ('copies', 0.022), ('uniformly', 0.022), ('practical', 0.022), ('phase', 0.022), ('algorithm', 0.022), ('proposing', 0.021), ('exponential', 0.021), ('san', 0.021), ('total', 0.021), ('instantaneous', 0.021), ('perturbed', 0.02), ('hadamard', 0.02), ('proportional', 0.02), ('normal', 0.02), ('completely', 0.019), ('impractical', 0.019), ('prediction', 0.019), ('phases', 0.018), ('access', 0.018), ('proof', 0.018), ('schapire', 0.018), ('variation', 0.018), ('depend', 0.018), ('eventually', 0.017), ('affects', 0.017), ('theorem', 0.017), ('matter', 0.017), ('rate', 0.017), ('understood', 0.017), ('divide', 0.017), ('sciences', 0.016), ('games', 0.016), ('djhsu', 0.016), ('hedging', 0.016), ('ita', 0.016), ('tunable', 0.016), ('vart', 0.016), ('state', 0.015), ('notice', 0.015), ('rows', 0.015), ('perform', 0.015), ('exacerbated', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 14 nips-2009-A Parameter-free Hedging Algorithm
Author: Kamalika Chaudhuri, Yoav Freund, Daniel J. Hsu
Abstract: We study the problem of decision-theoretic online learning (DTOL). Motivated by practical applications, we focus on DTOL when the number of actions is very large. Previous algorithms for learning in this framework have a tunable learning rate parameter, and a barrier to using online-learning in practical applications is that it is not understood how to set this parameter optimally, particularly when the number of actions is large. In this paper, we offer a clean solution by proposing a novel and completely parameter-free algorithm for DTOL. We introduce a new notion of regret, which is more natural for applications with a large number of actions. We show that our algorithm achieves good performance with respect to this new notion of regret; in addition, it also achieves performance close to that of the best bounds achieved by previous algorithms with optimally-tuned parameters, according to previous notions of regret. 1
2 0.18920895 156 nips-2009-Monte Carlo Sampling for Regret Minimization in Extensive Games
Author: Marc Lanctot, Kevin Waugh, Martin Zinkevich, Michael Bowling
Abstract: Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR) of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR on expectation. Then, we introduce two sampling schemes: outcome sampling and external sampling, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new tighter bound on the regret for the original CFR algorithm and relate this new bound to MCCFR’s bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games. 1
3 0.18029015 178 nips-2009-On Stochastic and Worst-case Models for Investing
Author: Elad Hazan, Satyen Kale
Abstract: In practice, most investing is done assuming a probabilistic model of stock price returns known as the Geometric Brownian Motion (GBM). While often an acceptable approximation, the GBM model is not always valid empirically. This motivates a worst-case approach to investing, called universal portfolio management, where the objective is to maximize wealth relative to the wealth earned by the best fixed portfolio in hindsight. In this paper we tie the two approaches, and design an investment strategy which is universal in the worst-case, and yet capable of exploiting the mostly valid GBM model. Our method is based on new and improved regret bounds for online convex optimization with exp-concave loss functions. 1
4 0.16708125 45 nips-2009-Beyond Convexity: Online Submodular Minimization
Author: Elad Hazan, Satyen Kale
Abstract: We consider an online decision problem over a discrete space in which the loss function is submodular. We give algorithms which are computationally efficient and are Hannan-consistent in both the full information and bandit settings. 1
5 0.16462985 24 nips-2009-Adapting to the Shifting Intent of Search Queries
Author: Umar Syed, Aleksandrs Slivkins, Nina Mishra
Abstract: Search engines today present results that are often oblivious to recent shifts in intent. For example, the meaning of the query ‘independence day’ shifts in early July to a US holiday and to a movie around the time of the box office release. While no studies exactly quantify the magnitude of intent-shifting traffic, studies suggest that news events, seasonal topics, pop culture, etc account for 1/2 the search queries. This paper shows that the signals a search engine receives can be used to both determine that a shift in intent happened, as well as find a result that is now more relevant. We present a meta-algorithm that marries a classifier with a bandit algorithm to achieve regret that depends logarithmically on the number of query impressions, under certain assumptions. We provide strong evidence that this regret is close to the best achievable. Finally, via a series of experiments, we demonstrate that our algorithm outperforms prior approaches, particularly as the amount of intent-shifting traffic increases. 1
6 0.13370809 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference
7 0.12869009 113 nips-2009-Improving Existing Fault Recovery Policies
8 0.11132964 53 nips-2009-Complexity of Decentralized Control: Special Cases
9 0.10628275 242 nips-2009-The Infinite Partially Observable Markov Decision Process
10 0.098401427 220 nips-2009-Slow Learners are Fast
11 0.092299692 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry
12 0.080415703 221 nips-2009-Solving Stochastic Games
13 0.073388159 73 nips-2009-Dual Averaging Method for Regularized Stochastic Learning and Online Optimization
14 0.068646297 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank
15 0.06838841 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
16 0.066418931 12 nips-2009-A Generalized Natural Actor-Critic Algorithm
17 0.065877445 161 nips-2009-Nash Equilibria of Static Prediction Games
18 0.061575726 141 nips-2009-Local Rules for Global MAP: When Do They Work ?
19 0.061393064 76 nips-2009-Efficient Learning using Forward-Backward Splitting
20 0.06136246 181 nips-2009-Online Learning of Assignments
topicId topicWeight
[(0, -0.138), (1, 0.148), (2, 0.169), (3, -0.188), (4, -0.063), (5, 0.149), (6, -0.081), (7, -0.029), (8, 0.039), (9, 0.03), (10, -0.063), (11, -0.061), (12, -0.072), (13, 0.006), (14, -0.012), (15, 0.091), (16, -0.012), (17, 0.037), (18, 0.073), (19, -0.019), (20, -0.009), (21, 0.002), (22, -0.027), (23, -0.114), (24, 0.051), (25, -0.009), (26, 0.067), (27, -0.03), (28, 0.177), (29, -0.003), (30, -0.158), (31, -0.021), (32, 0.139), (33, 0.001), (34, -0.169), (35, 0.133), (36, 0.03), (37, -0.026), (38, -0.098), (39, 0.019), (40, 0.083), (41, 0.023), (42, 0.049), (43, -0.112), (44, -0.025), (45, -0.058), (46, 0.1), (47, -0.132), (48, -0.036), (49, 0.052)]
simIndex simValue paperId paperTitle
same-paper 1 0.96463233 14 nips-2009-A Parameter-free Hedging Algorithm
Author: Kamalika Chaudhuri, Yoav Freund, Daniel J. Hsu
Abstract: We study the problem of decision-theoretic online learning (DTOL). Motivated by practical applications, we focus on DTOL when the number of actions is very large. Previous algorithms for learning in this framework have a tunable learning rate parameter, and a barrier to using online-learning in practical applications is that it is not understood how to set this parameter optimally, particularly when the number of actions is large. In this paper, we offer a clean solution by proposing a novel and completely parameter-free algorithm for DTOL. We introduce a new notion of regret, which is more natural for applications with a large number of actions. We show that our algorithm achieves good performance with respect to this new notion of regret; in addition, it also achieves performance close to that of the best bounds achieved by previous algorithms with optimally-tuned parameters, according to previous notions of regret. 1
2 0.6609503 24 nips-2009-Adapting to the Shifting Intent of Search Queries
Author: Umar Syed, Aleksandrs Slivkins, Nina Mishra
Abstract: Search engines today present results that are often oblivious to recent shifts in intent. For example, the meaning of the query ‘independence day’ shifts in early July to a US holiday and to a movie around the time of the box office release. While no studies exactly quantify the magnitude of intent-shifting traffic, studies suggest that news events, seasonal topics, pop culture, etc account for 1/2 the search queries. This paper shows that the signals a search engine receives can be used to both determine that a shift in intent happened, as well as find a result that is now more relevant. We present a meta-algorithm that marries a classifier with a bandit algorithm to achieve regret that depends logarithmically on the number of query impressions, under certain assumptions. We provide strong evidence that this regret is close to the best achievable. Finally, via a series of experiments, we demonstrate that our algorithm outperforms prior approaches, particularly as the amount of intent-shifting traffic increases. 1
3 0.62036109 156 nips-2009-Monte Carlo Sampling for Regret Minimization in Extensive Games
Author: Marc Lanctot, Kevin Waugh, Martin Zinkevich, Michael Bowling
Abstract: Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR) of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR on expectation. Then, we introduce two sampling schemes: outcome sampling and external sampling, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new tighter bound on the regret for the original CFR algorithm and relate this new bound to MCCFR’s bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games. 1
4 0.45759651 45 nips-2009-Beyond Convexity: Online Submodular Minimization
Author: Elad Hazan, Satyen Kale
Abstract: We consider an online decision problem over a discrete space in which the loss function is submodular. We give algorithms which are computationally efficient and are Hannan-consistent in both the full information and bandit settings. 1
5 0.45011118 113 nips-2009-Improving Existing Fault Recovery Policies
Author: Guy Shani, Christopher Meek
Abstract: An automated recovery system is a key component in a large data center. Such a system typically employs a hand-made controller created by an expert. While such controllers capture many important aspects of the recovery process, they are often not systematically optimized to reduce costs such as server downtime. In this paper we describe a passive policy learning approach for improving existing recovery policies without exploration. We explain how to use data gathered from the interactions of the hand-made controller with the system, to create an improved controller. We suggest learning an indefinite horizon Partially Observable Markov Decision Process, a model for decision making under uncertainty, and solve it using a point-based algorithm. We describe the complete process, starting with data gathering, model learning, model checking procedures, and computing a policy. 1
6 0.44569841 178 nips-2009-On Stochastic and Worst-case Models for Investing
7 0.42176393 220 nips-2009-Slow Learners are Fast
8 0.40850526 73 nips-2009-Dual Averaging Method for Regularized Stochastic Learning and Online Optimization
9 0.38692641 12 nips-2009-A Generalized Natural Actor-Critic Algorithm
10 0.37125349 134 nips-2009-Learning to Explore and Exploit in POMDPs
11 0.35185811 76 nips-2009-Efficient Learning using Forward-Backward Splitting
12 0.34971166 181 nips-2009-Online Learning of Assignments
13 0.33973974 193 nips-2009-Potential-Based Agnostic Boosting
14 0.33817768 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry
15 0.33761877 221 nips-2009-Solving Stochastic Games
16 0.32450223 53 nips-2009-Complexity of Decentralized Control: Special Cases
17 0.31532693 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference
18 0.31498343 98 nips-2009-From PAC-Bayes Bounds to KL Regularization
19 0.30135173 242 nips-2009-The Infinite Partially Observable Markov Decision Process
20 0.29998139 233 nips-2009-Streaming Pointwise Mutual Information
topicId topicWeight
[(24, 0.11), (25, 0.057), (35, 0.04), (36, 0.073), (39, 0.037), (58, 0.092), (61, 0.01), (65, 0.352), (71, 0.035), (81, 0.03), (86, 0.036), (91, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.72592884 14 nips-2009-A Parameter-free Hedging Algorithm
Author: Kamalika Chaudhuri, Yoav Freund, Daniel J. Hsu
Abstract: We study the problem of decision-theoretic online learning (DTOL). Motivated by practical applications, we focus on DTOL when the number of actions is very large. Previous algorithms for learning in this framework have a tunable learning rate parameter, and a barrier to using online-learning in practical applications is that it is not understood how to set this parameter optimally, particularly when the number of actions is large. In this paper, we offer a clean solution by proposing a novel and completely parameter-free algorithm for DTOL. We introduce a new notion of regret, which is more natural for applications with a large number of actions. We show that our algorithm achieves good performance with respect to this new notion of regret; in addition, it also achieves performance close to that of the best bounds achieved by previous algorithms with optimally-tuned parameters, according to previous notions of regret. 1
2 0.58098561 70 nips-2009-Discriminative Network Models of Schizophrenia
Author: Irina Rish, Benjamin Thyreau, Bertrand Thirion, Marion Plaze, Marie-laure Paillere-martinot, Catherine Martelli, Jean-luc Martinot, Jean-baptiste Poline, Guillermo A. Cecchi
Abstract: Schizophrenia is a complex psychiatric disorder that has eluded a characterization in terms of local abnormalities of brain activity, and is hypothesized to affect the collective, “emergent” working of the brain. We propose a novel data-driven approach to capture emergent features using functional brain networks [4] extracted from fMRI data, and demonstrate its advantage over traditional region-of-interest (ROI) and local, task-specific linear activation analyzes. Our results suggest that schizophrenia is indeed associated with disruption of global brain properties related to its functioning as a network, which cannot be explained by alteration of local activation patterns. Moreover, further exploitation of interactions by sparse Markov Random Field classifiers shows clear gain over linear methods, such as Gaussian Naive Bayes and SVM, allowing to reach 86% accuracy (over 50% baseline - random guess), which is quite remarkable given that it is based on a single fMRI experiment using a simple auditory task. 1
3 0.53553444 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations
Author: Mingyuan Zhou, Haojun Chen, Lu Ren, Guillermo Sapiro, Lawrence Carin, John W. Paisley
Abstract: Non-parametric Bayesian techniques are considered for learning dictionaries for sparse image representations, with applications in denoising, inpainting and compressive sensing (CS). The beta process is employed as a prior for learning the dictionary, and this non-parametric method naturally infers an appropriate dictionary size. The Dirichlet process and a probit stick-breaking process are also considered to exploit structure within an image. The proposed method can learn a sparse dictionary in situ; training images may be exploited if available, but they are not required. Further, the noise variance need not be known, and can be nonstationary. Another virtue of the proposed method is that sequential inference can be readily employed, thereby allowing scaling to large images. Several example results are presented, using both Gibbs and variational Bayesian inference, with comparisons to other state-of-the-art approaches. 1
4 0.47843316 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference
Author: Khashayar Rohanimanesh, Sameer Singh, Andrew McCallum, Michael J. Black
Abstract: Large, relational factor graphs with structure defined by first-order logic or other languages give rise to notoriously difficult inference problems. Because unrolling the structure necessary to represent distributions over all hypotheses has exponential blow-up, solutions are often derived from MCMC. However, because of limitations in the design and parameterization of the jump function, these samplingbased methods suffer from local minima—the system must transition through lower-scoring configurations before arriving at a better MAP solution. This paper presents a new method of explicitly selecting fruitful downward jumps by leveraging reinforcement learning (RL). Rather than setting parameters to maximize the likelihood of the training data, parameters of the factor graph are treated as a log-linear function approximator and learned with methods of temporal difference (TD); MAP inference is performed by executing the resulting policy on held out test data. Our method allows efficient gradient updates since only factors in the neighborhood of variables affected by an action need to be computed—we bypass the need to compute marginals entirely. Our method yields dramatic empirical success, producing new state-of-the-art results on a complex joint model of ontology alignment, with a 48% reduction in error over state-of-the-art in that domain. 1
5 0.46021727 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization
Author: Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, Pradeep K. Ravikumar
Abstract: Despite a large literature on upper bounds on complexity of convex optimization, relatively less attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining a understanding of these complexity-theoretic issues is important. In this paper, we study the complexity of stochastic convex optimization in an oracle model of computation. We improve upon known results and obtain tight minimax complexity estimates for various function classes. We also discuss implications of these results for the understanding the inherent complexity of large-scale learning and estimation problems. 1
7 0.44084167 147 nips-2009-Matrix Completion from Noisy Entries
8 0.43874452 55 nips-2009-Compressed Least-Squares Regression
9 0.43790683 94 nips-2009-Fast Learning from Non-i.i.d. Observations
10 0.4363938 122 nips-2009-Label Selection on Graphs
11 0.43558174 45 nips-2009-Beyond Convexity: Online Submodular Minimization
12 0.43512577 161 nips-2009-Nash Equilibria of Static Prediction Games
13 0.43455219 239 nips-2009-Submodularity Cuts and Applications
14 0.43261704 15 nips-2009-A Rate Distortion Approach for Semi-Supervised Conditional Random Fields
15 0.43166465 163 nips-2009-Neurometric function analysis of population codes
16 0.43061218 221 nips-2009-Solving Stochastic Games
17 0.43046093 173 nips-2009-Nonparametric Greedy Algorithms for the Sparse Learning Problem
18 0.4297325 91 nips-2009-Fast, smooth and adaptive regression in metric spaces
19 0.42736572 78 nips-2009-Efficient Moments-based Permutation Tests
20 0.42717484 180 nips-2009-On the Convergence of the Concave-Convex Procedure