nips nips2010 nips2010-226 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jacob D. Abernethy, Manfred K. Warmuth
Abstract: We study repeated zero-sum games against an adversary on a budget. Given that an adversary has some constraint on the sequence of actions that he plays, we consider what ought to be the player’s best mixed strategy with knowledge of this budget. We show that, for a general class of normal-form games, the minimax strategy is indeed efficiently computable and relies on a “random playout” technique. We give three diverse applications of this new algorithmic template: a cost-sensitive “Hedge” setting, a particular problem in Metrical Task Systems, and the design of combinatorial prediction markets. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We study repeated zero-sum games against an adversary on a budget. [sent-6, score-0.497]
2 Given that an adversary has some constraint on the sequence of actions that he plays, we consider what ought to be the player’s best mixed strategy with knowledge of this budget. [sent-7, score-0.422]
3 After each round we learn the true outcome and, hence, which experts predicted correctly or incorrectly. [sent-13, score-0.201]
4 Another way to interpret this sequential prediction model is to treat it as a repeated two-player zero-sum game against an adversary on a budget; that is, the adversary’s sequence of actions is restricted in that play ceases once the adversary exceeds the budget. [sent-16, score-0.982]
5 In the present paper, we develop a general framework for repeated game-playing against an adversary on a budget, and we provide a simple randomized strategy for the learner/player for a particular class of these games. [sent-18, score-0.435]
6 Roughly speaking, a random playout in an extensive-form game is a way to measure the likely outcome at a given state by finishing the game randomly from this state. [sent-20, score-0.538]
7 Random playouts, often known simply as Monte Carlo methods, have become particularly popular for solving the game of Go [5], which has led to much follow-up work for general games [12, 11]. [sent-21, score-0.27]
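To make the random playout idea concrete, here is a minimal Monte Carlo sketch; the game interface (`legal_moves`, `apply`, `is_terminal`, `payoff`) is a hypothetical stand-in, not an API from the paper.

```python
import random

def random_playout(state, game):
    # Finish the game from `state` by playing uniformly random moves and
    # return the terminal payoff. `game` is a hypothetical interface with
    # is_terminal / legal_moves / apply / payoff methods.
    while not game.is_terminal(state):
        state = game.apply(state, random.choice(game.legal_moves(state)))
    return game.payoff(state)

def playout_value(state, game, num_playouts=1000):
    # Monte Carlo estimate of the value of `state`: the average payoff
    # over many random continuations of the game.
    return sum(random_playout(state, game) for _ in range(num_playouts)) / num_playouts
```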
8 The Budgeted Adversary game we consider also involves exponentially large state spaces, yet we achieve efficiency using these random playouts. [sent-22, score-0.199]
9 The key result of this paper is that the proposed random playout is not simply a good heuristic; it is indeed minimax optimal for the games we consider. [sent-23, score-0.37]
10 Abernethy et al. [1] were the first to use a random playout strategy to optimally solve an adversarial learning problem, namely the so-called Hedge setting introduced by Freund and Schapire [10]. [sent-27, score-0.228]
11 In the standard Hedge setting, it is assumed that each expert suffers a cost in [0, 1] on each round. [sent-33, score-0.307]
12 But a surprisingly overlooked case is when the cost ranges differ: expert i may suffer per-round cost in [0, c_i] for some fixed c_i > 0. [sent-34, score-0.699]
13 The MTS problem is a decision/learning problem similar to the Hedge setting above but with an added difficulty: the learner is required to pay the cost of moving through a given metric space. [sent-38, score-0.392]
14 Combinatorial Prediction Market Design: There has been a great deal of work in designing so-called prediction markets, where bettors may purchase contracts that pay off when the outcome of a future event is correctly guessed. [sent-41, score-0.386]
15 One important goal of such markets is to minimize the potential risk of the “market maker” who sells the contracts and pays the winning bettors. [sent-42, score-0.425]
16 The Setting: Budgeted Adversary Games. We will now describe the generic sequential decision problem, where a problem instance is characterized by the following triple: an n × n loss matrix M, a monotonic "cost function" cost : [n]∗ → R+, and a cost budget k. [sent-67, score-0.718]
17 A cost function is monotonic as long as it satisfies the relation cost(ρσ) ≤ cost(ρiσ) for all ρ, σ ∈ [n]∗ and all i ∈ [n]. [sent-68, score-0.268]
18 On each round t, the player chooses a distribution wt ∈ ∆n over his action space. [sent-70, score-0.268]
19 The game proceeds until the first round in which the budget is spent, i.e., the first round T for which cost(ρ_T) > k. [sent-76, score-0.422]
20 The goal of the Player is to choose each wt in order to minimize the total cost of this repeated game on all sequences of outcomes. [sent-85, score-0.459]
21 Note, importantly, that the player can learn from the past, and hence would like an efficiently computable function w : [n]∗ → ∆n, where on round t the player is given ρ_{t−1} = (i_1 . . . i_{t−1}). [sent-86, score-0.376]
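As an illustration of the protocol just described, here is a minimal sketch of one run of a Budgeted Adversary game; `player` and `adversary` are hypothetical callables supplied by the user, not objects defined in the paper.

```python
import numpy as np

def play_budgeted_game(player, adversary, M, cost, k):
    # One run of the Budgeted Adversary protocol. `player(rho)` returns a
    # distribution w in the simplex; `adversary(rho)` returns an outcome
    # i in [n]; both may depend on the history rho. Play stops after the
    # round whose outcome pushes the accumulated cost past the budget k.
    rho, total_loss = (), 0.0
    while True:
        w = player(rho)               # the player's mixed strategy w_t
        i = adversary(rho)            # the adversary's outcome i_t
        total_loss += w @ M[:, i]     # player pays w_t^T M e_{i_t}
        rho = rho + (i,)
        if cost(rho) > k:             # budget exceeded: the game ends
            return total_loss
```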
22 We can define the worst-case cost of an algorithm w : [n]∗ → ∆n by its performance against a worst-case sequence, that is,
WorstCaseLoss(w; M, cost, k) := max_{ρ = i_1 i_2 . . . i_T ∈ [n]∗ : cost(ρ_{T−1}) ≤ k < cost(ρ_T)} Σ_{t=1}^{T} w(ρ_{t−1})⊤ M e_{i_t}. [sent-90, score-0.245]
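For tiny instances, this definition can be evaluated directly by exhaustive recursion over adversary sequences. The following sketch does exactly that; it assumes the cost eventually exceeds k along every branch, so the recursion terminates.

```python
import numpy as np

def worst_case_loss(w, M, cost, k, rho=()):
    # Evaluate WorstCaseLoss(w; M, cost, k) by exhaustive recursion over
    # adversary continuations (feasible only for tiny instances). The
    # round that first pushes the cost past k is still played and paid,
    # matching the stopping condition in the definition above.
    n = M.shape[0]
    per_round = w(rho) @ M            # entry i is w(rho)^T M e_i
    best = -np.inf
    for i in range(n):
        child = rho + (i,)
        future = 0.0 if cost(child) > k else worst_case_loss(w, M, cost, k, child)
        best = max(best, per_round[i] + future)
    return best
```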
25 Also, since we assume that the cost increases, it is reasonable to require that the distribution over the length, i. [sent-114, score-0.216]
26 Compute Φ(ρ, 1), . . . , Φ(ρ, n), and set w(ρ) with values w_i(ρ) = (Φ(ρ) − Φ(ρ, i)) / c_i, writing Φ(ρ, i) for Φ(ρi). 4 Minimax Optimality. Now we prove that Algorithm 2 is both "legal" and minimax optimal. [sent-133, score-0.305]
27 Hence, Σ_i w_i(ρ) = Σ_i (Φ(ρ) − Φ(ρi))/c_i = Φ(ρ) Σ_i 1/c_i − Σ_i Φ(ρi)/c_i = 1, where the last equality holds because Φ(ρ) = (1 + Σ_i Φ(ρi)/c_i) / (Σ_i 1/c_i); equivalently, Φ(ρ) = 1/(Σ_i 1/c_i) + Σ_i q_i Φ(ρi) with q_i = (1/c_i) / (Σ_j 1/c_j). [sent-146, score-0.451]
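The recursion just used can be turned into an exact, memoized implementation of Algorithm 2 for small games. The sketch below assumes cost() strictly increases along every branch so the recursion bottoms out; for large or infinite game trees the paper instead estimates Φ by random playouts.

```python
from functools import lru_cache

def make_minimax_player(c, cost, k):
    # Exact, memoized version of the potential recursion for diagonal
    # M = diag(c_1, ..., c_n): Phi(rho) = 0 once cost(rho) > k, and
    # otherwise Phi(rho) = (1 + sum_i Phi(rho i)/c_i) / (sum_i 1/c_i).
    n = len(c)
    z = sum(1.0 / ci for ci in c)            # normalizer sum_i 1/c_i

    @lru_cache(maxsize=None)
    def phi(rho):
        if cost(rho) > k:
            return 0.0
        return (1.0 + sum(phi(rho + (i,)) / c[i] for i in range(n))) / z

    def w(rho):
        # Minimax weights w_i(rho) = (Phi(rho) - Phi(rho i)) / c_i; by the
        # derivation above they form a probability distribution.
        return [(phi(rho) - phi(rho + (i,))) / c[i] for i in range(n)]

    return phi, w

# Toy example: two unit-cost actions and cost(rho) = len(rho), budget k = 2,
# so exactly three rounds are played. Prints 1.5 and [0.5, 0.5].
phi, w = make_minimax_player(c=(1.0, 1.0), cost=len, k=2)
print(phi(()), w(()))
```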
28 For M = diag(c_1, . . . , c_n), Algorithm 2 is minimax optimal for the Budgeted Adversary problem. [sent-152, score-0.185]
29 For any outcome sequence ρ = i_1 . . . i_T, the total cost of Algorithm 2 will be
Σ_{t=1}^{T} w_{i_t}(ρ_{t−1}) c_{i_t} = Σ_{t=1}^{T} w(ρ_{t−1})⊤ M e_{i_t} = Σ_{t=1}^{T} (Φ(ρ_{t−1}) − Φ(ρ_t)) = Φ(∅) − Φ(ρ_T) ≤ Φ(∅),
and hence the total cost of the algorithm is always bounded by Φ(∅). [sent-159, score-0.946]
30 On the other hand, we claim that Φ(∅) can always be achieved by an adversary against any algorithm w′(·). [sent-160, score-0.369]
31 Given that ρ_{t−1} has been constructed so far, select any coordinate i_t ∈ [n] for which w_{i_t}(ρ_{t−1}) ≤ w′_{i_t}(ρ_{t−1}), that is, where the algorithm w′ places at least as much weight on i_t as the proposed algorithm w we defined in Algorithm 2; such a coordinate must exist since both are probability distributions. [sent-162, score-0.26]
32 Continue constructing ρ until the budget is reached, i.e., until cost(ρ) > k. [sent-165, score-0.196]
33 Now, let us check the loss of w′ on this sequence ρ:
Σ_{t=1}^{T} w′(ρ_{t−1})⊤ M e_{i_t} = Σ_{t=1}^{T} w′_{i_t}(ρ_{t−1}) c_{i_t} ≥ Σ_{t=1}^{T} w_{i_t}(ρ_{t−1}) c_{i_t} = Φ(∅) − Φ(ρ) = Φ(∅).
Hence, an adversary can achieve at least Φ(∅) loss against any algorithm w′. [sent-168, score-0.93]
34 We proved Theorem 4.1 under a somewhat limited scope: only for diagonal matrices M, and for a known budget k and cost(). [sent-171, score-0.196]
35 First, the concept of a cost() function and a budget k is not entirely necessary. [sent-174, score-0.196]
36 Indeed, we can redefine the Budgeted Adversary game in terms of an arbitrary stopping criterion δ : [n]∗ → {0, 1}, where δ(ρ) = 0 is equivalent to “the budget has been exceeded”. [sent-175, score-0.374]
37 This alternative budget interpretation lets us consider the sequence ρ as a path through a game tree. [sent-177, score-0.399]
38 Second, we need not assume that the budget k, or even the generalized stopping criterion δ(), is known in advance. [sent-180, score-0.224]
39 Instead, we can work with the following generalization: the stopping criterion δ is drawn from a known prior λ and given to the adversary before the start of the game. [sent-181, score-0.368]
40 Fourth, we have only discussed minimizing loss against a budgeted adversary. [sent-192, score-0.364]
41 But all the results can be extended easily to the case where the player is instead maximizing gain (and the adversary is minimizing). [sent-193, score-0.504]
42 A particularly surprising result is that the minimax strategy is identical in either case; that is, the recursive definition of w_i(ρ) is the same whether the player is maximizing or minimizing. [sent-194, score-0.41]
43 For example, in the expert setting, the loss version of the game stops when all experts have cost larger than k, whereas the gain version stops when at least one expert has gain at least k. [sent-196, score-0.608]
44 Therefore for the same budget size k, the minimax value of the gain version is typically smaller than the value of the loss version. [sent-197, score-0.388]
45 The minimax algorithm for this special case was already thoroughly developed by Abernethy et al. [1]. [sent-210, score-0.254]
46 A learner must predict a sequence of distributions w_t ∈ ∆n and receives a sequence of loss vectors ℓ_t ∈ {0, 1}^n. [sent-213, score-0.233]
47 The total loss to the learner is Σ_t w_t · ℓ_t, and the game ceases only once the best expert has more than k errors, i.e., once min_i Σ_t ℓ_{t,i} > k. [sent-214, score-0.402]
48 Taking M = I and cost(s) = min_i s_i, the learner's per-round loss is w_t · ℓ_t = w_t⊤ M e_{i_t}. The proposed reduction almost works, except for one key issue: this only allows loss vectors of the form ℓ_t = M e_{i_t} = e_{i_t}, since by definition Nature chooses columns of M. [sent-223, score-1.067]
49 In the Hedge game, the worst-case adversary always chooses ℓ_t ∈ {e_1, . . . , e_n}. [sent-227, score-0.34]
50 Abernethy et al. [1] provide the minimax optimal algorithm, but state the bound in terms of the expected length of a random walk. [sent-233, score-0.274]
51 This is essentially equivalent to our description of the minimax cost in terms of Φ(∅). [sent-234, score-0.37]
52 Ideally, we would like an algorithm and a bound that can handle non-uniform cost ranges, i.e. [sent-236, score-0.245]
53 where expert i suffers loss in some range [0, ci ]. [sent-238, score-0.217]
54 The simplest trick, which is just to take c_max := max_i c_i, leads to a bound of the form k + √(2 c_max k ln n) + c_max ln n, which we know to be very loose. [sent-240, score-0.277]
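For concreteness, the rescaling argument behind this loose bound runs roughly as follows (a sketch; the exact constants depend on which form of the standard Hedge bound one starts from):

```latex
% Rescaling argument for the c_max trick.
\[
\ell_{t,i}\in[0,c_i]\subseteq[0,c_{\max}]
\;\Longrightarrow\;
\ell'_{t,i} := \ell_{t,i}/c_{\max}\in[0,1],
\qquad k' := k/c_{\max}.
\]
\[
\text{Standard Hedge bound on the rescaled game:}\quad
L'_T \le k' + \sqrt{2k'\ln n} + \ln n .
\]
\[
\text{Multiplying back by } c_{\max}:\quad
L_T \le k + \sqrt{2\,c_{\max}\,k\ln n} + c_{\max}\ln n .
\]
```

The bound ignores the individual ranges c_i entirely, which is the source of the looseness.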
55 Setting M = diag(c_1, . . . , c_n) and cost(s) = min_i s_i c_i immediately gives an optimal algorithm by Theorem 4.1. [sent-245, score-0.223]
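Unrolling the recursion Φ(ρ) = 1/z + Σ_i q_i Φ(ρi), with z = Σ_i 1/c_i and q_i = (1/c_i)/z, shows that Φ(ρ) equals E[T]/z, where T is the number of steps a random walk (drawing outcomes from q) takes until the budget is exceeded. The following sketch estimates Φ for the cost-sensitive Hedge game by exactly this random playout; the playout count and example parameters are arbitrary illustrative choices.

```python
import random

def hedge_playout_phi(s, c, k, num_playouts=5000, rng=random.Random(0)):
    # Random playout estimate of Phi for the cost-sensitive Hedge game with
    # M = diag(c) and cost(s) = min_i s_i * c_i, where s_i counts how often
    # expert i has been hit so far. Each playout draws outcomes i with
    # probability q_i proportional to 1/c_i until every expert's loss
    # exceeds k; then Phi = E[#steps] / z with z = sum_i 1/c_i.
    n = len(c)
    z = sum(1.0 / ci for ci in c)
    q = [(1.0 / ci) / z for ci in c]
    total_steps = 0
    for _ in range(num_playouts):
        cur, t = list(s), 0
        while min(cur[i] * c[i] for i in range(n)) <= k:
            i = rng.choices(range(n), weights=q)[0]
            cur[i] += 1
            t += 1
        total_steps += t
    return total_steps / (num_playouts * z)

# Example: two experts with per-round cost ranges [0, 1] and [0, 2].
print(hedge_playout_phi(s=[0, 0], c=(1.0, 2.0), k=5.0))
```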
56 According to the same theorem, the minimax loss bound is simply Φ(∅) which, unfortunately, is in terms of a random walk length. [sent-247, score-0.192]
57 In the MTS problem, the player is presented with a finite metric space and on each of a sequence of rounds will occupy a single state (or point) within this metric space. [sent-251, score-0.248]
58 At the beginning of each round the player is presented with a cost vector, describing the cost of occupying each point in the metric space. [sent-252, score-0.717]
59 The player has the option to remain at his present state and pay this state's associated cost, or he can decide to switch to another point in the metric space and pay the cost of the new state. [sent-253, score-0.642]
60 In the latter case, however, the player must also pay the switching cost which is exactly the metric distance between the two points. [sent-254, score-0.523]
61 Notice that, were we given the sequence of cost vectors in advance, we could compute the offline path that minimizes total cost. [sent-259, score-0.298]
62 What we would like is an algorithm that performs well relative to the offline cost without knowledge of the sequence of cost vectors. [sent-261, score-0.514]
63 The standard measure of performance for an online algorithm is the competitive ratio, which is the ratio of the cost of the online algorithm to the optimal offline cost. [sent-262, score-0.461]
64 For all the results discussed below, we assume that the online algorithm can maintain a randomized state—a distribution over the metric—and pays the expected cost according to this random choice (Randomized algorithms tend to exhibit much better competitive ratios than deterministic algorithms). [sent-263, score-0.433]
65 For general metric spaces, Bartal et al. achieved a competitive ratio of O(log^6 n) [3], and this was improved to O(log^2 n) by Fiat and Mendel [9]. [sent-267, score-0.235]
66 The latter two techniques, however, rely on a scheme of randomly approximating the metric space with a hierarchical tree metric, adding a (likely unnecessary) multiplicative cost factor of log n. [sent-268, score-0.289]
67 It is widely believed that the minimax competitive ratio is O(log n) in general, but this gap has remained elusive for at least 10 years. [sent-269, score-0.279]
68 A weighted star is a metric in which each point i has a fixed distance d_i from some "center state", and traveling between any pair of states i and j requires going through the center, hence incurring a switching cost of d_i + d_j. [sent-271, score-0.338]
69 We can assume that the cost vector is of the form (0, . . . , 0, ∞, 0, . . . , 0), i.e., infinite at a single point and zero elsewhere. [sent-273, score-0.216]
70 When the online algorithm is currently maintaining a distribution w over the metric and an infinite cost occurs at state i, we can assume that the algorithm incurs exactly 2 d_i w_i, exactly the cost of having w_i probability weight enter and leave i from the center. [sent-281, score-0.713]
71 With the methods developed in the present paper, however, we can give the minimax optimal online algorithm under the above simplifications. [sent-283, score-0.231]
72 Notice that the adversary is now choosing a sequence of states i_1, i_2, i_3, . . . [sent-284, score-0.393]
73 The online algorithm's job is then to choose a sequence of distributions w(ρ_t), paying 2 d_{i_{t+1}} w_{i_{t+1}}(ρ_t) at each step. [sent-291, score-0.2]
74 In the end, the online algorithm’s cost is compared to the offline MTS cost of ρ, which we will call cost(ρ). [sent-292, score-0.48]
75 Assume we know the cost of the offline in advance, say it is k, and let us define M = diag(2d_1, . . . , 2d_n). [sent-293, score-0.245]
76 Notice the convenient trick here: by bounding the cost of the offline a priori at k, we can simply imagine playing this repeated game until the budget k is exhausted. [sent-303, score-0.599]
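To instantiate the reduction, one needs cost(ρ): the offline MTS cost of the request sequence. Under the simplifications above (cost vectors infinite at one point and zero elsewhere), a plausible formalization computes it by dynamic programming over the offline server's position. The free choice of starting position is an assumption for illustration, and the paper's exact definition of cost() may differ in such details.

```python
def offline_star_cost(rho, d):
    # One plausible offline MTS cost on a weighted star: each request
    # rho_t puts infinite cost on point i_t, so the offline server only
    # needs to avoid i_t, paying d[i] + d[j] (via the center) whenever it
    # moves from i to j. Computed by a DP over the server's position;
    # starting position is free (an illustrative assumption).
    n = len(d)
    INF = float("inf")
    best = [0.0] * n                  # cheapest cost of ending at each point
    for it in rho:
        nxt = [INF] * n
        for j in range(n):
            if j == it:               # the server must avoid the request
                continue
            stay = best[j]
            move = min(best[i] + d[i] + d[j] for i in range(n) if i != j)
            nxt[j] = min(stay, move)
        best = nxt
    return min(best)

# Example: two points at distances 1 and 2 from the center; alternating
# requests force the server to bounce back and forth, costing 3 per move.
print(offline_star_cost(rho=[0, 1, 0, 1], d=[1.0, 2.0]))   # 9.0
```

With cost(ρ) = offline_star_cost(ρ, d) and M = diag(2d_1, . . . , 2d_n), the problem fits the Budgeted Adversary template directly.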
77 In the simplest setting, a prediction market is a mechanism for selling n types of contract, where a contract of type i corresponds to some potential future outcome, say “event i will occur”. [sent-309, score-0.443]
78 When a bettor purchases a contract of type i, the manager of the market, or “market maker”, promises to pay out $1 if the outcome is i and $0 otherwise. [sent-311, score-0.324]
79 A popular research question in recent years is how to design such prediction markets when the outcome has a combinatorial structure. [sent-312, score-0.429]
80 An election might produce a complex outcome like a group of candidates winning, and a bettor may desire to bet on a complex predicate, such as “none of the winning candidates will be from my state”. [sent-313, score-0.344]
81 The computational aspects of combinatorial information markets are addressed by Chen et al. [6], who provide a particular hardness result regarding the computation of certain price functions, as well as a positive result for an alternative type of combinatorial market. [sent-315, score-0.49]
82 In the present section, we propose a new technique for designing combinatorial markets using the tools developed above. [sent-316, score-0.282]
83 In this type of information market, the task of the market maker is to choose a price for each of the n contracts, where the prices may be set adaptively according to the present demand. [sent-317, score-0.559]
84 (Even when we do not know the offline cost in advance, standard "doubling tricks" allow us to guess this value and increase the guess as the game proceeds.) [sent-321, score-0.025]
85 A typical example of such a cost function is C(s) = b log Σ_{i=1}^{n} exp(s_i / b), where b is a parameter (see Chen and Pennock for further discussion [7]); it is easy to check this function satisfies the desired properties. [sent-325, score-0.216]
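A minimal sketch of this cost-function scheme, which matches the standard logarithmic market scoring rule form: the instantaneous prices are the gradient of C, a softmax of the volume vector, and the amount actually charged for one share of i is C(s + e_i) − C(s).

```python
import math

def C(s, b=1.0):
    # Cost function C(s) = b * log(sum_i exp(s_i / b)), computed with the
    # usual log-sum-exp shift for numerical stability.
    m = max(si / b for si in s)
    return b * (m + math.log(sum(math.exp(si / b - m) for si in s)))

def instantaneous_prices(s, b=1.0):
    # The gradient of C: p_i = exp(s_i/b) / sum_j exp(s_j/b), a softmax of
    # the volume vector s, so the prices always lie in the simplex.
    m = max(si / b for si in s)
    e = [math.exp(si / b - m) for si in s]
    z = sum(e)
    return [ei / z for ei in e]

def purchase_price(s, i, b=1.0):
    # What the cost-function scheme actually charges for one contract of
    # type i at volume state s: C(s + e_i) - C(s).
    s_next = list(s)
    s_next[i] += 1
    return C(s_next, b) - C(s, b)
```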
86 This pricing scheme has the advantage that the total money earned in this market is easy to compute: it’s exactly C(s) regardless of the order in which the contracts were purchased. [sent-327, score-0.479]
87 A primary goal of an information market is to incentivize bettors to reveal their private knowledge of the outcome of an event. [sent-329, score-0.412]
88 If a given bettor believes the true distribution of the outcome to be q ∈ ∆n , he will have an incentive to purchase any contract i for which the current price pi is smaller than qi , thus providing positive expected reward (relative to his predicted distribution). [sent-330, score-0.418]
89 Using this cost-function scheme, it is possible that qi < C(s + ei ) − C(s) for all i and hence a bettor will have no incentive to bet. [sent-331, score-0.199]
90 We propose instead an alternative market mechanism that avoids this difficulty: for every given volume state s, the market maker will advertise a price vector w(s) ∈ ∆n . [sent-332, score-0.827]
91 If a contract of type i is purchased, the state proceeds to s + ei , and the market maker earns wi (s). [sent-333, score-0.706]
92 If the sequence of contracts i_1, i_2, . . . is purchased, the market maker's total earning is Σ_t w_{i_t}(e_{i_1} + · · · + e_{i_{t−1}}). [sent-337, score-0.281]
93 On the other hand, if the final demand is s, in the worst case the market maker may have to pay out a total of max_i s_i dollars. [sent-341, score-0.588]
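Putting the pieces together, here is a small sketch of the proposed mechanism's bookkeeping; the pricing rule `w` is a hypothetical stand-in (in the paper's construction, the Budgeted Adversary minimax strategy would play this role).

```python
def simulate_market(purchases, w, n):
    # At volume state s the market maker advertises prices w(s) in the
    # simplex; buying contract i moves the state to s + e_i and earns the
    # maker w_i(s). Returns total earnings, the worst-case payout max_i s_i,
    # and the maker's worst-case loss (payout minus earnings).
    s = [0] * n
    earned = 0.0
    for i in purchases:
        earned += w(tuple(s))[i]
        s[i] += 1
    worst_payout = max(s)             # outcome j forces a payout of s_j dollars
    return earned, worst_payout, worst_payout - earned

# Illustration with a naive uniform pricing rule:
uniform = lambda s: [1.0 / 3] * 3
print(simulate_market([0, 0, 2, 1, 0], uniform, n=3))   # approx. (1.67, 3, 1.33)
```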
94 This looks quite similar to the Budgeted Adversary reduction in the Hedge setting described above, which is perhaps not too surprising given the strong connections discovered by Chen and Vaughan [8] between learning with experts and market design. [sent-343, score-0.341]
95 In this case, we can imagine a market maker selling a contract of type i with the following promise: if candidate i is in the winning subset, the payout is 1/m and 0 otherwise. [sent-347, score-0.687]
96 For similar reasons as above, the market maker should sell contracts at prices p_i where Σ_i p_i = 1. [sent-348, score-0.674]
97 If we assume that the market maker has a budget constraint of k for the final payout, then we can handle this new setting within the Budgeted Adversary framework by simply modifying the cost function appropriately: cost(s) = max_{U⊂[n], |U|=m} Σ_{i∈U} s_i. [sent-349, score-0.912]
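This cost function is just the sum of the m largest coordinates of s, which makes its monotonicity easy to see; a one-line sketch:

```python
def subset_cost(s, m):
    # cost(s) = max over U subset of [n] with |U| = m of sum_{i in U} s_i,
    # i.e. the sum of the m largest coordinates of s. It is monotone, as
    # the Budgeted Adversary framework requires.
    return sum(sorted(s, reverse=True)[:m])

print(subset_cost([3, 1, 4, 1, 5], m=2))   # 9 = 5 + 4
```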
98 The benefit of our Budgeted Adversary framework is that we can handle arbitrary monotonic budget constraints, and the combinatorial nature of this problem can be encoded within the budget. [sent-351, score-0.339]
99 8 Open Problem. We have provided a very general framework for solving repeated zero-sum games against a budgeted adversary. [sent-353, score-0.483]
100 Y. Chen and J. W. Vaughan. A new understanding of prediction markets via no-regret learning. [sent-405, score-0.219]
wordName wordTfidf (topN-words)
[('adversary', 0.34), ('budgeted', 0.326), ('market', 0.281), ('hedge', 0.232), ('eit', 0.23), ('cost', 0.216), ('budget', 0.196), ('markets', 0.191), ('maker', 0.17), ('player', 0.164), ('minimax', 0.154), ('game', 0.15), ('metrical', 0.134), ('ine', 0.131), ('contracts', 0.13), ('games', 0.12), ('wit', 0.101), ('mts', 0.096), ('playout', 0.096), ('outcome', 0.093), ('combinatorial', 0.091), ('expert', 0.091), ('ci', 0.088), ('contract', 0.084), ('abernethy', 0.082), ('cit', 0.077), ('bettor', 0.077), ('metric', 0.073), ('al', 0.071), ('pay', 0.07), ('predicate', 0.069), ('bansal', 0.067), ('wi', 0.063), ('qi', 0.062), ('prices', 0.062), ('experts', 0.06), ('competitive', 0.059), ('minimaxloss', 0.057), ('pennock', 0.057), ('worstcaseloss', 0.057), ('wt', 0.056), ('sequence', 0.053), ('monotonic', 0.052), ('winning', 0.052), ('pays', 0.052), ('payout', 0.05), ('selling', 0.05), ('chen', 0.05), ('si', 0.049), ('state', 0.049), ('round', 0.048), ('online', 0.048), ('job', 0.047), ('election', 0.046), ('price', 0.046), ('notice', 0.042), ('maxi', 0.038), ('candidates', 0.038), ('bartal', 0.038), ('bettors', 0.038), ('fiat', 0.038), ('linial', 0.038), ('vaughn', 0.038), ('loss', 0.038), ('repeated', 0.037), ('diag', 0.037), ('elusive', 0.034), ('money', 0.034), ('purchased', 0.034), ('gelly', 0.034), ('cmax', 0.034), ('borodin', 0.034), ('ceases', 0.034), ('earned', 0.034), ('learner', 0.033), ('adversarial', 0.032), ('ratio', 0.032), ('ei', 0.031), ('sell', 0.031), ('freund', 0.031), ('cn', 0.031), ('know', 0.029), ('algorithm', 0.029), ('randomized', 0.029), ('incentive', 0.029), ('nonnegative', 0.029), ('strategy', 0.029), ('omit', 0.029), ('advance', 0.028), ('proceeds', 0.028), ('prediction', 0.028), ('stopping', 0.028), ('manfred', 0.027), ('purchase', 0.027), ('ln', 0.027), ('mini', 0.026), ('design', 0.026), ('page', 0.025), ('payoff', 0.025), ('rede', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 226 nips-2010-Repeated Games against Budgeted Adversaries
Author: Jacob D. Abernethy, Manfred K. Warmuth
Abstract: We study repeated zero-sum games against an adversary on a budget. Given that an adversary has some constraint on the sequence of actions that he plays, we consider what ought to be the player’s best mixed strategy with knowledge of this budget. We show that, for a general class of normal-form games, the minimax strategy is indeed efficiently computable and relies on a “random playout” technique. We give three diverse applications of this new algorithmic template: a cost-sensitive “Hedge” setting, a particular problem in Metrical Task Systems, and the design of combinatorial prediction markets. 1
2 0.16476743 39 nips-2010-Bayesian Action-Graph Games
Author: Albert X. Jiang, Kevin Leyton-brown
Abstract: Games of incomplete information, or Bayesian games, are an important gametheoretic model and have many applications in economics. We propose Bayesian action-graph games (BAGGs), a novel graphical representation for Bayesian games. BAGGs can represent arbitrary Bayesian games, and furthermore can compactly express Bayesian games exhibiting commonly encountered types of structure including symmetry, action- and type-specific utility independence, and probabilistic independence of type distributions. We provide an algorithm for computing expected utility in BAGGs, and discuss conditions under which the algorithm runs in polynomial time. Bayes-Nash equilibria of BAGGs can be computed by adapting existing algorithms for complete-information normal form games and leveraging our expected utility algorithm. We show both theoretically and empirically that our approaches improve significantly on the state of the art. 1
3 0.15789972 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
Author: Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari
Abstract: We develop a theory of online learning by defining several complexity measures. Among them are analogues of Rademacher complexity, covering numbers and fatshattering dimension from statistical learning theory. Relationship among these complexity measures, their connection to online learning, and tools for bounding them are provided. We apply these results to various learning problems. We provide a complete characterization of online learnability in the supervised setting. 1
4 0.098179914 222 nips-2010-Random Walk Approach to Regret Minimization
Author: Hariharan Narayanan, Alexander Rakhlin
Abstract: We propose a computationally efficient random walk on a convex body which rapidly mixes to a time-varying Gibbs distribution. In the setting of online convex optimization and repeated games, the algorithm yields low regret and presents a novel efficient method for implementing mixture forecasting strategies. 1
5 0.089565597 15 nips-2010-A Theory of Multiclass Boosting
Author: Indraneel Mukherjee, Robert E. Schapire
Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements. 1
6 0.083513662 243 nips-2010-Smoothness, Low Noise and Fast Rates
7 0.080453157 185 nips-2010-Nonparametric Density Estimation for Stochastic Optimization with an Observable State Variable
8 0.07429345 192 nips-2010-Online Classification with Specificity Constraints
9 0.068641037 183 nips-2010-Non-Stochastic Bandit Slate Problems
10 0.066120729 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
11 0.062531762 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
12 0.061978806 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
13 0.060478449 43 nips-2010-Bootstrapping Apprenticeship Learning
14 0.058864452 27 nips-2010-Agnostic Active Learning Without Constraints
15 0.058467325 283 nips-2010-Variational Inference over Combinatorial Spaces
16 0.057744678 184 nips-2010-Nonparametric Bayesian Policy Priors for Reinforcement Learning
17 0.056310765 152 nips-2010-Learning from Logged Implicit Exploration Data
18 0.055964332 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
19 0.053191192 212 nips-2010-Predictive State Temporal Difference Learning
20 0.051731482 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification
topicId topicWeight
[(0, 0.159), (1, -0.04), (2, 0.083), (3, 0.013), (4, -0.004), (5, 0.059), (6, -0.08), (7, -0.02), (8, -0.016), (9, 0.107), (10, 0.05), (11, -0.029), (12, -0.025), (13, -0.034), (14, 0.029), (15, 0.048), (16, -0.029), (17, 0.033), (18, -0.008), (19, -0.012), (20, 0.028), (21, 0.067), (22, 0.063), (23, -0.075), (24, -0.024), (25, -0.041), (26, 0.047), (27, 0.051), (28, -0.151), (29, 0.109), (30, 0.048), (31, 0.074), (32, 0.079), (33, -0.057), (34, 0.066), (35, -0.025), (36, -0.034), (37, -0.021), (38, 0.124), (39, -0.074), (40, 0.035), (41, -0.186), (42, 0.044), (43, -0.01), (44, 0.028), (45, -0.134), (46, 0.101), (47, 0.155), (48, -0.008), (49, -0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.92378813 226 nips-2010-Repeated Games against Budgeted Adversaries
Author: Jacob D. Abernethy, Manfred K. Warmuth
Abstract: We study repeated zero-sum games against an adversary on a budget. Given that an adversary has some constraint on the sequence of actions that he plays, we consider what ought to be the player’s best mixed strategy with knowledge of this budget. We show that, for a general class of normal-form games, the minimax strategy is indeed efficiently computable and relies on a “random playout” technique. We give three diverse applications of this new algorithmic template: a cost-sensitive “Hedge” setting, a particular problem in Metrical Task Systems, and the design of combinatorial prediction markets. 1
2 0.6671676 39 nips-2010-Bayesian Action-Graph Games
Author: Albert X. Jiang, Kevin Leyton-brown
Abstract: Games of incomplete information, or Bayesian games, are an important gametheoretic model and have many applications in economics. We propose Bayesian action-graph games (BAGGs), a novel graphical representation for Bayesian games. BAGGs can represent arbitrary Bayesian games, and furthermore can compactly express Bayesian games exhibiting commonly encountered types of structure including symmetry, action- and type-specific utility independence, and probabilistic independence of type distributions. We provide an algorithm for computing expected utility in BAGGs, and discuss conditions under which the algorithm runs in polynomial time. Bayes-Nash equilibria of BAGGs can be computed by adapting existing algorithms for complete-information normal form games and leveraging our expected utility algorithm. We show both theoretically and empirically that our approaches improve significantly on the state of the art. 1
3 0.56103128 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
Author: Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari
Abstract: We develop a theory of online learning by defining several complexity measures. Among them are analogues of Rademacher complexity, covering numbers and fatshattering dimension from statistical learning theory. Relationship among these complexity measures, their connection to online learning, and tools for bounding them are provided. We apply these results to various learning problems. We provide a complete characterization of online learnability in the supervised setting. 1
4 0.52872497 125 nips-2010-Inference and communication in the game of Password
Author: Yang Xu, Charles Kemp
Abstract: Communication between a speaker and hearer will be most efficient when both parties make accurate inferences about the other. We study inference and communication in a television game called Password, where speakers must convey secret words to hearers by providing one-word clues. Our working hypothesis is that human communication is relatively efficient, and we use game show data to examine three predictions. First, we predict that speakers and hearers are both considerate, and that both take the other’s perspective into account. Second, we predict that speakers and hearers are calibrated, and that both make accurate assumptions about the strategy used by the other. Finally, we predict that speakers and hearers are collaborative, and that they tend to share the cognitive burden of communication equally. We find evidence in support of all three predictions, and demonstrate in addition that efficient communication tends to break down when speakers and hearers are placed under time pressure.
5 0.46456653 15 nips-2010-A Theory of Multiclass Boosting
Author: Indraneel Mukherjee, Robert E. Schapire
Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements. 1
6 0.42900768 202 nips-2010-Parallelized Stochastic Gradient Descent
7 0.42456335 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
8 0.41732866 222 nips-2010-Random Walk Approach to Regret Minimization
9 0.39991921 243 nips-2010-Smoothness, Low Noise and Fast Rates
10 0.39253795 283 nips-2010-Variational Inference over Combinatorial Spaces
11 0.39226401 38 nips-2010-Batch Bayesian Optimization via Simulation Matching
12 0.38956195 35 nips-2010-Auto-Regressive HMM Inference with Incomplete Data for Short-Horizon Wind Forecasting
13 0.38383394 185 nips-2010-Nonparametric Density Estimation for Stochastic Optimization with an Observable State Variable
14 0.36104503 192 nips-2010-Online Classification with Specificity Constraints
15 0.35332412 274 nips-2010-Trading off Mistakes and Don't-Know Predictions
16 0.35232612 244 nips-2010-Sodium entry efficiency during action potentials: A novel single-parameter family of Hodgkin-Huxley models
17 0.34762588 64 nips-2010-Distributionally Robust Markov Decision Processes
18 0.34416458 171 nips-2010-Movement extraction by detecting dynamics switches and repetitions
19 0.34066337 183 nips-2010-Non-Stochastic Bandit Slate Problems
20 0.33504081 182 nips-2010-New Adaptive Algorithms for Online Classification
topicId topicWeight
[(13, 0.072), (27, 0.058), (30, 0.079), (45, 0.156), (50, 0.046), (52, 0.027), (53, 0.335), (60, 0.029), (77, 0.058), (78, 0.017), (90, 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.74561602 226 nips-2010-Repeated Games against Budgeted Adversaries
Author: Jacob D. Abernethy, Manfred K. Warmuth
Abstract: We study repeated zero-sum games against an adversary on a budget. Given that an adversary has some constraint on the sequence of actions that he plays, we consider what ought to be the player’s best mixed strategy with knowledge of this budget. We show that, for a general class of normal-form games, the minimax strategy is indeed efficiently computable and relies on a “random playout” technique. We give three diverse applications of this new algorithmic template: a cost-sensitive “Hedge” setting, a particular problem in Metrical Task Systems, and the design of combinatorial prediction markets. 1
2 0.69135964 2 nips-2010-A Bayesian Approach to Concept Drift
Author: Stephen Bach, Mark Maloof
Abstract: To cope with concept drift, we placed a probability distribution over the location of the most-recent drift point. We used Bayesian model comparison to update this distribution from the predictions of models trained on blocks of consecutive observations and pruned potential drift points with low probability. We compare our approach to a non-probabilistic method for drift and a probabilistic method for change-point detection. In our experiments, our approach generally yielded improved accuracy and/or speed over these other methods. 1
3 0.65186018 90 nips-2010-Fast Large-scale Mixture Modeling with Component-specific Data Partitions
Author: Bo Thiesson, Chong Wang
Abstract: Remarkably easy implementation and guaranteed convergence has made the EM algorithm one of the most used algorithms for mixture modeling. On the downside, the E-step is linear in both the sample size and the number of mixture components, making it impractical for large-scale data. Based on the variational EM framework, we propose a fast alternative that uses component-specific data partitions to obtain a sub-linear E-step in sample size, while the algorithm still maintains provable convergence. Our approach builds on previous work, but is significantly faster and scales much better in the number of mixture components. We demonstrate this speedup by experiments on large-scale synthetic and real data. 1
4 0.63757688 208 nips-2010-Policy gradients in linearly-solvable MDPs
Author: Emanuel Todorov
Abstract: We present policy gradient results within the framework of linearly-solvable MDPs. For the first time, compatible function approximators and natural policy gradients are obtained by estimating the cost-to-go function, rather than the (much larger) state-action advantage function as is necessary in traditional MDPs. We also develop the first compatible function approximators and natural policy gradients for continuous-time stochastic systems.
5 0.63431454 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
Author: Xinhua Zhang, Ankan Saha, S.v.n. Vishwanathan
Abstract: In a recent paper Joachims [1] presented SVM-Perf, a cutting plane method (CPM) for training linear Support Vector Machines (SVMs) which converges to an accurate solution in O(1/ 2 ) iterations. By tightening the analysis, Teo et al. [2] showed that O(1/ ) iterations suffice. Given the impressive convergence speed of CPM on a number of practical problems, it was conjectured that these rates could be further improved. In this paper we disprove this conjecture. We present counter examples which are not only applicable for training linear SVMs with hinge loss, but also hold for support vector methods which optimize a multivariate performance score. However, surprisingly, these problems are not inherently hard. By exploiting the structure of the objective function we can devise an algo√ rithm that converges in O(1/ ) iterations. 1
6 0.53378552 196 nips-2010-Online Markov Decision Processes under Bandit Feedback
7 0.52995396 179 nips-2010-Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks
8 0.52924615 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
9 0.52885181 183 nips-2010-Non-Stochastic Bandit Slate Problems
10 0.52554512 117 nips-2010-Identifying graph-structured activation patterns in networks
11 0.52538842 222 nips-2010-Random Walk Approach to Regret Minimization
12 0.52285486 64 nips-2010-Distributionally Robust Markov Decision Processes
13 0.52155423 4 nips-2010-A Computational Decision Theory for Interactive Assistants
14 0.52130014 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
15 0.52116621 38 nips-2010-Batch Bayesian Optimization via Simulation Matching
16 0.51762515 160 nips-2010-Linear Complementarity for Regularized Policy Evaluation and Improvement
17 0.51633185 265 nips-2010-The LASSO risk: asymptotic results and real world examples
18 0.51564145 78 nips-2010-Error Propagation for Approximate Policy and Value Iteration
19 0.51563036 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability