nips nips2011 nips2011-218 knowledge-graph by maker-knowledge-mining

218 nips-2011-Predicting Dynamic Difficulty


Source: pdf

Author: Olana Missura, Thomas Gärtner

Abstract: Motivated by applications in electronic games as well as teaching systems, we investigate the problem of dynamic difficulty adjustment. The task here is to repeatedly find a game difficulty setting that is neither ‘too easy’ and bores the player, nor ‘too difficult’ and overburdens the player. The contributions of this paper are (i) the formulation of difficulty adjustment as an online learning problem on partially ordered sets, (ii) an exponential update algorithm for dynamic difficulty adjustment, (iii) a bound on the number of wrong difficulty settings relative to the best static setting chosen in hindsight, and (iv) an empirical investigation of the algorithm when playing against adversaries. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Motivated by applications in electronic games as well as teaching systems, we investigate the problem of dynamic difficulty adjustment. [sent-4, score-0.333]

2 The task here is to repeatedly find a game difficulty setting that is neither ‘too easy’ and bores the player, nor ‘too difficult’ and overburdens the player. [sent-5, score-0.252]

3 1 Introduction While difficulty adjustment is common practice in many traditional games (consider, for instance, the handicap in golf or the handicap stones in go), the case for dynamic difficulty adjustment in electronic games has been made only recently [7]. [sent-7, score-1.064]

4 In this paper, we formalise dynamic difficulty adjustment as a game between a master and a player in which the master tries to predict the most appropriate difficulty setting. [sent-9, score-1.055]

5 As the player is typically a human with changing performance depending on many hidden factors as well as luck, no assumptions about the player can be made. [sent-10, score-0.593]

6 The difficulty adjustment game is played on a partially ordered set which reflects the ‘more difficult than’-relation on the set of difficulty settings. [sent-11, score-0.578]

7 To the best of our knowledge, in this paper, we provide the first thorough theoretical treatment of dynamic difficulty adjustment as a prediction problem. [sent-12, score-0.348]

8 The contributions of this paper are: We formalise the learning problem of dynamic difficulty adjustment (in Section 2), propose a novel learning algorithm for this problem (in Section 4), and give a bound on the number of proposed difficulty settings that were not just right (in Section 5). [sent-13, score-0.473]

9 The bound limits the number of mistakes the algorithm can make relative to the best static difficulty setting chosen in hindsight. [sent-14, score-0.294]

10 In particular, we investigate the performance of the algorithm ‘against’ statistically distributed players by simulating the players as well as ‘against’ adversaries by asking humans to try to trick the algorithm in a simplified setting. [sent-17, score-0.306]

11 Implementing our algorithm into a real game and testing it on real human players is left to future work. [sent-18, score-0.313]

12 2 Formalisation To be able to theoretically investigate dynamic difficulty adjustment, we view it as a game between a master and a player, played on a partially ordered set modelling the ‘more difficult than’-relation. [sent-19, score-0.57]

13 The game is played in turns where each turn has the following elements: [sent-20, score-0.232]

14 2. the player plays one ‘round’ of the game in this setting, and [sent-22, score-0.457]

15 3. the game master experiences whether the setting was ‘too difficult’, ‘just right’, or ‘too easy’ for the player. [sent-23, score-0.369]

16 The master aims at making as few mistakes as possible, that is, at choosing a difficulty setting that is ‘just right’ as often as possible. [sent-24, score-0.185]
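The turn structure and the mistake count it induces can be made concrete with a small sketch; the Master/Player interfaces and method names are illustrative assumptions of ours, not an API from the paper, while the +1/0/−1 response encoding follows the description later in this summary.

```python
# Minimal sketch of the difficulty-adjustment game loop described above.
# The master/player interfaces are assumptions; a real game supplies its own.
TOO_EASY, JUST_RIGHT, TOO_DIFFICULT = +1, 0, -1

def play_game(master, player, num_turns):
    """Run the game for num_turns turns and return the master's mistake count."""
    mistakes = 0
    for _ in range(num_turns):
        setting = master.propose()             # the master picks a difficulty setting
        response = player.play_round(setting)  # the player plays one round; the master then
                                               # observes 'too easy' (+1), 'just right' (0),
                                               # or 'too difficult' (-1)
        master.observe(setting, response)
        if response != JUST_RIGHT:
            mistakes += 1                      # the quantity the master tries to minimise
    return mistakes
```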

17 In this paper, we aim at developing an algorithm for the master with theoretical guarantees on the number of mistakes in the worst case while not making any assumptions about the player. [sent-25, score-0.195]

18 Even with these natural assumptions, in the worst case, no algorithm for the master will be able to make even a single correct prediction. [sent-27, score-0.117]

19 As we cannot make any assumptions about the player, we will be interested in comparing our algorithm theoretically and empirically with the best statically chosen difficulty setting, as is commonly the case in online learning [3]. [sent-28, score-0.148]

20 3 Related Work As of today there exist a few commercial games with a well-designed dynamic difficulty adjustment mechanism, but all of them employ heuristics and as such suffer from the typical disadvantages (not being easily transferable to other games, requiring extensive testing, etc.). [sent-29, score-0.544]

21 What we would like to have instead of heuristics is a universal mechanism for dynamic difficulty adjustment: An online algorithm that takes as input (game-specific) ways to modify difficulty and the current player’s in-game history (actions, performance, reactions, ...). [sent-30, score-0.116]

22 Both artificial intelligence researchers and the game developer community display an interest in the problem of automatic difficulty scaling. [sent-34, score-0.184]

23 Since the perceived difficulty and the preferred difficulty are subjective parameters, the dynamic difficulty adjustment algorithm should be able to choose the “right” difficulty level in a comparatively short time for any particular player. [sent-40, score-0.326]

24 Existing work on player modeling in computer games [14, 13, 5, 12] demonstrates the power of utilising player models to create games or in-game situations of high interest and satisfaction for the players. [sent-41, score-1.011]

25 As can be seen from these examples, the problem of dynamic difficulty adjustment in video games has been attacked from different angles, but a unifying and theoretically sound approach is still missing. [sent-42, score-0.565]

26 To the best of our knowledge this work contains the first theoretical formalization of dynamic difficulty adjustment as a learning problem. [sent-43, score-0.348]

27 The master, i.e. the learning algorithm, does not see the colours but must point at a green vertex as often as possible. [sent-48, score-0.208]

28 This setting is related to learning directed cuts with membership queries. [sent-50, score-0.147]

29 They furthermore showed that directed cuts are not learnable with traditional membership queries if the labelling is allowed to change over time. [sent-56, score-0.121]

30 This negative result also does not apply to our case as the aim of the master is “only” to point at a green vertex as often as possible and as we are interested in a comparison with the best static vertex chosen in hindsight. [sent-57, score-0.566]

31 If we ignore the structure inherent in the difficulty settings, we will be in a standard multi-armed bandit setting [2]: There are K arms, to which an unknown adversary assigns loss values on each iteration (0 to the ‘just right’ arms, 1 to all the others). [sent-58, score-0.278]

32 The standard performance measure is the so-called ‘regret’: the difference between the loss incurred by the learning algorithm and the loss of the best static arm chosen in hindsight. [sent-62, score-0.21]

33 The upper bound on its regret is of the order $\sqrt{KT \ln(T)}$, where T is the number of iterations. [sent-64, score-0.132]

34 IMPROVEDPI will be the second baseline after the best static in hindsight (BSIH) in our experiments. [sent-65, score-0.175]

35 4 Algorithm In this section we give an exponential update algorithm for predicting a vertex that corresponds to a ‘just right’ difficulty setting in a finite partially ordered set $(K, \succeq)$ of difficulty settings. [sent-66, score-0.346]

36 The partial order is such that for $i, j \in K$ we write $i \succeq j$ if difficulty setting $i$ is ‘more difficult than’ difficulty setting $j$. [sent-67, score-0.136]

37 The response $o_t$ that the master algorithm can observe is +1 if the chosen difficulty setting was ‘too easy’, 0 if it was ‘just right’, and −1 if it was ‘too difficult’. [sent-69, score-0.252]

38 The algorithm maintains a belief w of each vertex being ‘just right’ and updates this belief if the observed response implies that the setting was ‘too easy’ or ‘too difficult’. [sent-70, score-0.361]

39 To ensure this, we compute for each setting $k$ the belief ‘above’ $k$ as well as the belief ‘below’ $k$. [sent-79, score-0.146]

40 That is, $A_t$ in line 3 of the algorithm collects the belief of all settings that are known to be ‘more difficult’ than $k$, and $B_t$ in line 4 of the algorithm collects the belief of all settings that are known to be ‘less difficult’ than $k$. [sent-80, score-0.334]

41 If we observe that the proposed setting was ‘too easy’, that is, we should ‘increase the difficulty’, then in line 8 we update the belief of the proposed setting as well as of all settings easier than the proposed one. [sent-81, score-0.304]

42 If we observe that the proposed setting was ‘too difficult’, that is, we should ‘decrease the difficulty’, then in line 11 we update the belief of the proposed setting as well as of all settings more difficult than the proposed one. [sent-82, score-0.304]

43 The amount of belief that is updated for each mistake is thus equal to $B_t(k_t)$ or $A_t(k_t)$. [sent-83, score-0.174]
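Taken together, the description above suggests the following sketch of the exponential update on an explicitly given poset; the prediction rule (propose the setting maximising $\min\{A_t(k), B_t(k)\}$), the convention that $A_t(k)$ and $B_t(k)$ include $k$ itself, and all names are our reading of the text, not the authors' published pseudocode.

```python
# Sketch of an exponential-update master on a poset of difficulty settings.
# Prediction rule and parameter names are assumptions based on the text above.
def exponential_update_master(settings, geq, observe, beta=0.5, num_turns=100):
    """settings: difficulty settings (poset vertices);
    geq(i, j): True iff i is at least as difficult as j (reflexive);
    observe(k): returns +1 ('too easy'), 0 ('just right'), or -1 ('too difficult');
    beta: multiplicative update factor in (0, 1)."""
    settings = list(settings)
    w = {k: 1.0 for k in settings}   # belief that each setting is 'just right'
    mistakes = 0
    for _ in range(num_turns):
        # A[k]: belief 'above' k, B[k]: belief 'below' k (both taken to include k).
        A = {k: sum(w[x] for x in settings if geq(x, k)) for k in settings}
        B = {k: sum(w[x] for x in settings if geq(k, x)) for k in settings}
        # Propose the setting whose lighter side still carries the most belief.
        k_t = max(settings, key=lambda k: min(A[k], B[k]))
        o_t = observe(k_t)
        if o_t == +1:       # 'too easy': discount k_t and every easier setting
            for x in settings:
                if geq(k_t, x):
                    w[x] *= beta
        elif o_t == -1:     # 'too difficult': discount k_t and every harder setting
            for x in settings:
                if geq(x, k_t):
                    w[x] *= beta
        mistakes += (o_t != 0)
    return mistakes
```

On a single chain of 50 levels this would be called as `exponential_update_master(list(range(50)), lambda i, j: i >= j, respond)` for some response callback `respond`.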

44 5 Theory We will now show a bound on the number of inappropriate difficulty settings that are proposed, relative to the number of mistakes the best static difficulty setting makes. [sent-85, score-0.354]

45 We denote the number of mistakes of POSM until time T by m and the minimum number of times a statically chosen difficulty setting would have made a mistake until time T by M. [sent-86, score-0.252]

46 We furthermore denote the total amount of belief on the partially ordered set by $W_t = \sum_{k \in K} w_t(k)$. [sent-87, score-0.366]

47 m≤ 2|C| ln 2|C|−1+β For all c ∈ C we denote the amount of belief on every chain by Wtc = x∈c wt (x), the bec lief ‘above’ k on c by Ac (k) = wt (x), and the belief ‘below’ k on c by Bt (k) = t x∈c:x k c x∈c:x k wt (x). [sent-98, score-0.744]

48 To relate the amount of belief updated by POSM to the amount of belief on each chain, observe that $\max_{k \in K} \min\{A_t(k), B_t(k)\} = \max_{c \in C} \max_{k \in c} \min\{A_t(k), B_t(k)\} \ge \max_{c \in C} \max_{k \in c} \min\{A_t^c(k), B_t^c(k)\} \ge \max_{k \in c_t} \min\{A_t^{c_t}(k), B_t^{c_t}(k)\}$, where $c_t$ is the heaviest chain at time $t$. [sent-103, score-0.271]

49 Since $\sum_{c \in C} W_t^c \ge W_t$, it holds that $W_t^{c_t} \ge W_t / |C|$. We will next show that for every chain there is a difficulty setting for which the following holds: if we proposed that setting and made a mistake, we would be able to update at least half of the total weight of that chain. [sent-107, score-0.167]

50 Such $i, j$ exist and are unique as $\forall x \in K : w_t(x) > 0$. [sent-114, score-0.157]

51 As we only update the weight of a difficulty setting if the response implied that the algorithm made a mistake, $\beta^M$ is a lower bound on the weight of at least one difficulty setting and hence also $W_T \ge \beta^M$. [sent-123, score-0.194]
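Taking the initial belief to be $w_0(k) = 1$ for every setting, so that the total initial weight is $|K|$, and using that each mistake removes at least a $\frac{1-\beta}{2|C|}$ fraction of the total belief (at least half of the heaviest chain, which carries at least $W_t/|C|$), the two bounds combine as sketched below. This is our reconstruction of the argument, with $C$ read as the set of chains used to cover the poset in the analysis, so the exact constants should be checked against the paper's theorem.

$$\beta^{M} \;\le\; W_T \;\le\; |K|\left(\frac{2|C|-1+\beta}{2|C|}\right)^{m} \quad\Longrightarrow\quad m \;\le\; \frac{M\ln(1/\beta) + \ln|K|}{\ln\frac{2|C|}{2|C|-1+\beta}}.$$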

52 Note that this bound is similar to the bound for the full-information setting [3], despite much weaker information being available in our case. [sent-125, score-0.122]

53 The influence of |C| is the new ingredient that changes the behaviour of this bound for different partially ordered sets. [sent-126, score-0.188]

54 6 Experiments We performed two sets of experiments: simulating a game against a stochastic environment, as well as using human players to provide our algorithm with a non-oblivious adversary. [sent-127, score-0.392]

55 The first baseline is the best static difficulty setting in hindsight: it is the difficulty that a player would pick if she knew her skill level in advance and had to choose the difficulty only once. [sent-129, score-0.497]

56 In the ‘non-smooth’ setting we place no restrictions on the zero-zone apart from its size, while in the ‘smooth’ setting the border of the zero-zone is allowed to move only by one vertex at a time. [sent-134, score-0.341]

57 These two settings represent two extreme situations: a single player whose skills change gradually over time changes the zero-zone ‘smoothly’; different players with different skills for each new challenge the game presents make the zero-zone ‘jump’. [sent-135, score-0.757]

[Figure residue (Figures 1 and 2): loss and regret vs. time (0–500 iterations) for IMPROVEDPI, POSM, and BSIH; only the panel caption ‘(a) Loss’ survives.]

60 6.1 Stochastic Adversary In the first set of experiments we performed, the adversary is stochastic: On every iteration the zero-zone changes with a pre-defined probability. [sent-144, score-0.162]

61 In the ‘smooth’ setting, only one of the border vertices of the zero-zone can change its label at a time. [sent-145, score-0.162]

62 For the ‘non-smooth’ setting we consider a truly evil case in which the zero-zone is limited to always contain only one vertex, and a case where the zero-zone may contain up to 20% of all the vertices in the graph. [sent-146, score-0.355]

63 Note that even relabeling a single vertex may break the consistency of the labeling with regard to the poset. [sent-147, score-0.137]

64 The necessary repair procedure may result in more than one vertex being relabeled at a time. [sent-148, score-0.137]
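On a single chain the zero-zone is just an interval, which makes both stochastic adversaries easy to sketch; the parameter names, the starting position, and the interval representation below are illustrative assumptions, and on a grid the repair procedure mentioned above would need extra care.

```python
import random

def chain_zero_zone_adversary(n=50, p_change=0.1, smooth=True, max_zone_size=10):
    """Sketch of the stochastic adversary on a chain of n difficulty settings.
    The zero-zone is the interval [lo, hi]; settings below it are 'too easy' (+1),
    settings above it 'too difficult' (-1)."""
    lo = hi = n // 2                          # start from a single 'just right' vertex

    def respond(k):
        nonlocal lo, hi
        if random.random() < p_change:        # the zone changes with a pre-defined probability
            if smooth:
                # 'smooth': exactly one border vertex changes its label at a time
                if random.random() < 0.5:
                    lo = max(0, min(hi, lo + random.choice((-1, 1))))
                else:
                    hi = min(n - 1, max(lo, hi + random.choice((-1, 1))))
            else:
                # 'non-smooth': the zone may jump anywhere, size capped at max_zone_size
                lo = random.randrange(n)
                hi = min(n - 1, lo + random.randrange(max_zone_size))
        if k < lo:
            return +1    # 'too easy'
        if k > hi:
            return -1    # 'too difficult'
        return 0         # 'just right'

    return respond
```

Plugged into the earlier sketch, `exponential_update_master(list(range(50)), lambda i, j: i >= j, chain_zero_zone_adversary(), num_turns=500)` would simulate one run of the chain experiment.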

65 We consider two graphs that represent two different but typical game structures with regard to difficulty: a single chain and a 2-dimensional grid. [sent-149, score-0.291]

66 A set of progressively more difficult challenges, such as can be found in a puzzle or a time-management game, can be directly mapped onto a chain whose length corresponds to the number of challenges. [sent-150, score-0.278]

67 A 2- (or higher-) dimensional grid, on the other hand, is more like a skill-based game, where, depending on the choices players make, different game states become available to them. [sent-151, score-0.313]
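The two structures translate into the following ‘more difficult than’ relations; the componentwise order on the grid is our assumption of the natural choice, since the summary does not spell out which partial order the grid experiments used.

```python
# 'More difficult than' relations for the two structures discussed above
# (the grid order is an assumption; the chain order is the obvious one).
def chain_geq(i, j):
    """On a chain of progressively harder challenges, i is harder than j iff i >= j."""
    return i >= j

def grid_geq(a, b):
    """On a 2-D grid, (x1, y1) is at least as difficult as (x2, y2) iff it dominates
    it in both coordinates; settings that disagree across coordinates are incomparable."""
    return a[0] >= b[0] and a[1] >= b[1]

chain_settings = list(range(50))                                 # a single chain of 50 settings
grid_settings = [(x, y) for x in range(10) for y in range(10)]   # an illustrative 10 x 10 grid
```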

68 In all considered variations of the setting the game lasts for 500 iterations and is repeated 10 times. [sent-153, score-0.252]

69 The resulting mean and standard deviation values of loss and regret, respectively, are shown in the following figures: The ‘smooth’ setting in Figures 1(a), 1(b) and 2(a), 2(b); The ‘non-smooth’ setting in Figures 3(a), 3(b) and 4(a), 4(b). [sent-154, score-0.172]

70 Note that in the ‘smooth’ setting POSM outperforms BSIH and, therefore, its regret is negative. [sent-157, score-0.152]

[Plot residue for Figure 3: loss and regret vs. time (0–500 iterations) for IMPROVEDPI, POSM, and BSIH; panel (a) Loss.]

72 Figure 3: Stochastic adversary, ‘non-smooth’ setting, exactly one vertex in the zero-zone, on a single chain of 50 vertices. [sent-164, score-0.21]

[Plot residue for Figure 4: loss and regret vs. time (0–500 iterations) for IMPROVEDPI, POSM, and BSIH; panel (a) Loss.]

74 Figure 4: Stochastic adversary, ‘non-smooth’ setting, up to 20% of all vertices may be in the zero-zone, on a single chain of 50 vertices. [sent-167, score-0.14]

75 6.2 Evil Adversary While the experiments in our stochastic environment show encouraging results, of real interest to us is the situation where the adversary is ‘evil’, non-stochastic, and, furthermore, non-oblivious. [sent-169, score-0.155]

76 In dynamic difficulty adjustment the algorithm will have to deal with people, who learn and change in hard-to-predict ways. [sent-170, score-0.349]

77 Even though it is a simplified scenario, this situation is rather natural for games and it demonstrates the power of our algorithm. [sent-172, score-0.218]

78 Just as players in dynamic difficulty adjustment are not supposed to be aware of the mechanics, our methods and goals were not disclosed to the test subjects. [sent-174, score-0.455]

79 Instead they were presented with a modified game of cups: On every iteration the casino hides a coin under one of the cups; after that the player may point at two of the cups. [sent-175, score-0.538]

80 If the coin is under one of these two, the player wins it. [sent-176, score-0.321]

81 Behind the scenes the cups represented the vertices on the chain, and the players’ choices set the lower and upper borders of the zero-zone. [sent-177, score-0.283]

82 If the algorithm’s prediction was wrong, one of the two cups was chosen at random and the coin was placed under it. [sent-178, score-0.146]

83 In a simplified setting such as this, and without any extrinsic rewards, people can only handle short chains and short games before getting bored. [sent-181, score-0.306]

84 In our case we restricted the length of the chain to 8 and the length of each game to 15. [sent-182, score-0.257]
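The cups game can be simulated along the following lines with the parameters just stated (a chain of 8 cups, 15 rounds per game); whether the two chosen cups themselves count as ‘just right’ is not specified in the summary, so treating the borders as included is an assumption, as is the `choose_two_cups` interface for the human.

```python
def human_cups_adversary(choose_two_cups):
    """Sketch of the 'game of cups': each round the human points at two cups, which
    behind the scenes become the lower and upper borders of the zero-zone on the chain."""
    history = []   # past (prediction, lo, hi) triples, visible to the human

    def respond(prediction):
        lo, hi = sorted(choose_two_cups(history))
        if lo <= prediction <= hi:                    # 'just right' (borders included by
            response = 0                              # assumption): the human wins nothing
        else:
            response = +1 if prediction < lo else -1  # a mistake: the coin is placed under one
                                                      # of the two chosen cups, the human wins it
        history.append((prediction, lo, hi))
        return response

    return respond
```

A single 15-round game against the exponential-update sketch above would then be `exponential_update_master(list(range(8)), lambda i, j: i >= j, human_cups_adversary(choose), num_turns=15)` for some hypothetical choice function `choose`.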

85 It is possible to simulate a longer game by not resetting the weights of the algorithm after each game is over, but at the current stage of this work it was not done. [sent-183, score-0.368]

86 Again, we created the ‘smooth’ and ‘non-smooth’ settings by placing or removing restrictions on how players were allowed to choose their cups. [sent-184, score-0.238]

87 To each game either IMPROVEDPI or POSM was assigned. [sent-185, score-0.184]

88 Note that, because this time different games were played by IMPROVEDPI and POSM, we have two different plots for their corresponding loss values. [sent-187, score-0.302]

[Plot residue: loss of IMPROVEDPI and POSM against the best static setting, and their regret, over 15 rounds against human players; panel (a) Games vs IMPROVEDPI.]

[Plot residue: further loss and regret curves from the games against human players; panel (a) Games vs IMPROVEDPI.]

91 Note that the loss of BSIH appears to be worse in games played by POSM. [sent-198, score-0.302]

92 A plausible interpretation is that players had to follow more difficult (less static) strategies to fool POSM in order to win their coins. [sent-199, score-0.129]

93 7 Conclusions In this paper we formalised dynamic difficulty adjustment as a prediction problem on partially ordered sets and proposed a novel online learning algorithm, POSM, for this problem. [sent-201, score-0.552]

94 Using this formalisation, we were able to prove a bound on the performance of POSM relative to the best static difficulty setting chosen in hindsight, BSIH. [sent-202, score-0.24]

95 As this is even better than the behaviour suggested by our mistake bound, there seems to be a gap between the theoretical and the empirical performance of our algorithm. [sent-205, score-0.126]

96 On the other hand, we will implement POSM in a range of computer games as well as teaching systems to observe its behaviour in real application scenarios. [sent-207, score-0.294]

97 Dynamic player modeling: A framework for player-centered digital games. [sent-239, score-0.273]

98 Online adaptation of computer games agents: A reinforcement learning approach. [sent-262, score-0.218]

99 Online geometric optimization in the bandit setting against an adaptive adversary. [sent-284, score-0.118]

100 Making racing fun through player modeling and track evolution. [sent-299, score-0.273]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('osm', 0.358), ('culty', 0.357), ('player', 0.273), ('adjustment', 0.236), ('improvedpi', 0.226), ('posm', 0.226), ('games', 0.218), ('game', 0.184), ('bt', 0.181), ('wtc', 0.169), ('dif', 0.164), ('wt', 0.157), ('sih', 0.151), ('vertex', 0.137), ('mproved', 0.132), ('players', 0.129), ('adversary', 0.124), ('master', 0.117), ('static', 0.101), ('dynamic', 0.09), ('regret', 0.084), ('evil', 0.083), ('ac', 0.079), ('belief', 0.078), ('bsih', 0.075), ('cups', 0.075), ('mistake', 0.075), ('chain', 0.073), ('setting', 0.068), ('vertices', 0.067), ('ordered', 0.067), ('kt', 0.059), ('settings', 0.059), ('mistakes', 0.054), ('smooth', 0.053), ('cult', 0.052), ('hindsight', 0.052), ('directed', 0.051), ('behaviour', 0.051), ('bandit', 0.05), ('coin', 0.048), ('simulating', 0.048), ('played', 0.048), ('rtner', 0.046), ('ot', 0.045), ('partially', 0.043), ('path', 0.043), ('colours', 0.041), ('danzi', 0.038), ('formalisation', 0.038), ('formalise', 0.038), ('heaviest', 0.038), ('hunicke', 0.038), ('missura', 0.038), ('wtct', 0.038), ('zerozone', 0.038), ('round', 0.037), ('loss', 0.036), ('pi', 0.034), ('skill', 0.033), ('skills', 0.033), ('handicap', 0.033), ('statically', 0.033), ('casino', 0.033), ('vs', 0.033), ('ct', 0.032), ('paths', 0.031), ('update', 0.031), ('figures', 0.031), ('stochastic', 0.031), ('collects', 0.03), ('green', 0.03), ('satisfaction', 0.029), ('arm', 0.029), ('cuts', 0.028), ('bound', 0.027), ('border', 0.027), ('online', 0.026), ('easy', 0.026), ('teaching', 0.025), ('labelled', 0.025), ('assumptions', 0.024), ('feedback', 0.023), ('decided', 0.023), ('inappropriate', 0.023), ('changing', 0.023), ('ln', 0.023), ('right', 0.023), ('chosen', 0.022), ('min', 0.022), ('best', 0.022), ('arms', 0.022), ('labelling', 0.022), ('people', 0.021), ('restrictions', 0.021), ('amount', 0.021), ('theoretically', 0.021), ('allowed', 0.02), ('chains', 0.02), ('monotone', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 218 nips-2011-Predicting Dynamic Difficulty

Author: Olana Missura, Thomas Gärtner

Abstract: Motivated by applications in electronic games as well as teaching systems, we investigate the problem of dynamic difficulty adjustment. The task here is to repeatedly find a game difficulty setting that is neither ‘too easy’ and bores the player, nor ‘too difficult’ and overburdens the player. The contributions of this paper are (i) the formulation of difficulty adjustment as an online learning problem on partially ordered sets, (ii) an exponential update algorithm for dynamic difficulty adjustment, (iii) a bound on the number of wrong difficulty settings relative to the best static setting chosen in hindsight, and (iv) an empirical investigation of the algorithm when playing against adversaries. 1

2 0.23407438 196 nips-2011-On Strategy Stitching in Large Extensive Form Multiplayer Games

Author: Richard G. Gibson, Duane Szafron

Abstract: Computing a good strategy in a large extensive form game often demands an extraordinary amount of computer memory, necessitating the use of abstraction to reduce the game size. Typically, strategies from abstract games perform better in the real game as the granularity of abstraction is increased. This paper investigates two techniques for stitching a base strategy in a coarse abstraction of the full game tree, to expert strategies in fine abstractions of smaller subtrees. We provide a general framework for creating static experts, an approach that generalizes some previous strategy stitching efforts. In addition, we show that static experts can create strong agents for both 2-player and 3-player Leduc and Limit Texas Hold’em poker, and that a specific class of static experts can be preferred among a number of alternatives. Furthermore, we describe a poker agent that used static experts and won the 3-player events of the 2010 Annual Computer Poker Competition.

3 0.15774333 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

Author: Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari

Abstract: Learning theory has largely focused on two main learning scenarios: the classical statistical setting where instances are drawn i.i.d. from a fixed distribution, and the adversarial scenario wherein, at every time step, an adversarially chosen instance is revealed to the player. It can be argued that in the real world neither of these assumptions is reasonable. We define the minimax value of a game where the adversary is restricted in his moves, capturing stochastic and non-stochastic assumptions on data. Building on the sequential symmetrization approach, we define a notion of distribution-dependent Rademacher complexity for the spectrum of problems ranging from i.i.d. to worst-case. The bounds let us immediately deduce variation-type bounds. We study a smoothed online learning scenario and show that exponentially small amount of noise can make function classes with infinite Littlestone dimension learnable. 1

4 0.14099026 220 nips-2011-Prediction strategies without loss

Author: Michael Kapralov, Rina Panigrahy

Abstract: Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say ‘predict 0’ or ‘predict 1’, and our payoff is +1 if the prediction is correct and −1 otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right predictions so far. In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting 0 or always predicting 1. For a sequence of length T our algorithm has regret 14 T and loss √ 2 2 T e− T in expectation for all strings. We show that the tradeoff between loss and regret is optimal up to constant factors. Our techniques extend to the general setting of N experts, where the related problem of trading off regret to the best expert for regret to the ’special’ expert has been studied by Even-Dar et al. (COLT’07). We obtain essentially zero loss with respect to the special expert and optimal loss/regret tradeoff, improving upon the results of Even-Dar et al and settling the main question left open in their paper. The strong loss bounds of the algorithm have some surprising consequences. First, we obtain a parameter free algorithm for the experts problem that has optimal regret bounds with respect to k-shifting optima, i.e. bounds with respect to the optimum that is allowed to change arms multiple times. Moreover, for any window of size n the regret of our algorithm to any expert never exceeds O( n(log N + log T )), where N is the number of experts and T is the time horizon, while maintaining the essentially zero loss property. 1

5 0.106226 25 nips-2011-Adaptive Hedge

Author: Tim V. Erven, Wouter M. Koolen, Steven D. Rooij, Peter Grünwald

Abstract: Most methods for decision-theoretic online learning are based on the Hedge algorithm, which takes a parameter called the learning rate. In most previous analyses the learning rate was carefully tuned to obtain optimal worst-case performance, leading to suboptimal performance on easy instances, for example when there exists an action that is significantly better than all others. We propose a new way of setting the learning rate, which adapts to the difficulty of the learning problem: in the worst case our procedure still guarantees optimal performance, but on easy instances it achieves much smaller regret. In particular, our adaptive method achieves constant regret in a probabilistic setting, when there exists an action that on average obtains strictly smaller loss than all other actions. We also provide a simulation study comparing our approach to existing methods. 1

6 0.10013611 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

7 0.094233319 303 nips-2011-Video Annotation and Tracking with Active Learning

8 0.082343489 128 nips-2011-Improved Algorithms for Linear Stochastic Bandits

9 0.080536224 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits

10 0.078401029 245 nips-2011-Selecting the State-Representation in Reinforcement Learning

11 0.077462666 145 nips-2011-Learning Eigenvectors for Free

12 0.071662985 202 nips-2011-On the Universality of Online Mirror Descent

13 0.07114321 80 nips-2011-Efficient Online Learning via Randomized Rounding

14 0.068807684 45 nips-2011-Beating SGD: Learning SVMs in Sublinear Time

15 0.067506313 300 nips-2011-Variance Reduction in Monte-Carlo Tree Search

16 0.065100923 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations

17 0.063150756 136 nips-2011-Inverting Grice's Maxims to Learn Rules from Natural Language Extractions

18 0.060959253 197 nips-2011-On Tracking The Partition Function

19 0.060303636 72 nips-2011-Distributed Delayed Stochastic Optimization

20 0.059449609 205 nips-2011-Online Submodular Set Cover, Ranking, and Repeated Active Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.138), (1, -0.166), (2, -0.004), (3, 0.014), (4, 0.131), (5, -0.024), (6, 0.002), (7, 0.007), (8, 0.01), (9, -0.031), (10, 0.047), (11, -0.055), (12, -0.02), (13, -0.024), (14, 0.036), (15, 0.0), (16, 0.015), (17, 0.063), (18, 0.05), (19, 0.027), (20, 0.088), (21, -0.034), (22, -0.079), (23, -0.022), (24, 0.025), (25, -0.007), (26, -0.162), (27, -0.042), (28, -0.035), (29, -0.008), (30, -0.082), (31, -0.087), (32, -0.089), (33, 0.092), (34, 0.109), (35, -0.038), (36, 0.181), (37, -0.126), (38, 0.037), (39, 0.006), (40, -0.244), (41, 0.17), (42, -0.142), (43, -0.006), (44, -0.08), (45, -0.004), (46, -0.001), (47, 0.043), (48, -0.136), (49, -0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96480787 218 nips-2011-Predicting Dynamic Difficulty

Author: Olana Missura, Thomas Gärtner

Abstract: Motivated by applications in electronic games as well as teaching systems, we investigate the problem of dynamic difficulty adjustment. The task here is to repeatedly find a game difficulty setting that is neither ‘too easy’ and bores the player, nor ‘too difficult’ and overburdens the player. The contributions of this paper are (i) the formulation of difficulty adjustment as an online learning problem on partially ordered sets, (ii) an exponential update algorithm for dynamic difficulty adjustment, (iii) a bound on the number of wrong difficulty settings relative to the best static setting chosen in hindsight, and (iv) an empirical investigation of the algorithm when playing against adversaries. 1

2 0.91681367 196 nips-2011-On Strategy Stitching in Large Extensive Form Multiplayer Games

Author: Richard G. Gibson, Duane Szafron

Abstract: Computing a good strategy in a large extensive form game often demands an extraordinary amount of computer memory, necessitating the use of abstraction to reduce the game size. Typically, strategies from abstract games perform better in the real game as the granularity of abstraction is increased. This paper investigates two techniques for stitching a base strategy in a coarse abstraction of the full game tree, to expert strategies in fine abstractions of smaller subtrees. We provide a general framework for creating static experts, an approach that generalizes some previous strategy stitching efforts. In addition, we show that static experts can create strong agents for both 2-player and 3-player Leduc and Limit Texas Hold’em poker, and that a specific class of static experts can be preferred among a number of alternatives. Furthermore, we describe a poker agent that used static experts and won the 3-player events of the 2010 Annual Computer Poker Competition.

3 0.63482434 136 nips-2011-Inverting Grice's Maxims to Learn Rules from Natural Language Extractions

Author: Mohammad S. Sorower, Janardhan R. Doppa, Walker Orr, Prasad Tadepalli, Thomas G. Dietterich, Xiaoli Z. Fern

Abstract: We consider the problem of learning rules from natural language text sources. These sources, such as news articles and web texts, are created by a writer to communicate information to a reader, where the writer and reader share substantial domain knowledge. Consequently, the texts tend to be concise and mention the minimum information necessary for the reader to draw the correct conclusions. We study the problem of learning domain knowledge from such concise texts, which is an instance of the general problem of learning in the presence of missing data. However, unlike standard approaches to missing data, in this setting we know that facts are more likely to be missing from the text in cases where the reader can infer them from the facts that are mentioned combined with the domain knowledge. Hence, we can explicitly model this “missingness” process and invert it via probabilistic inference to learn the underlying domain knowledge. This paper introduces a mention model that models the probability of facts being mentioned in the text based on what other facts have already been mentioned and domain knowledge in the form of Horn clause rules. Learning must simultaneously search the space of rules and learn the parameters of the mention model. We accomplish this via an application of Expectation Maximization within a Markov Logic framework. An experimental evaluation on synthetic and natural text data shows that the method can learn accurate rules and apply them to new texts to make correct inferences. Experiments also show that the method out-performs the standard EM approach that assumes mentions are missing at random. 1

4 0.46916726 220 nips-2011-Prediction strategies without loss

Author: Michael Kapralov, Rina Panigrahy

Abstract: Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say ‘predict 0’ or ‘predict 1’, and our payoff is +1 if the prediction is correct and −1 otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right predictions so far. In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting 0 or always predicting 1. For a sequence of length T our algorithm has regret 14 T and loss √ 2 2 T e− T in expectation for all strings. We show that the tradeoff between loss and regret is optimal up to constant factors. Our techniques extend to the general setting of N experts, where the related problem of trading off regret to the best expert for regret to the ’special’ expert has been studied by Even-Dar et al. (COLT’07). We obtain essentially zero loss with respect to the special expert and optimal loss/regret tradeoff, improving upon the results of Even-Dar et al and settling the main question left open in their paper. The strong loss bounds of the algorithm have some surprising consequences. First, we obtain a parameter free algorithm for the experts problem that has optimal regret bounds with respect to k-shifting optima, i.e. bounds with respect to the optimum that is allowed to change arms multiple times. Moreover, for any window of size n the regret of our algorithm to any expert never exceeds O( n(log N + log T )), where N is the number of experts and T is the time horizon, while maintaining the essentially zero loss property. 1

5 0.4281019 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

Author: Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari

Abstract: Learning theory has largely focused on two main learning scenarios: the classical statistical setting where instances are drawn i.i.d. from a fixed distribution, and the adversarial scenario wherein, at every time step, an adversarially chosen instance is revealed to the player. It can be argued that in the real world neither of these assumptions is reasonable. We define the minimax value of a game where the adversary is restricted in his moves, capturing stochastic and non-stochastic assumptions on data. Building on the sequential symmetrization approach, we define a notion of distribution-dependent Rademacher complexity for the spectrum of problems ranging from i.i.d. to worst-case. The bounds let us immediately deduce variation-type bounds. We study a smoothed online learning scenario and show that exponentially small amount of noise can make function classes with infinite Littlestone dimension learnable. 1

6 0.39789802 3 nips-2011-A Collaborative Mechanism for Crowdsourcing Prediction Problems

7 0.39670986 300 nips-2011-Variance Reduction in Monte-Carlo Tree Search

8 0.38179436 202 nips-2011-On the Universality of Online Mirror Descent

9 0.36532182 25 nips-2011-Adaptive Hedge

10 0.35515547 80 nips-2011-Efficient Online Learning via Randomized Rounding

11 0.34863868 303 nips-2011-Video Annotation and Tracking with Active Learning

12 0.31667262 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

13 0.30007017 79 nips-2011-Efficient Offline Communication Policies for Factored Multiagent POMDPs

14 0.2934708 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits

15 0.29253915 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference

16 0.28991023 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness

17 0.28811318 177 nips-2011-Multi-armed bandits on implicit metric spaces

18 0.28540352 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations

19 0.28375345 65 nips-2011-Convergent Fitted Value Iteration with Linear Function Approximation

20 0.28241387 256 nips-2011-Solving Decision Problems with Limited Information


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.013), (4, 0.03), (20, 0.024), (26, 0.017), (31, 0.046), (43, 0.033), (45, 0.082), (57, 0.025), (74, 0.552), (79, 0.01), (83, 0.029), (86, 0.01), (99, 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9517414 218 nips-2011-Predicting Dynamic Difficulty

Author: Olana Missura, Thomas Gärtner

Abstract: Motivated by applications in electronic games as well as teaching systems, we investigate the problem of dynamic difficulty adjustment. The task here is to repeatedly find a game difficulty setting that is neither ‘too easy’ and bores the player, nor ‘too difficult’ and overburdens the player. The contributions of this paper are (i) the formulation of difficulty adjustment as an online learning problem on partially ordered sets, (ii) an exponential update algorithm for dynamic difficulty adjustment, (iii) a bound on the number of wrong difficulty settings relative to the best static setting chosen in hindsight, and (iv) an empirical investigation of the algorithm when playing against adversaries. 1

2 0.93178207 104 nips-2011-Generalized Beta Mixtures of Gaussians

Author: Artin Armagan, Merlise Clyde, David B. Dunson

Abstract: In recent years, a rich variety of shrinkage priors have been proposed that have great promise in addressing massive regression problems. In general, these new priors can be expressed as scale mixtures of normals, but have more complex forms and better properties than traditional Cauchy and double exponential priors. We first propose a new class of normal scale mixtures through a novel generalized beta distribution that encompasses many interesting priors as special cases. This encompassing framework should prove useful in comparing competing priors, considering properties and revealing close connections. We then develop a class of variational Bayes approximations through the new hierarchy presented that will scale more efficiently to the types of truly massive data sets that are now encountered routinely. 1

3 0.92791396 259 nips-2011-Sparse Estimation with Structured Dictionaries

Author: David P. Wipf

Abstract: In the vast majority of recent work on sparse estimation algorithms, performance has been evaluated using ideal or quasi-ideal dictionaries (e.g., random Gaussian or Fourier) characterized by unit ℓ2 norm, incoherent columns or features. But in reality, these types of dictionaries represent only a subset of the dictionaries that are actually used in practice (largely restricted to idealized compressive sensing applications). In contrast, herein sparse estimation is considered in the context of structured dictionaries possibly exhibiting high coherence between arbitrary groups of columns and/or rows. Sparse penalized regression models are analyzed with the purpose of finding, to the extent possible, regimes of dictionary invariant performance. In particular, a Type II Bayesian estimator with a dictionarydependent sparsity penalty is shown to have a number of desirable invariance properties leading to provable advantages over more conventional penalties such as the ℓ1 norm, especially in areas where existing theoretical recovery guarantees no longer hold. This can translate into improved performance in applications such as model selection with correlated features, source localization, and compressive sensing with constrained measurement directions. 1

4 0.819444 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies

Author: Viren Jain, Srinivas C. Turaga, K Briggman, Moritz N. Helmstaedter, Winfried Denk, H. S. Seung

Abstract: An agglomerative clustering algorithm merges the most similar pair of clusters at every iteration. The function that evaluates similarity is traditionally handdesigned, but there has been recent interest in supervised or semisupervised settings in which ground-truth clustered data is available for training. Here we show how to train a similarity function by regarding it as the action-value function of a reinforcement learning problem. We apply this general method to segment images by clustering superpixels, an application that we call Learning to Agglomerate Superpixel Hierarchies (LASH). When applied to a challenging dataset of brain images from serial electron microscopy, LASH dramatically improved segmentation accuracy when clustering supervoxels generated by state of the boundary detection algorithms. The naive strategy of directly training only supervoxel similarities and applying single linkage clustering produced less improvement. 1

5 0.72847843 68 nips-2011-Demixed Principal Component Analysis

Author: Wieland Brendel, Ranulfo Romo, Christian K. Machens

Abstract: In many experiments, the data points collected live in high-dimensional observation spaces, yet can be assigned a set of labels or parameters. In electrophysiological recordings, for instance, the responses of populations of neurons generally depend on mixtures of experimentally controlled parameters. The heterogeneity and diversity of these parameter dependencies can make visualization and interpretation of such data extremely difficult. Standard dimensionality reduction techniques such as principal component analysis (PCA) can provide a succinct and complete description of the data, but the description is constructed independent of the relevant task variables and is often hard to interpret. Here, we start with the assumption that a particularly informative description is one that reveals the dependency of the high-dimensional data on the individual parameters. We show how to modify the loss function of PCA so that the principal components seek to capture both the maximum amount of variance about the data, while also depending on a minimum number of parameters. We call this method demixed principal component analysis (dPCA) as the principal components here segregate the parameter dependencies. We phrase the problem as a probabilistic graphical model, and present a fast Expectation-Maximization (EM) algorithm. We demonstrate the use of this algorithm for electrophysiological data and show that it serves to demix the parameter-dependence of a neural population response. 1

6 0.54017514 196 nips-2011-On Strategy Stitching in Large Extensive Form Multiplayer Games

7 0.52240127 276 nips-2011-Structured sparse coding via lateral inhibition

8 0.50052714 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

9 0.48637706 191 nips-2011-Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation

10 0.4839969 285 nips-2011-The Kernel Beta Process

11 0.47794124 158 nips-2011-Learning unbelievable probabilities

12 0.47619516 265 nips-2011-Sparse recovery by thresholded non-negative least squares

13 0.46730119 186 nips-2011-Noise Thresholds for Spectral Clustering

14 0.46630982 258 nips-2011-Sparse Bayesian Multi-Task Learning

15 0.46487278 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)

16 0.46297044 200 nips-2011-On the Analysis of Multi-Channel Neural Spike Data

17 0.4605512 125 nips-2011-Identifying Alzheimer's Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis

18 0.46024895 79 nips-2011-Efficient Offline Communication Policies for Factored Multiagent POMDPs

19 0.45391986 62 nips-2011-Continuous-Time Regression Models for Longitudinal Networks

20 0.45247284 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data