nips nips2007 nips2007-55 knowledge-graph by maker-knowledge-mining

55 nips-2007-Computing Robust Counter-Strategies


Source: pdf

Author: Michael Johanson, Martin Zinkevich, Michael Bowling

Abstract: Adaptation to other initially unknown agents often requires computing an effective counter-strategy. In the Bayesian paradigm, one must find a good counterstrategy to the inferred posterior of the other agents’ behavior. In the experts paradigm, one may want to choose experts that are good counter-strategies to the other agents’ expected behavior. In this paper we introduce a technique for computing robust counter-strategies for adaptation in multiagent scenarios under a variety of paradigms. The strategies can take advantage of a suspected tendency in the decisions of the other agents, while bounding the worst-case performance when the tendency is not observed. The technique involves solving a modified game, and therefore can make use of recently developed algorithms for solving very large extensive games. We demonstrate the effectiveness of the technique in two-player Texas Hold’em. We show that the computed poker strategies are substantially more robust than best response counter-strategies, while still exploiting a suspected tendency. We also compose the generated strategies in an experts algorithm showing a dramatic improvement in performance over using simple best responses. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The strategies can take advantage of a suspected tendency in the decisions of the other agents, while bounding the worst-case performance when the tendency is not observed. [sent-11, score-0.276]

2 We show that the computed poker strategies are substantially more robust than best response counter-strategies, while still exploiting a suspected tendency. [sent-14, score-0.796]

3 We also compose the generated strategies in an experts algorithm showing a dramatic improvement in performance over using simple best responses. [sent-15, score-0.347]

4 The problem with this approach is that best response strategies can be very brittle. [sent-32, score-0.386]

5 The use of best response counter-strategies, therefore, puts an impossible burden on a priori choices, either the agent model bias or the set of expert counter-strategies. [sent-34, score-0.269]

6 Their technique chooses the best performance-maximizing strategy from the set of strategies that don’t lose more than ε in the worst case. [sent-36, score-0.5]

7 The strategy balances exploiting the agent model with a safety guarantee in case the model is wrong. [sent-37, score-0.246]

8 Although conceptually appealing, it is computationally infeasible even for moderately sized domains and has only been employed in the simple game of Ro-Sham-Bo. [sent-38, score-0.295]

9 The technique involves computing a Nash equilibrium of a modified game, and therefore can exploit recent advances in solving large extensive games [GHPS07, ZBB07, ZJBP08]. [sent-41, score-0.442]

10 We begin by reviewing the concepts of extensive form games, best responses, and Nash equilibria, as well as describing how these concepts apply in the poker domain. [sent-43, score-0.423]

11 We then describe a technique for computing an approximate best response to an arbitrary poker strategy, and show that this, indeed, produces brittle counter-strategies. [sent-44, score-0.573]

12 Finally, we demonstrate that these strategies can be used in an experts algorithm to make a more effective adaptive player than when using simple best response. [sent-46, score-0.584]

13 2 Background A perfect information extensive game consists of a tree of game states. [sent-47, score-0.713]

14 At each game state, an action is made either by nature, or by one of the players, or the state is a terminal state where each player receives a fixed utility. [sent-48, score-0.593]

15 A strategy for a player consists of a distribution over actions for every game state. [sent-49, score-0.764]

16 In an imperfect information extensive game, the states where a player makes an action are divided into information sets. [sent-50, score-0.437]

17 When a player chooses an action, it does not know the state of the game, only the information set, and therefore its strategy is a mapping from information sets to distributions over actions. [sent-51, score-0.456]

18 A common restriction on imperfect information extensive games is perfect recall, where two states can only be in the same information set for a player if that player took the same actions from the same information sets to reach the two game states. [sent-52, score-1.08]

19 In the remainder of the paper, we will be considering imperfect information extensive games with perfect recall. [sent-53, score-0.251]

20 Let σi be a strategy for player i, where σi(I, a) is the probability the strategy assigns to action a in information set I. [sent-54, score-0.642]

21 Let Σi be the set of strategies for player i, and define ui (σ1 , σ2 ) to be the expected utility of player i if player 1 uses σ1 ∈ Σ1 and player 2 uses σ2 ∈ Σ2 . [sent-55, score-1.241]

22 A zero-sum extensive game is an extensive game where u1 = −u2 . [sent-60, score-0.788]

23 Define the value of the game to player 1 (v1 ) to be the expected utility of player 1 in equilibrium. [sent-62, score-0.841]

24 In a zero-sum extensive game, the exploitability of a strategy σ1 ∈ Σ1 is: ex(σ1) = max_{σ2 ∈ Σ2} (v1 − u1(σ1, σ2)). (2) [sent-63, score-0.394]

25 The value of the game to player 2 (v2) and the exploitability of a strategy σ2 ∈ Σ2 are defined similarly. [sent-64, score-0.85]
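
To make equation (2) concrete, the following is a minimal sketch in the much simpler setting of a zero-sum matrix game rather than an extensive game; the payoff matrix, the strategies, and the assumption that the game value is already known are all illustrative, not details from the paper.

    import numpy as np

    def exploitability(payoff, sigma1, game_value):
        # payoff[i, j] is player 1's utility when she plays row i and the
        # opponent plays column j; a best-responding opponent minimizes this.
        expected_vs_columns = sigma1 @ payoff          # u1(sigma1, e_j) for each column j
        worst_case_value = expected_vs_columns.min()   # min over sigma2 of u1(sigma1, sigma2)
        return game_value - worst_case_value           # ex(sigma1) = v1 - worst-case utility

    # Rock-paper-scissors has value 0; the biased strategy below is exploitable for 0.25.
    rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
    print(exploitability(rps, np.array([1/3, 1/3, 1/3]), 0.0))    # 0.0
    print(exploitability(rps, np.array([0.5, 0.25, 0.25]), 0.0))  # 0.25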

26 An ε-Nash equilibrium in a zero-sum extensive game is a strategy pair where both strategies are ε-safe. [sent-66, score-0.92]

27 Formally, define π^σi(I) to be the probability that player i, when following strategy σi, chooses the actions necessary to make information set I reachable from the root of the game tree. [sent-69, score-0.788]

28 Given σ1, σ1′ ∈ Σ1 and p ∈ [0, 1], define mixp(σ1, σ1′) ∈ Σ1 such that for any information set I of player 1, for all actions a: mixp(σ1, σ1′)(I, a) = [p × π^σ1(I) × σ1(I, a) + (1 − p) × π^σ1′(I) × σ1′(I, a)] / [p × π^σ1(I) + (1 − p) × π^σ1′(I)]. (3) [sent-70, score-0.453]

29 Given an event E, define Pr_{σ1,σ2}[E] to be the probability of the event E given that player 1 uses σ1 and player 2 uses σ2. [sent-71, score-0.52]
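
A small sketch of equation (3), assuming behavioral strategies and the own-action reach probabilities π^σ(I) are stored as plain dictionaries (the reach probabilities would be computed elsewhere from the game tree); the data layout and the fallback at unreachable information sets are illustrative choices, not part of the paper.

    def mix_p(p, sigma_a, sigma_b, reach_a, reach_b):
        # sigma_a, sigma_b: information set -> {action: probability}
        # reach_a, reach_b: information set -> pi^sigma(I), the player's own
        # probability of playing to reach I under that strategy
        mixed = {}
        for info_set, probs_a in sigma_a.items():
            w_a = p * reach_a[info_set]
            w_b = (1.0 - p) * reach_b[info_set]
            total = w_a + w_b
            if total == 0.0:
                mixed[info_set] = dict(probs_a)  # unreachable under both strategies
                continue
            probs_b = sigma_b[info_set]
            mixed[info_set] = {
                a: (w_a * probs_a[a] + w_b * probs_b.get(a, 0.0)) / total
                for a in probs_a
            }
        return mixed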

30 In particular, we look at heads-up limit Texas Hold’em, the game used in the AAAI Computer Poker Competition [ZL06]. [sent-75, score-0.295]

31 A single hand of this poker variant consists of two players each being dealt two private cards, followed by five community cards being revealed. [sent-76, score-0.442]

32 Each player tries to form the best five-card poker hand from the community cards and her private cards: if the hand goes to a showdown, the player with the best five-card hand wins the pot. [sent-77, score-1.01]

33 The key to good play is to have, on average, more chips in the pot when you win than when you lose. [sent-78, score-0.285]

34 After the private cards are dealt, a round of betting occurs, followed by additional betting rounds after the third (flop), fourth (turn), and fifth (river) community cards are revealed. [sent-80, score-0.387]

35 Betting rounds involve players alternately deciding to either fold (letting the other player win the chips in the pot), call (matching the opponent’s chips in the pot), or raise (matching, and then adding an additional fixed amount into the pot). [sent-81, score-0.443]

36 Notice that heads-up limit Texas Hold’em is an example of a finite imperfect information extensive game with perfect recall. [sent-83, score-0.458]

37 To provide some intuition for these numbers, a player that always folds will lose 750 mb/h while a typical player that is 10 mb/h stronger than another would require over one million hands to be 95% certain to have won overall. [sent-86, score-0.633]

38 While being a relatively small variant of poker, the game tree for heads-up limit Texas Hold’em is still very large, having approximately 9. [sent-88, score-0.295]

39 Fundamental operations, such as computing a best response strategy or a Nash equilibrium as described in Section 2, are intractable on the full game. [sent-90, score-0.562]

40 If the abstraction involves the same betting structure, a strategy for an abstract game can be played directly in the full game. [sent-94, score-0.735]

41 If the abstraction is small enough, Nash equilibrium and best response computations become feasible. [sent-95, score-0.401]

42 Finding an approximate Nash equilibrium in an abstract game has proven to be an effective way to construct a strong program for the full game [BBD+ 03, GS06]. [sent-96, score-0.769]

43 Recent solution techniques have been able to compute approximate Nash equilibria for abstractions with as many as 10^10 game states [ZBB07, GHPS07]. [sent-97, score-0.353]

44 Given a strategy defined in a small enough abstraction, it is also possible to compute a best response to the strategy in the abstract game. [sent-98, score-0.555]
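
The best response computation within an abstraction amounts to a bottom-up traversal: maximize at our own decision points and take expectations wherever the fixed strategy acts. The sketch below does this on a toy perfect-information tree stored as nested dictionaries; a real poker best response works over information sets and chance nodes and is far more involved, as described in the cited technical report.

    def best_response_value(node, opponent_strategy):
        # node is {"type": "terminal", "utility": u},
        #      or {"type": "max", "children": {action: child}}            (our decision)
        #      or {"type": "opp", "id": k, "children": {action: child}}   (fixed opponent)
        if node["type"] == "terminal":
            return node["utility"]
        if node["type"] == "max":
            return max(best_response_value(child, opponent_strategy)
                       for child in node["children"].values())
        probs = opponent_strategy[node["id"]]  # the opponent follows the fixed strategy
        return sum(probs[action] * best_response_value(child, opponent_strategy)
                   for action, child in node["children"].items())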

45 45 × 10^9 game states, and is described in an accompanying technical report [JZB07]. [sent-101, score-0.326]

46 Since this work focuses on adapting to other agents’ behavior, our experiments make use of a battery of different poker playing programs. [sent-103, score-0.346]

47 PsOpti4 [BBD+ 03] is one of the earliest successful near equilibrium programs for poker and is available as “Sparbot” in the commercial title Poker Academy. [sent-105, score-0.622]

48 S1239, S1399, and S2298 are similar near equilibrium strategies generated by a new 3 equilibrium computation method [ZBB07] using a much larger abstraction than is used in PsOpti4 and PsOpti6. [sent-108, score-0.716]

49 CFR5 is a new near Nash equilibrium strategy [ZJBP08], and uses the abstraction described in the accompanying technical report [JZB07]. [sent-110, score-0.393]

50 4 Frequentist Best Response In the introduction, we described best response counter-strategies as brittle, performing poorly when playing against a different strategy from the one which they were computed to exploit. [sent-112, score-0.456]

51 Since a best response computation is intractable in the full game, we first describe a technique, called frequentist best response, for finding a “good response” using an abstract game. [sent-114, score-0.337]

52 As described in the previous section, given a strategy in an abstract game we can compute a best response to that strategy within the abstraction. [sent-115, score-0.85]

53 The challenge is that the abstraction used by an arbitrary opponent is not known. [sent-116, score-0.362]

54 The basic idea of frequentist best response (FBR) is to observe P playing the full game of poker, construct a model of it in an abstract game (unrelated to P’s own abstraction), and then compute a best response in this abstraction. [sent-119, score-0.924]

55 It finds the action’s associated information set in the abstract game and increments a counter associated with that information set and action. [sent-122, score-0.295]

56 After observing a sufficient number of hands, we can construct a strategy in the abstract game based on the frequency counts. [sent-123, score-0.467]
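
The observation and counting step can be sketched as follows; the action set, the mapping into the abstract game, and the uniform fallback at never-observed information sets are assumptions made for illustration, not the paper’s actual choices. FBR would then compute a best response to the strategy this model returns, within the abstraction.

    from collections import defaultdict

    class FrequentistModel:
        def __init__(self, actions=("fold", "call", "raise")):
            self.actions = actions
            self.counts = defaultdict(lambda: defaultdict(int))

        def observe(self, info_set, action):
            # info_set is the abstract-game information set that the observed
            # full-game situation maps into.
            self.counts[info_set][action] += 1

        def strategy(self, info_set):
            seen = self.counts.get(info_set)
            if not seen:
                # Never observed: fall back to a default policy (uniform here).
                return {a: 1.0 / len(self.actions) for a in self.actions}
            total = sum(seen.values())
            return {a: seen.get(a, 0) / total for a in self.actions}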

57 Since this strategy is defined in a known abstraction, FBR can simply calculate a best response to this frequentist strategy. [sent-126, score-0.458]

58 P’s opponent in the observed games greatly affects the quality of the model. [sent-127, score-0.295]

59 Observing P in self-play or against near equilibrium strategies has been shown to require considerably more observed hands. [sent-130, score-0.405]

60 We computed frequentist best response strategies against seven different opponents. [sent-133, score-0.496]

61 We played each resulting response against the opponent it was designed to exploit, as well as against the other six opponents and an approximate equilibrium strategy computed using the same abstraction. [sent-134, score-0.993]

62 Positive numbers (appearing with a green background) are in favor of the row player (FBR strategies, in this case). [sent-136, score-0.26]

63 The first thing to notice is that FBR is very successful at exploiting the opponent it was designed to exploit, i. [sent-137, score-0.34]

64 In some cases, FBR identified strategies exploiting the opponent for more than previously known to be possible, e. [sent-140, score-0.451]

65 The second thing to notice is that when FBR strategies play against other opponents their performance is poor, i. [sent-143, score-0.5]

66 It is exploitable for over 2000 mb/h (note that always folding loses only 750 mb/h) and an approximate equilibrium strategy defeats it by 93 mb/h. [sent-147, score-0.431]

67 These results give evidence that best response is, in practice, a brittle computation, and can perform poorly when the model is wrong. [sent-149, score-0.277]

68 Considering the similarity of these opponents, even this apparent exception is actually suggestive that best response is not robust to even slight changes in the model. [sent-158, score-0.255]

69 We would like a strategy that can exploit an opponent successfully like FBR, but without the large penalty when playing against a different opponent. [sent-164, score-0.501]

70 This, in itself, can be thought of as a game to which we can apply the usual game-theoretic machinery of equilibria. [sent-171, score-0.59]

71 Let our model of our opponent be some strategy σfix ∈ Σ2 . [sent-172, score-0.402]

72 Define Σ2^{p,σfix} to be those strategies of the form mixp(σfix, σ2), where σ2 is an arbitrary strategy in Σ2. [sent-173, score-0.425]

73 Define the set of restricted best responses to σ1 ∈ Σ1 to be: BR^{p,σfix}(σ1) = argmax_{σ2 ∈ Σ2^{p,σfix}} u2(σ1, σ2). (5) A (p, σfix)-restricted Nash equilibrium is a pair of strategies (σ1*, σ2*) where σ2* ∈ BR^{p,σfix}(σ1*) and σ1* ∈ BR(σ2*). [sent-174, score-0.603]

74 In this pair, the strategy σ1* is a p-restricted Nash response (RNR) to σfix. [sent-175, score-0.332]

75 Define Σ1^{ε-safe} ⊆ Σ1 to be the set of all strategies which are ε-safe (with an exploitability less than ε). [sent-178, score-0.298]

76 Then the set of ε-safe best responses is: BR^{ε-safe}(σ2) = argmax_{σ1 ∈ Σ1^{ε-safe}} u1(σ1, σ2). (6) Theorem 1: For all σ2 ∈ Σ2 and for all p ∈ (0, 1], if σ1 is a p-RNR to σ2, then there exists an ε such that σ1 is an ε-safe best response to σ2. [sent-179, score-0.36]

77 The significance of Theorem 1 is that, among all strategies that are at most ε suboptimal, the RNR strategies are among the best responses. [sent-204, score-0.401]

78 Thus, if we want a strategy that is at most ε suboptimal, we can vary p to produce a strategy that is the best response among all such ε-safe strategies. [sent-205, score-0.555]

79 In our experiments we use a recently developed solution technique based on regret minimization [ZJBP08] with a modified game that starts with an unobserved chance node deciding whether the restricted player is forced to use strategy σfix on the current hand. [sent-208, score-0.863]
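
The sketch below shows only the chance node that defines the modified game, not the regret-minimization solver itself; representing strategies as interchangeable per-hand policies is an illustrative simplification.

    import random

    def deal_restricted_hand(p, sigma_fix, sigma_free, rng):
        # Unobserved chance node at the root of the modified game: with probability p
        # the restricted player must follow the fixed model sigma_fix for the whole
        # hand; otherwise she follows sigma_free, the strategy the solver optimizes.
        # The unrestricted player never observes which branch was taken.
        return sigma_fix if rng.random() < p else sigma_free

    # p near 1 makes the modified game resemble playing the fixed model (exploitation);
    # p near 0 makes it resemble an ordinary equilibrium computation (safety).
    rng = random.Random(0)
    policy_for_this_hand = deal_restricted_hand(0.75, "sigma_fix", "sigma_free", rng)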

80 By varying the value p ∈ [0, 1], we can produce poker strategies that are closer to a Nash equilibrium (when p is near 0) or are closer to the best response (when p is near 1). [sent-213, score-0.94]

81 When producing an RNR to a particular opponent, it is useful to consider the tradeoff between the utility of the response against that opponent and the exploitability of the response itself. [sent-214, score-0.732]

82 The y-axis shows the exploitation of the model by the response in the abstract game. [sent-218, score-0.24]

83 Note that the actual exploitation and exploitability in the full game may be different, as we explore later. [sent-219, score-0.498]

84 We used RNR to compute a counter-strategy to each of the same seven opponents used in the FBR experiments, with the p value for each opponent selected such that the resulting ε is close to 100 mb/h. [sent-224, score-0.497]

85 The RNR strategies were played against these seven opponents and the equilibrium CFR5 in the full game of Texas Hold’em. [sent-225, score-0.97]

86 The first thing to notice is that RNR is capable of exploiting the opponent for which it was designed as a counter-strategy, while still performing well against the other opponents. [sent-227, score-0.34]

87 Notice, though, that it does exploit these opponents significantly more than the approximate Nash strategy (CFR5). [sent-230, score-0.455]

88 In this section we investigate their use in an adaptive poker program. [sent-239, score-0.273]

89 We generated four counter-strategies based on the opponents PsOpti4, A80, S1399, and S2298, and then used these as experts which UCB1 [ACBF02] (a regret minimizing algorithm) selected amongst. [sent-240, score-0.368]
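
A minimal sketch of UCB1 [ACBF02] choosing among a fixed set of counter-strategy experts; the reward scaling to [0, 1], the per-block update granularity, and the class layout are illustrative assumptions rather than the exact setup used in these experiments.

    import math

    class UCB1Experts:
        def __init__(self, num_experts):
            self.counts = [0] * num_experts
            self.mean_reward = [0.0] * num_experts
            self.t = 0

        def select(self):
            self.t += 1
            for i, n in enumerate(self.counts):
                if n == 0:
                    return i  # try every counter-strategy once before using the bound
            ucb = [m + math.sqrt(2.0 * math.log(self.t) / n)
                   for m, n in zip(self.mean_reward, self.counts)]
            return max(range(len(ucb)), key=ucb.__getitem__)

        def update(self, expert, reward):
            # reward: the chosen counter-strategy's winnings over the last block of
            # hands, rescaled to [0, 1] so the UCB1 bound applies.
            self.counts[expert] += 1
            self.mean_reward[expert] += (reward - self.mean_reward[expert]) / self.counts[expert]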

90 We then played these two expert mixtures in 1000-hand matches against both the four programs used to generate the counter-strategies and two programs from the 2006 AAAI Computer Poker Competition, which have an unknown origin and were developed independently of the other programs. [sent-242, score-0.535]

91 We call the first four programs “training opponents” and the other two programs “holdout opponents”, by analogy with training error and holdout error in supervised learning. [sent-243, score-0.293]

92 As expected, when the opponent matches one of the training models, FBR-experts and RNR-experts perform better, on average, than a near equilibrium strategy (see “Training Average” in Figure 2). [sent-245, score-0.67]

93 Since it is unlikely a player will have a model of the exact program it is likely to face in a competition, it is important for its counter-strategies to exploit general weaknesses that might be encountered. [sent-250, score-0.361]

94 Our holdout programs have no explicit relationship to the training programs, yet the RNR counter-strategies are still effective at exploiting them, as demonstrated by the expert mixture exploiting these programs by more than the near equilibrium strategy does. [sent-251, score-0.816]

95 The restricted Nash responses balance exploiting suspected tendencies in other agents’ behavior, while bounding the worst-case performance when the tendency is not observed. [sent-254, score-0.298]

96 The technique involves computing an approximate equilibrium of a modification of the original game, and therefore can make use of recently developed algorithms for solving very large extensive games. [sent-255, score-0.326]

97 We also showed that a simple mixture of experts algorithm based on restricted Nash response counter-strategies was far superior to using best response counter-strategies if the exact opponent was not used in training. [sent-257, score-0.749]

98 Further, the restricted Nash experts algorithm outperformed a static, non-adaptive near equilibrium strategy at exploiting the previously unseen programs. [sent-258, score-0.424]

99 Gradient-based algorithms for finding Nash equilibria in extensive form games. [sent-284, score-0.409]

100 A competitive Texas Hold’em poker player via automated abstraction and real-time equilibrium computation. [sent-289, score-0.943]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('fbr', 0.329), ('game', 0.295), ('poker', 0.273), ('player', 0.26), ('rnr', 0.259), ('nash', 0.252), ('opponents', 0.232), ('opponent', 0.23), ('equilibrium', 0.179), ('strategies', 0.175), ('strategy', 0.172), ('response', 0.16), ('agents', 0.132), ('abstraction', 0.132), ('exploitability', 0.123), ('programs', 0.119), ('texas', 0.099), ('extensive', 0.099), ('experts', 0.098), ('pot', 0.094), ('hands', 0.083), ('betting', 0.082), ('cards', 0.082), ('exploitation', 0.08), ('crosstable', 0.078), ('mixp', 0.078), ('frequentist', 0.075), ('responses', 0.075), ('br', 0.07), ('games', 0.065), ('bowling', 0.062), ('equilibria', 0.058), ('aaai', 0.057), ('zinkevich', 0.055), ('holdout', 0.055), ('exploitable', 0.055), ('johanson', 0.055), ('played', 0.054), ('players', 0.054), ('exploit', 0.051), ('near', 0.051), ('best', 0.051), ('restricted', 0.05), ('weaknesses', 0.05), ('playing', 0.048), ('technique', 0.048), ('bbd', 0.047), ('counterstrategies', 0.047), ('suspected', 0.047), ('hold', 0.046), ('exploiting', 0.046), ('robust', 0.044), ('brittle', 0.041), ('imperfect', 0.04), ('regret', 0.038), ('matches', 0.038), ('action', 0.038), ('duplicate', 0.037), ('actions', 0.037), ('em', 0.036), ('multiagent', 0.035), ('chips', 0.035), ('seven', 0.035), ('win', 0.033), ('private', 0.033), ('notice', 0.033), ('tradeoff', 0.033), ('competition', 0.032), ('mccracken', 0.031), ('rnrs', 0.031), ('tendencies', 0.031), ('tournament', 0.031), ('thing', 0.031), ('accompanying', 0.031), ('expert', 0.03), ('lose', 0.03), ('play', 0.029), ('agent', 0.028), ('michael', 0.028), ('alberta', 0.028), ('safe', 0.027), ('bluffbot', 0.027), ('brp', 0.027), ('gilpin', 0.027), ('monash', 0.027), ('winnings', 0.027), ('paradigm', 0.027), ('tendency', 0.027), ('rounds', 0.026), ('utility', 0.026), ('adapting', 0.025), ('poorly', 0.025), ('loses', 0.025), ('perfect', 0.024), ('chooses', 0.024), ('argmax', 0.023), ('dramatic', 0.023), ('pr', 0.023), ('remainder', 0.023), ('balance', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 55 nips-2007-Computing Robust Counter-Strategies

Author: Michael Johanson, Martin Zinkevich, Michael Bowling

Abstract: Adaptation to other initially unknown agents often requires computing an effective counter-strategy. In the Bayesian paradigm, one must find a good counterstrategy to the inferred posterior of the other agents’ behavior. In the experts paradigm, one may want to choose experts that are good counter-strategies to the other agents’ expected behavior. In this paper we introduce a technique for computing robust counter-strategies for adaptation in multiagent scenarios under a variety of paradigms. The strategies can take advantage of a suspected tendency in the decisions of the other agents, while bounding the worst-case performance when the tendency is not observed. The technique involves solving a modified game, and therefore can make use of recently developed algorithms for solving very large extensive games. We demonstrate the effectiveness of the technique in two-player Texas Hold’em. We show that the computed poker strategies are substantially more robust than best response counter-strategies, while still exploiting a suspected tendency. We also compose the generated strategies in an experts algorithm showing a dramatic improvement in performance over using simple best responses. 1

2 0.52367371 165 nips-2007-Regret Minimization in Games with Incomplete Information

Author: Martin Zinkevich, Michael Johanson, Michael Bowling, Carmelo Piccione

Abstract: Extensive games are a powerful model of multiagent decision-making scenarios with incomplete information. Finding a Nash equilibrium for very large instances of these games has received a great deal of recent attention. In this paper, we describe a new technique for solving large games based on regret minimization. In particular, we introduce the notion of counterfactual regret, which exploits the degree of incomplete information in an extensive game. We show how minimizing counterfactual regret minimizes overall regret, and therefore in self-play can be used to compute a Nash equilibrium. We demonstrate this technique in the domain of poker, showing we can solve abstractions of limit Texas Hold’em with as many as 10^12 states, two orders of magnitude larger than previous methods. 1

3 0.18744199 54 nips-2007-Computational Equivalence of Fixed Points and No Regret Algorithms, and Convergence to Equilibria

Author: Elad Hazan, Satyen Kale

Abstract: We study the relation between notions of game-theoretic equilibria which are based on stability under a set of deviations, and empirical equilibria which are reached by rational players. Rational players are modeled by players using no regret algorithms, which guarantee that their payoff in the long run is close to the maximum they could hope to achieve by consistently deviating from the algorithm’s suggested action. We show that for a given set of deviations over the strategy set of a player, it is possible to efficiently approximate fixed points of a given deviation if and only if there exist efficient no regret algorithms resistant to the deviations. Further, we show that if all players use a no regret algorithm, then the empirical distribution of their plays converges to an equilibrium. 1

4 0.13341561 52 nips-2007-Competition Adds Complexity

Author: Judy Goldsmith, Martin Mundhenk

Abstract: It is known that determining whether a DEC-POMDP, namely, a cooperative partially observable stochastic game (POSG), has a cooperative strategy with positive expected reward is complete for NEXP. It was not known until now how cooperation affected that complexity. We show that, for competitive POSGs, the complexity of determining whether one team has a positive-expected-reward strategy is complete for NEXP^NP.

5 0.12827215 208 nips-2007-TrueSkill Through Time: Revisiting the History of Chess

Author: Pierre Dangauthier, Ralf Herbrich, Tom Minka, Thore Graepel

Abstract: We extend the Bayesian skill rating system TrueSkill to infer entire time series of skills of players by smoothing through time instead of filtering. The skill of each participating player, say, every year is represented by a latent skill variable which is affected by the relevant game outcomes that year, and coupled with the skill variables of the previous and subsequent year. Inference in the resulting factor graph is carried out by approximate message passing (EP) along the time series of skills. As before the system tracks the uncertainty about player skills, explicitly models draws, can deal with any number of competing entities and can infer individual skills from team results. We extend the system to estimate player-specific draw margins. Based on these models we present an analysis of the skill curves of important players in the history of chess over the past 150 years. Results include plots of players’ lifetime skill development as well as the ability to compare the skills of different players across time. Our results indicate that a) the overall playing strength has increased over the past 150 years, and b) that modelling a player’s ability to force a draw provides significantly better predictive power. 1

6 0.11946229 5 nips-2007-A Game-Theoretic Approach to Apprenticeship Learning

7 0.099700525 57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events

8 0.064439341 194 nips-2007-The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information

9 0.049636915 189 nips-2007-Supervised Topic Models

10 0.04828899 168 nips-2007-Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods

11 0.042527214 100 nips-2007-Hippocampal Contributions to Control: The Third Way

12 0.041192494 140 nips-2007-Neural characterization in partially observed populations of spiking neurons

13 0.039355531 157 nips-2007-Privacy-Preserving Belief Propagation and Sampling

14 0.039027732 199 nips-2007-The Price of Bandit Information for Online Optimization

15 0.036748867 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning

16 0.036436271 151 nips-2007-Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs

17 0.036273662 21 nips-2007-Adaptive Online Gradient Descent

18 0.035810772 124 nips-2007-Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning

19 0.034833279 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models

20 0.032819908 182 nips-2007-Sparse deep belief net model for visual area V2


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.122), (1, -0.172), (2, 0.044), (3, -0.004), (4, 0.362), (5, 0.001), (6, -0.126), (7, 0.122), (8, 0.052), (9, 0.166), (10, -0.209), (11, -0.234), (12, 0.092), (13, -0.102), (14, 0.137), (15, 0.029), (16, -0.046), (17, 0.124), (18, 0.015), (19, -0.019), (20, 0.187), (21, 0.048), (22, 0.001), (23, 0.18), (24, -0.006), (25, 0.062), (26, 0.087), (27, 0.18), (28, 0.027), (29, -0.173), (30, -0.084), (31, 0.042), (32, -0.086), (33, -0.029), (34, 0.005), (35, 0.011), (36, -0.029), (37, -0.047), (38, 0.047), (39, -0.049), (40, -0.023), (41, -0.079), (42, -0.091), (43, 0.058), (44, 0.075), (45, 0.047), (46, -0.084), (47, 0.045), (48, -0.126), (49, 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98074937 55 nips-2007-Computing Robust Counter-Strategies

Author: Michael Johanson, Martin Zinkevich, Michael Bowling

Abstract: Adaptation to other initially unknown agents often requires computing an effective counter-strategy. In the Bayesian paradigm, one must find a good counterstrategy to the inferred posterior of the other agents’ behavior. In the experts paradigm, one may want to choose experts that are good counter-strategies to the other agents’ expected behavior. In this paper we introduce a technique for computing robust counter-strategies for adaptation in multiagent scenarios under a variety of paradigms. The strategies can take advantage of a suspected tendency in the decisions of the other agents, while bounding the worst-case performance when the tendency is not observed. The technique involves solving a modified game, and therefore can make use of recently developed algorithms for solving very large extensive games. We demonstrate the effectiveness of the technique in two-player Texas Hold’em. We show that the computed poker strategies are substantially more robust than best response counter-strategies, while still exploiting a suspected tendency. We also compose the generated strategies in an experts algorithm showing a dramatic improvement in performance over using simple best responses. 1

2 0.90457731 165 nips-2007-Regret Minimization in Games with Incomplete Information

Author: Martin Zinkevich, Michael Johanson, Michael Bowling, Carmelo Piccione

Abstract: Extensive games are a powerful model of multiagent decision-making scenarios with incomplete information. Finding a Nash equilibrium for very large instances of these games has received a great deal of recent attention. In this paper, we describe a new technique for solving large games based on regret minimization. In particular, we introduce the notion of counterfactual regret, which exploits the degree of incomplete information in an extensive game. We show how minimizing counterfactual regret minimizes overall regret, and therefore in self-play can be used to compute a Nash equilibrium. We demonstrate this technique in the domain of poker, showing we can solve abstractions of limit Texas Hold’em with as many as 10^12 states, two orders of magnitude larger than previous methods. 1

3 0.60155821 57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events

Author: Robert Peters, Laurent Itti

Abstract: Current computational models of bottom-up and top-down components of attention are predictive of eye movements across a range of stimuli and of simple, fixed visual tasks (such as visual search for a target among distractors). However, to date there exists no computational framework which can reliably mimic human gaze behavior in more complex environments and tasks, such as driving a vehicle through traffic. Here, we develop a hybrid computational/behavioral framework, combining simple models for bottom-up salience and top-down relevance, and looking for changes in the predictive power of these components at different critical event times during 4.7 hours (500,000 video frames) of observers playing car racing and flight combat video games. This approach is motivated by our observation that the predictive strengths of the salience and relevance models exhibit reliable temporal signatures during critical event windows in the task sequence—for example, when the game player directly engages an enemy plane in a flight combat game, the predictive strength of the salience model increases significantly, while that of the relevance model decreases significantly. Our new framework combines these temporal signatures to implement several event detectors. Critically, we find that an event detector based on fused behavioral and stimulus information (in the form of the model’s predictive strength) is much stronger than detectors based on behavioral information alone (eye position) or image information alone (model prediction maps). This approach to event detection, based on eye tracking combined with computational models applied to the visual input, may have useful applications as a less-invasive alternative to other event detection approaches based on neural signatures derived from EEG or fMRI recordings. 1

4 0.5613994 52 nips-2007-Competition Adds Complexity

Author: Judy Goldsmith, Martin Mundhenk

Abstract: It is known that determining whether a DEC-POMDP, namely, a cooperative partially observable stochastic game (POSG), has a cooperative strategy with positive expected reward is complete for NEXP. It was not known until now how cooperation affected that complexity. We show that, for competitive POSGs, the complexity of determining whether one team has a positive-expected-reward strategy is complete for NEXP^NP.

5 0.55229533 54 nips-2007-Computational Equivalence of Fixed Points and No Regret Algorithms, and Convergence to Equilibria

Author: Elad Hazan, Satyen Kale

Abstract: We study the relation between notions of game-theoretic equilibria which are based on stability under a set of deviations, and empirical equilibria which are reached by rational players. Rational players are modeled by players using no regret algorithms, which guarantee that their payoff in the long run is close to the maximum they could hope to achieve by consistently deviating from the algorithm’s suggested action. We show that for a given set of deviations over the strategy set of a player, it is possible to efficiently approximate fixed points of a given deviation if and only if there exist efficient no regret algorithms resistant to the deviations. Further, we show that if all players use a no regret algorithm, then the empirical distribution of their plays converges to an equilibrium. 1

6 0.49319381 208 nips-2007-TrueSkill Through Time: Revisiting the History of Chess

7 0.27354866 171 nips-2007-Scan Strategies for Meteorological Radars

8 0.26343575 5 nips-2007-A Game-Theoretic Approach to Apprenticeship Learning

9 0.21969488 194 nips-2007-The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information

10 0.15822403 174 nips-2007-Selecting Observations against Adversarial Objectives

11 0.15414774 178 nips-2007-Simulated Annealing: Rigorous finite-time guarantees for optimization on continuous domains

12 0.15071264 44 nips-2007-Catching Up Faster in Bayesian Model Selection and Model Averaging

13 0.1453364 28 nips-2007-Augmented Functional Time Series Representation and Forecasting with Gaussian Processes

14 0.14232683 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents

15 0.14228475 27 nips-2007-Anytime Induction of Cost-sensitive Trees

16 0.14000981 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta

17 0.1392768 207 nips-2007-Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations

18 0.13732708 100 nips-2007-Hippocampal Contributions to Control: The Third Way

19 0.1369803 98 nips-2007-Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion

20 0.13212387 78 nips-2007-Efficient Principled Learning of Thin Junction Trees


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.035), (13, 0.022), (16, 0.028), (19, 0.011), (21, 0.041), (29, 0.442), (31, 0.023), (34, 0.014), (35, 0.02), (46, 0.011), (47, 0.071), (68, 0.041), (83, 0.087), (85, 0.017), (90, 0.044)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75530165 55 nips-2007-Computing Robust Counter-Strategies

Author: Michael Johanson, Martin Zinkevich, Michael Bowling

Abstract: Adaptation to other initially unknown agents often requires computing an effective counter-strategy. In the Bayesian paradigm, one must find a good counterstrategy to the inferred posterior of the other agents’ behavior. In the experts paradigm, one may want to choose experts that are good counter-strategies to the other agents’ expected behavior. In this paper we introduce a technique for computing robust counter-strategies for adaptation in multiagent scenarios under a variety of paradigms. The strategies can take advantage of a suspected tendency in the decisions of the other agents, while bounding the worst-case performance when the tendency is not observed. The technique involves solving a modified game, and therefore can make use of recently developed algorithms for solving very large extensive games. We demonstrate the effectiveness of the technique in two-player Texas Hold’em. We show that the computed poker strategies are substantially more robust than best response counter-strategies, while still exploiting a suspected tendency. We also compose the generated strategies in an experts algorithm showing a dramatic improvement in performance over using simple best responses. 1

2 0.66664058 169 nips-2007-Retrieved context and the discovery of semantic structure

Author: Vinayak Rao, Marc Howard

Abstract: Semantic memory refers to our knowledge of facts and relationships between concepts. A successful semantic memory depends on inferring relationships between items that are not explicitly taught. Recent mathematical modeling of episodic memory argues that episodic recall relies on retrieval of a gradually-changing representation of temporal context. We show that retrieved context enables the development of a global memory space that reflects relationships between all items that have been previously learned. When newly-learned information is integrated into this structure, it is placed in some relationship to all other items, even if that relationship has not been explicitly learned. We demonstrate this effect for global semantic structures shaped topologically as a ring, and as a two-dimensional sheet. We also examined the utility of this learning algorithm for learning a more realistic semantic space by training it on a large pool of synonym pairs. Retrieved context enabled the model to “infer” relationships between synonym pairs that had not yet been presented. 1

3 0.6212666 165 nips-2007-Regret Minimization in Games with Incomplete Information

Author: Martin Zinkevich, Michael Johanson, Michael Bowling, Carmelo Piccione

Abstract: Extensive games are a powerful model of multiagent decision-making scenarios with incomplete information. Finding a Nash equilibrium for very large instances of these games has received a great deal of recent attention. In this paper, we describe a new technique for solving large games based on regret minimization. In particular, we introduce the notion of counterfactual regret, which exploits the degree of incomplete information in an extensive game. We show how minimizing counterfactual regret minimizes overall regret, and therefore in self-play can be used to compute a Nash equilibrium. We demonstrate this technique in the domain of poker, showing we can solve abstractions of limit Texas Hold’em with as many as 10^12 states, two orders of magnitude larger than previous methods. 1

4 0.53966779 64 nips-2007-Cooled and Relaxed Survey Propagation for MRFs

Author: Hai L. Chieu, Wee S. Lee, Yee W. Teh

Abstract: We describe a new algorithm, Relaxed Survey Propagation (RSP), for finding MAP configurations in Markov random fields. We compare its performance with state-of-the-art algorithms including the max-product belief propagation, its sequential tree-reweighted variant, residual (sum-product) belief propagation, and tree-structured expectation propagation. We show that it outperforms all approaches for Ising models with mixed couplings, as well as on a web person disambiguation task formulated as a supervised clustering problem. 1

5 0.448347 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

Author: John Wright, Yangyu Tao, Zhouchen Lin, Yi Ma, Heung-yeung Shum

Abstract: We present a simple new criterion for classification, based on principles from lossy data compression. The criterion assigns a test sample to the class that uses the minimum number of additional bits to code the test sample, subject to an allowable distortion. We prove asymptotic optimality of this criterion for Gaussian data and analyze its relationships to classical classifiers. Theoretical results provide new insights into relationships among popular classifiers such as MAP and RDA, as well as unsupervised clustering methods based on lossy compression [13]. Minimizing the lossy coding length induces a regularization effect which stabilizes the (implicit) density estimate in a small-sample setting. Compression also provides a uniform means of handling classes of varying dimension. This simple classification criterion and its kernel and local versions perform competitively against existing classifiers on both synthetic examples and real imagery data such as handwritten digits and human faces, without requiring domain-specific information. 1

6 0.3373853 208 nips-2007-TrueSkill Through Time: Revisiting the History of Chess

7 0.32214242 5 nips-2007-A Game-Theoretic Approach to Apprenticeship Learning

8 0.31747231 57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events

9 0.31704444 54 nips-2007-Computational Equivalence of Fixed Points and No Regret Algorithms, and Convergence to Equilibria

10 0.30850115 37 nips-2007-Blind channel identification for speech dereverberation using l1-norm sparse learning

11 0.30735216 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images

12 0.2929408 63 nips-2007-Convex Relaxations of Latent Variable Training

13 0.29162252 158 nips-2007-Probabilistic Matrix Factorization

14 0.29024243 174 nips-2007-Selecting Observations against Adversarial Objectives

15 0.28946495 86 nips-2007-Exponential Family Predictive Representations of State

16 0.28927699 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons

17 0.2889213 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

18 0.28850567 18 nips-2007-A probabilistic model for generating realistic lip movements from speech

19 0.28849167 115 nips-2007-Learning the 2-D Topology of Images

20 0.28837678 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data