nips nips2009 nips2009-232 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kevin Waugh, Nolan Bard, Michael Bowling
Abstract: Extensive games are often used to model the interactions of multiple agents within an environment. Much recent work has focused on increasing the size of an extensive game that can be feasibly solved. Despite these improvements, many interesting games are still too large for such techniques. A common approach for computing strategies in these large games is to first employ an abstraction technique to reduce the original game to an abstract game that is of a manageable size. This abstract game is then solved and the resulting strategy is played in the original game. Most top programs in recent AAAI Computer Poker Competitions use this approach. The trend in this competition has been that strategies found in larger abstract games tend to beat strategies found in smaller abstract games. These larger abstract games have more expressive strategy spaces and therefore contain better strategies. In this paper we present a new method for computing strategies in large games. This method allows us to compute more expressive strategies without increasing the size of abstract games that we are required to solve. We demonstrate the power of the approach experimentally in both small and large games, while also providing a theoretical justification for the resulting improvement. 1
Reference: text
sentIndex sentText sentNum sentScore
1 A common approach for computing strategies in these large games is to first employ an abstraction technique to reduce the original game to an abstract game that is of a manageable size. [sent-8, score-0.992]
2 This abstract game is then solved and the resulting strategy is played in the original game. [sent-9, score-0.513]
3 The trend in this competition has been that strategies found in larger abstract games tend to beat strategies found in smaller abstract games. [sent-11, score-0.619]
4 Moreover, it means that techniques for computing strategies in extensive games are a valuable commodity that can be applied in many different domains. [sent-19, score-0.454]
5 The hope is that the abstract game preserves the important strategic structure of the game, and so playing a near equilibrium solution of the abstract game will still perform well in the original game. [sent-26, score-0.69]
6 In poker, employed abstractions include limiting the possible betting sequences, replacing all betting in the first round with a fixed policy [1], and, most commonly, by grouping the cards dealt to each player into buckets based on a strength metric [4, 9]. [sent-27, score-0.855]
7 Because a finer abstraction can rep1 resent players’ information more accurately and provide a more expressive space of strategies, it is generally assumed that a solution to a finer abstraction will produce stronger strategies for the original game than those computed using a coarser abstraction. [sent-29, score-0.868]
8 Although this assumption is in general not true [7], results from the AAAI Computer Poker Competition [10] have shown that it does often hold: near equilibrium strategies with the largest expressive power tend to win the competition. [sent-30, score-0.496]
9 In this paper, we increase the expressive power of computable strategies without increasing the size of game that can be feasibly solved. [sent-31, score-0.515]
10 Unlike previous, subsequently abandoned, attempts to solve independent sub-games [1, 3], the grafting approach uses a base strategy to ensure that the grafts will mesh well as a unit. [sent-33, score-0.856]
11 In fact, we prove that grafted strategies improve on near equilibrium base strategies. [sent-34, score-0.893]
12 We also empirically demonstrate this improvement both in a small poker game as well as limit Texas Hold’em. [sent-35, score-0.493]
13 2 Background Informally, an extensive game is a game tree where a player cannot distinguish between two histories that share the same information set. [sent-36, score-1.021]
14 P (h) is the player who takes an action after the history h. [sent-45, score-0.473]
15 Let Hi be the set of histories where player i chooses the next action. [sent-46, score-0.471]
16 • For each player i ∈ N , a utility function ui that assigns each terminal history a real value. [sent-49, score-0.536]
17 ui (z) is rewarded to player i for reaching terminal history z. [sent-50, score-0.517]
18 • For each player i ∈ N , a partition Ii of Hi with the property that A(h) = A(h ) whenever h and h are in the same member of the partition. [sent-52, score-0.465]
19 Ii is the information partition of player i; a set Ii ∈ Ii is an information set of player i. [sent-53, score-0.883]
20 In this paper, we exclusively focus on two-player zero-sum games with perfect recall, which is a restriction on the information partitions that excludes unrealistic situations where a player is forced to forget her own past information or decisions. [sent-54, score-0.561]
21 To play an extensive game each player specifies a strategy. [sent-55, score-0.764]
22 A strategy determines how a player makes her decisions when confronted with a choice. [sent-56, score-0.665]
23 Definition 2 (Strategy) A strategy for player i, σi , that assigns a probability distribution over A(h) to each h ∈ Hi . [sent-57, score-0.665]
24 We denote Σi as the set of all strategies for player i. [sent-60, score-0.623]
25 Definition 3 (Strategy Profile) A strategy profile in extensive game Γ is a set of strategies, σ = {σ1 , . [sent-61, score-0.556]
26 We let σ−i denote the set strategies for all players except player i. [sent-65, score-0.686]
27 When all players play according to a strategy profile, σ, we can define the expected utility of each player as ui (σ). [sent-67, score-0.827]
28 Similarly, ui (σi , σ−i ) is the expected utility of player i when all other players play according to σ−i and player i plays according to σi . [sent-68, score-0.998]
29 For zero-sum extensive games with perfect recall we can efficiently compute an ε-Nash equilibrium using techniques such as linear programming [5], counterfactual regret minimization [9] and the excessive gap technique [2]. [sent-72, score-0.476]
30 In a zero-sum game we say it is optimal to play any strategy belonging to an equilibrium because this guarantees the equilibrium player the highest expected utility in the worst case. [sent-73, score-1.359]
31 Any deviation from equilibrium by either player can be exploited by a knowledgeable opponent. [sent-74, score-0.626]
32 Many games of interest are far too large to solve directly and abstraction is often employed to reduce the game to one of a more manageable size. [sent-76, score-0.547]
33 The abstract game is solved and the resulting strategy is presumed to be strong in the original game. [sent-77, score-0.487]
34 The null abstraction for player i, is φi = Ii , A . [sent-81, score-0.6]
35 Finally, for any abstraction α, the abstract game, Γα , is the extensive game A I obtained from Γ by replacing Ii with αi and A(h) with αi (h) when P (h) = i, for all i. [sent-83, score-0.491]
36 3 Strategy Grafting Though there is no guarantee that optimal strategies in abstract games are strong in the original game [7], these strategies have empirically been shown to perform well against both other computers [9] and humans [1]. [sent-86, score-0.793]
37 It is simple to show that a strategy space must include at least as good, if not better, strategies than a smaller space that it refines [7]. [sent-89, score-0.452]
38 In poker, when using arbitrary equilibrium strategies that are evaluated in a tournament setting, this intuition empirically holds true. [sent-91, score-0.462]
39 Definition 6 (Dominated Strategy) A dominated strategy for player i is a pure strategy, σi , such that there exists another strategy, σi , where for all opponent strategies σ−i , ui (σi , σ−i ) ≥ ui (σi , σ−i ) (3) and the inequality must hold strictly for at least one opponent strategy. [sent-93, score-1.244]
40 3 This implies that a player can never benefit by playing a dominated strategy. [sent-95, score-0.475]
41 In the abstract game, this combined strategy might become part of an equilibrium and hence the abstract strategy would make occasional mistakes. [sent-97, score-0.702]
42 , Gp } is a grafting partition for player i if 1. [sent-112, score-0.654]
43 Definition 8 (Grafted Strategy) Given a strategy σi ∈ Σi and a grafting partition G for player i. [sent-124, score-0.901]
44 , p}, define Γσi ,j to be an extensive game derived from the original game Γ where for all h ∈ Hi \ Gj , P (h) = c and fc (a|h) = σi (h, a). [sent-128, score-0.579]
45 That is, player i only controls her actions for histories in Gj and is forced to play according to σi elsewhere. [sent-129, score-0.539]
46 Let the graft of Gj , σ ∗,j , be an ∗ -Nash equilibrium of the game Γσi ,j . [sent-130, score-0.552]
47 Finally, define the grafted strategy for player i σi as, ∗ σi (h, a) = σi (h, a) if h ∈ G0 ∗,j σi (h, a) if h ∈ Gj ∗ We will call σi the base strategy and G the grafting partition for the grafted strategy σi . [sent-131, score-2.263]
48 There are a few key ideas to observe about grafted strategies that distinguish them from previous sub-game decomposition methods. [sent-132, score-0.612]
49 Second, when we construct a graft, only the portion of the game that the graft plays is allowed to vary for our player of interest. [sent-136, score-0.762]
50 Third, note that when we construct a graft, we continue to use an equilibrium finding technique, but we are not interested in the pair of strategies — we are only interested in the strategy for the player of interest. [sent-139, score-1.078]
51 This means in games like poker, where we are interested in a strategy for both players, we must construct a grafted strategy separately for each player. [sent-140, score-1.025]
52 By letting our opponent’s strategy vary completely, our graft will be a strategy that is less prone to exploitation, forcing each individual graft to mesh well with the base strategy and in turn with each other graft when combined. [sent-142, score-1.228]
53 Strategy grafting allows us to construct a strategy with more expressive power that what can be computed by solving a single game. [sent-143, score-0.495]
54 We now show that strategy grafting uses this expressive power to its advantage, causing an (approximate) improvement over its base strategy. [sent-144, score-0.606]
55 4 ∗ Theorem 1 For strategies σ1 , σ2 where σ2 is an -best response to σ1 , if σ1 is the grafted strategy for player 1 where σ1 is used as the base strategy and G is the grafting partition then, p ∗,j u1 (σ1 , σ2 ) − u1 (σ1 , σ2 ) ≥ −3p . [sent-146, score-1.833]
56 ∗ u1 (σ1 , σ2 ) − u1 (σ1 , σ2 ) = j=1 In other words, the grafted strategy’s improvement against σ2 is equal to the sum of the gains of the individual grafts against σ2 and this gain is no less than −3p . [sent-147, score-0.706]
57 Corollary 1 Let α be an abstraction where α2 = φ2 and σ be an -Nash equilibrium strategy for ∗ the game Γα , then any grafted strategy σ1 in Γ with σ1 used as the base strategy will be at most 3p worse than σ1 against σ2 . [sent-159, score-1.833]
58 5 Although these results suggest that a grafted strategy will (approximately) improve on its base strategy against an optimal opponent, there is one caveat: it assumes we know the opponent’s abstraction or can solve a game with the opponent unabstracted. [sent-160, score-1.48]
59 We begin our experiments in a smaller poker game called Leduc Hold’em where we can examine several grafted strategies. [sent-168, score-0.843]
60 This is followed by analysis of a grafted strategy for two-player limit Texas Hold’em that was submitted to the 2009 AAAI Poker Competition. [sent-169, score-0.654]
61 1 Leduc Hold’em Leduc Hold’em is a two player poker game. [sent-171, score-0.651]
62 At the beginning of a hand, each player pays a one chip ante to the pot and receives one private card. [sent-173, score-0.526]
63 A round of betting then takes place starting with player one. [sent-174, score-0.581]
64 Another round of betting occurs after the flop, again starting with player one, and then a showdown takes place. [sent-177, score-0.603]
65 At a showdown, if either player has paired their private card with the public card they win all the chips in the pot. [sent-178, score-0.669]
66 In the event neither player pairs, the player with the higher card is declared the winner. [sent-179, score-0.875]
67 When betting the player adds chips into the pot and action moves to the other player. [sent-183, score-0.699]
68 When folding, a player forfeits the hand and all the money in the pot is awarded to the opposing player. [sent-185, score-0.493]
69 When calling, a player places enough chips into the pot to match the bet faced and the betting round is concluded. [sent-186, score-0.772]
70 When raising, the player must put more chips into the pot than the current bet faced and action moves to the opposing player. [sent-187, score-0.632]
71 If the first player checks initially, the second player may check to conclude the betting round or bet. [sent-188, score-0.999]
72 Each of these strategies will then be used as a base strategy to create two grafted strategies. [sent-195, score-0.932]
73 A strategy is said to beat another strategy if its expected winnings against the other is positive. [sent-197, score-0.571]
74 Unlike the AAAI Poker Competition, in our smaller game we can feasibly compute the expected value of one strategy against another and thus we are not required to sample. [sent-198, score-0.498]
75 We chose to train two types of grafted strategies: preflop grafts and flop grafts. [sent-209, score-0.687]
76 1 Table 1: Expected winnings of the row player against the column player in millibets per hand (mb/h) Strategy J. [sent-305, score-0.935]
77 These two types differ in that the preflop grafts play for the entire game whereas the flop grafts only play the game after the flop. [sent-327, score-1.116]
78 , the final grafted strategy is always using the probabilities from some graft and never the base strategy. [sent-330, score-0.849]
79 For flop grafts, the grafted strategy follows the base strategy in all preflop information sets. [sent-331, score-0.974]
80 Each base strategy and graft is trained using counterfactual regret minimization for one billion iterations. [sent-333, score-0.48]
81 The equilibria found are ε-Nash equilibria where no player can benefit more than ε = 10−5 chips by deviating within the abstract game. [sent-334, score-0.554]
82 We can see in Table 1 that the grafted strategies perform well in a field of equilibriumlike strategies. [sent-339, score-0.615]
83 The base strategy seems to be of great importance when training a grafted strategy. [sent-340, score-0.727]
84 Similarly, the grafted strategies appear to maintain the ordering of their base strategies either when considering the expected winnings in Table 1 or the number of wins in Table 2 (though JQ. [sent-345, score-0.972]
85 Although the choice of base strategy is important, the grafted strategies do well under both evaluation criteria and even the worst base strategy sees great relative improvement when used to train grafted strategies. [sent-348, score-1.678]
86 K is the biggest equilibrium strategy and it performs the best out of this group. [sent-354, score-0.473]
87 Finally, observe that the grafted strategies can have worse exploitability in the original game than their corresponding base strategy. [sent-357, score-0.964]
88 Although this can make grafted strategies more vulnerable to exploitive strategies, they appear to perform well against a field of equilibrium-like opponents. [sent-358, score-0.593]
89 In fact, in our experiment, grafted strategies appear to only improve upon the base strategy despite not always knowing the opponent’s abstraction. [sent-359, score-0.932]
90 Contrast the grafted strategies with the strategy that always folds, which is exploitable at 500 mb/h. [sent-361, score-0.859]
91 Although always folding is less exploitable than some of the grafted strategies, it cannot win against any opponent and would place last in this tournament. [sent-362, score-0.552]
92 0 Table 3: Sampled expected winnings in Texas Hold’em of the row player against the column player in millibets per hand (mb/h). [sent-406, score-0.935]
93 Table 3 shows the results of running this large grafted strategy against equilibrium-like strategies using a variety of abstractions. [sent-414, score-0.84]
94 The 20x32 strategy is the largest single imperfect recall abstract game solved to date. [sent-415, score-0.528]
95 As evident in the results, the grafted strategy beats all of the players with statistical significance, even the largest single strategy. [sent-421, score-0.698]
96 In addition to these results against other Computer Poker Research Group strategies, the grafted strategy also performed well at the 2009 AAAI Computer Poker Competition. [sent-422, score-0.635]
97 Any improvement to the quality of a base strategy should in turn improve the quality of the grafted strategy in similar tournament settings. [sent-425, score-1.042]
98 By creating larger strategies we hope to play fewer dominated strategies and, in turn, make fewer mistakes. [sent-430, score-0.484]
99 It is likely that much of the strength of these new strategies will be bounded by the quality of the base strategy used. [sent-433, score-0.544]
100 2 This particular grafted strategy was computed on a large cluster using 640 processors over almost 6 days. [sent-437, score-0.635]
wordName wordTfidf (topN-words)
[('player', 0.418), ('grafted', 0.388), ('grafts', 0.299), ('strategy', 0.247), ('poker', 0.233), ('game', 0.222), ('equilibrium', 0.208), ('strategies', 0.205), ('grafting', 0.189), ('op', 0.182), ('abstraction', 0.182), ('games', 0.143), ('betting', 0.122), ('graft', 0.122), ('abstractions', 0.107), ('opponent', 0.102), ('pre', 0.102), ('base', 0.092), ('extensive', 0.087), ('chips', 0.078), ('leduc', 0.078), ('texas', 0.065), ('players', 0.063), ('expressive', 0.059), ('imperfect', 0.059), ('pot', 0.058), ('em', 0.056), ('bet', 0.055), ('winnings', 0.055), ('nash', 0.055), ('pr', 0.054), ('histories', 0.053), ('aaai', 0.051), ('private', 0.05), ('gj', 0.049), ('tournament', 0.049), ('hold', 0.047), ('partition', 0.047), ('cards', 0.045), ('millibets', 0.044), ('competition', 0.044), ('ui', 0.043), ('round', 0.041), ('card', 0.039), ('exploitability', 0.039), ('waugh', 0.039), ('competitions', 0.039), ('play', 0.037), ('hi', 0.037), ('dominated', 0.037), ('tractably', 0.036), ('pro', 0.035), ('history', 0.032), ('actions', 0.031), ('zj', 0.03), ('fc', 0.03), ('gilpin', 0.029), ('tuomas', 0.029), ('feasibly', 0.029), ('mesh', 0.029), ('equilibria', 0.029), ('alberta', 0.027), ('wins', 0.027), ('deck', 0.027), ('played', 0.026), ('zinkevich', 0.025), ('bowling', 0.025), ('win', 0.024), ('terminal', 0.024), ('michael', 0.023), ('le', 0.023), ('action', 0.023), ('beat', 0.022), ('bets', 0.022), ('equilibriumlike', 0.022), ('schnizlein', 0.022), ('showdown', 0.022), ('unabstracted', 0.022), ('martin', 0.021), ('public', 0.021), ('sized', 0.021), ('playing', 0.02), ('ner', 0.02), ('kevin', 0.02), ('distinguish', 0.019), ('counterfactual', 0.019), ('johanson', 0.019), ('duane', 0.019), ('exploitable', 0.019), ('folding', 0.019), ('improvement', 0.019), ('utility', 0.019), ('ii', 0.019), ('limit', 0.019), ('techniques', 0.019), ('agents', 0.018), ('biggest', 0.018), ('original', 0.018), ('money', 0.017), ('eighth', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 232 nips-2009-Strategy Grafting in Extensive Games
Author: Kevin Waugh, Nolan Bard, Michael Bowling
Abstract: Extensive games are often used to model the interactions of multiple agents within an environment. Much recent work has focused on increasing the size of an extensive game that can be feasibly solved. Despite these improvements, many interesting games are still too large for such techniques. A common approach for computing strategies in these large games is to first employ an abstraction technique to reduce the original game to an abstract game that is of a manageable size. This abstract game is then solved and the resulting strategy is played in the original game. Most top programs in recent AAAI Computer Poker Competitions use this approach. The trend in this competition has been that strategies found in larger abstract games tend to beat strategies found in smaller abstract games. These larger abstract games have more expressive strategy spaces and therefore contain better strategies. In this paper we present a new method for computing strategies in large games. This method allows us to compute more expressive strategies without increasing the size of abstract games that we are required to solve. We demonstrate the power of the approach experimentally in both small and large games, while also providing a theoretical justification for the resulting improvement. 1
2 0.34254947 221 nips-2009-Solving Stochastic Games
Author: Liam M. Dermed, Charles L. Isbell
Abstract: Solving multi-agent reinforcement learning problems has proven difficult because of the lack of tractable algorithms. We provide the first approximation algorithm which solves stochastic games with cheap-talk to within absolute error of the optimal game-theoretic solution, in time polynomial in 1/ . Our algorithm extends Murray’s and Gordon’s (2007) modified Bellman equation which determines the set of all possible achievable utilities; this provides us a truly general framework for multi-agent learning. Further, we empirically validate our algorithm and find the computational cost to be orders of magnitude less than what the theory predicts. 1
3 0.32193363 156 nips-2009-Monte Carlo Sampling for Regret Minimization in Extensive Games
Author: Marc Lanctot, Kevin Waugh, Martin Zinkevich, Michael Bowling
Abstract: Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR) of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR on expectation. Then, we introduce two sampling schemes: outcome sampling and external sampling, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new tighter bound on the regret for the original CFR algorithm and relate this new bound to MCCFR’s bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games. 1
4 0.23928241 161 nips-2009-Nash Equilibria of Static Prediction Games
Author: Michael Brückner, Tobias Scheffer
Abstract: The standard assumption of identically distributed training and test data is violated when an adversary can exercise some control over the generation of the test data. In a prediction game, a learner produces a predictive model while an adversary may alter the distribution of input data. We study single-shot prediction games in which the cost functions of learner and adversary are not necessarily antagonistic. We identify conditions under which the prediction game has a unique Nash equilibrium, and derive algorithms that will find the equilibrial prediction models. In a case study, we explore properties of Nash-equilibrial prediction models for email spam filtering empirically. 1
5 0.13178295 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering
Author: Samuel R. Bulò, Marcello Pelillo
Abstract: Hypergraph clustering refers to the process of extracting maximally coherent groups from a set of objects using high-order (rather than pairwise) similarities. Traditional approaches to this problem are based on the idea of partitioning the input data into a user-defined number of classes, thereby obtaining the clusters as a by-product of the partitioning process. In this paper, we provide a radically different perspective to the problem. In contrast to the classical approach, we attempt to provide a meaningful formalization of the very notion of a cluster and we show that game theory offers an attractive and unexplored perspective that serves well our purpose. Specifically, we show that the hypergraph clustering problem can be naturally cast into a non-cooperative multi-player “clustering game”, whereby the notion of a cluster is equivalent to a classical game-theoretic equilibrium concept. From the computational viewpoint, we show that the problem of finding the equilibria of our clustering game is equivalent to locally optimizing a polynomial function over the standard simplex, and we provide a discrete-time dynamics to perform this optimization. Experiments are presented which show the superiority of our approach over state-of-the-art hypergraph clustering techniques.
6 0.082333654 48 nips-2009-Bootstrapping from Game Tree Search
7 0.045198198 14 nips-2009-A Parameter-free Hedging Algorithm
8 0.040933993 21 nips-2009-Abstraction and Relational learning
9 0.040357769 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs
10 0.038552161 53 nips-2009-Complexity of Decentralized Control: Special Cases
11 0.036229204 113 nips-2009-Improving Existing Fault Recovery Policies
12 0.035250492 129 nips-2009-Learning a Small Mixture of Trees
13 0.033619337 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
14 0.028543003 218 nips-2009-Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining
15 0.028311769 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection
16 0.028281074 220 nips-2009-Slow Learners are Fast
17 0.027777696 11 nips-2009-A General Projection Property for Distribution Families
18 0.026677303 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs
19 0.026652925 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
20 0.026111685 234 nips-2009-Streaming k-means approximation
topicId topicWeight
[(0, -0.091), (1, 0.062), (2, 0.105), (3, -0.153), (4, -0.16), (5, 0.058), (6, -0.047), (7, -0.022), (8, 0.056), (9, -0.001), (10, -0.278), (11, -0.425), (12, -0.216), (13, -0.125), (14, -0.043), (15, -0.257), (16, -0.047), (17, 0.015), (18, 0.025), (19, -0.022), (20, 0.049), (21, 0.039), (22, 0.016), (23, 0.031), (24, 0.061), (25, 0.055), (26, 0.032), (27, 0.002), (28, -0.018), (29, 0.056), (30, 0.001), (31, 0.048), (32, -0.014), (33, -0.019), (34, -0.002), (35, -0.032), (36, 0.01), (37, 0.017), (38, -0.018), (39, -0.002), (40, 0.005), (41, -0.024), (42, 0.002), (43, 0.017), (44, 0.014), (45, 0.049), (46, -0.007), (47, 0.056), (48, -0.036), (49, 0.01)]
simIndex simValue paperId paperTitle
same-paper 1 0.98891634 232 nips-2009-Strategy Grafting in Extensive Games
Author: Kevin Waugh, Nolan Bard, Michael Bowling
Abstract: Extensive games are often used to model the interactions of multiple agents within an environment. Much recent work has focused on increasing the size of an extensive game that can be feasibly solved. Despite these improvements, many interesting games are still too large for such techniques. A common approach for computing strategies in these large games is to first employ an abstraction technique to reduce the original game to an abstract game that is of a manageable size. This abstract game is then solved and the resulting strategy is played in the original game. Most top programs in recent AAAI Computer Poker Competitions use this approach. The trend in this competition has been that strategies found in larger abstract games tend to beat strategies found in smaller abstract games. These larger abstract games have more expressive strategy spaces and therefore contain better strategies. In this paper we present a new method for computing strategies in large games. This method allows us to compute more expressive strategies without increasing the size of abstract games that we are required to solve. We demonstrate the power of the approach experimentally in both small and large games, while also providing a theoretical justification for the resulting improvement. 1
2 0.86316681 156 nips-2009-Monte Carlo Sampling for Regret Minimization in Extensive Games
Author: Marc Lanctot, Kevin Waugh, Martin Zinkevich, Michael Bowling
Abstract: Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR) of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR on expectation. Then, we introduce two sampling schemes: outcome sampling and external sampling, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new tighter bound on the regret for the original CFR algorithm and relate this new bound to MCCFR’s bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games. 1
3 0.8308388 221 nips-2009-Solving Stochastic Games
Author: Liam M. Dermed, Charles L. Isbell
Abstract: Solving multi-agent reinforcement learning problems has proven difficult because of the lack of tractable algorithms. We provide the first approximation algorithm which solves stochastic games with cheap-talk to within absolute error of the optimal game-theoretic solution, in time polynomial in 1/ . Our algorithm extends Murray’s and Gordon’s (2007) modified Bellman equation which determines the set of all possible achievable utilities; this provides us a truly general framework for multi-agent learning. Further, we empirically validate our algorithm and find the computational cost to be orders of magnitude less than what the theory predicts. 1
4 0.80914152 161 nips-2009-Nash Equilibria of Static Prediction Games
Author: Michael Brückner, Tobias Scheffer
Abstract: The standard assumption of identically distributed training and test data is violated when an adversary can exercise some control over the generation of the test data. In a prediction game, a learner produces a predictive model while an adversary may alter the distribution of input data. We study single-shot prediction games in which the cost functions of learner and adversary are not necessarily antagonistic. We identify conditions under which the prediction game has a unique Nash equilibrium, and derive algorithms that will find the equilibrial prediction models. In a case study, we explore properties of Nash-equilibrial prediction models for email spam filtering empirically. 1
5 0.52811778 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering
Author: Samuel R. Bulò, Marcello Pelillo
Abstract: Hypergraph clustering refers to the process of extracting maximally coherent groups from a set of objects using high-order (rather than pairwise) similarities. Traditional approaches to this problem are based on the idea of partitioning the input data into a user-defined number of classes, thereby obtaining the clusters as a by-product of the partitioning process. In this paper, we provide a radically different perspective to the problem. In contrast to the classical approach, we attempt to provide a meaningful formalization of the very notion of a cluster and we show that game theory offers an attractive and unexplored perspective that serves well our purpose. Specifically, we show that the hypergraph clustering problem can be naturally cast into a non-cooperative multi-player “clustering game”, whereby the notion of a cluster is equivalent to a classical game-theoretic equilibrium concept. From the computational viewpoint, we show that the problem of finding the equilibria of our clustering game is equivalent to locally optimizing a polynomial function over the standard simplex, and we provide a discrete-time dynamics to perform this optimization. Experiments are presented which show the superiority of our approach over state-of-the-art hypergraph clustering techniques.
6 0.42623794 48 nips-2009-Bootstrapping from Game Tree Search
7 0.26741379 14 nips-2009-A Parameter-free Hedging Algorithm
8 0.1545309 60 nips-2009-Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation
9 0.15121371 234 nips-2009-Streaming k-means approximation
10 0.13505273 220 nips-2009-Slow Learners are Fast
11 0.13256666 249 nips-2009-Tracking Dynamic Sources of Malicious Activity at Internet Scale
12 0.13158055 16 nips-2009-A Smoothed Approximate Linear Program
13 0.13151549 25 nips-2009-Adaptive Design Optimization in Experiments with People
14 0.12996565 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information
15 0.12266718 113 nips-2009-Improving Existing Fault Recovery Policies
16 0.12017614 180 nips-2009-On the Convergence of the Concave-Convex Procedure
17 0.11985466 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory
18 0.11492871 76 nips-2009-Efficient Learning using Forward-Backward Splitting
19 0.11284534 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs
20 0.11201782 176 nips-2009-On Invariance in Hierarchical Models
topicId topicWeight
[(24, 0.139), (25, 0.042), (35, 0.037), (36, 0.078), (39, 0.042), (49, 0.025), (57, 0.343), (58, 0.036), (61, 0.019), (71, 0.082), (86, 0.038), (91, 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.84402406 232 nips-2009-Strategy Grafting in Extensive Games
Author: Kevin Waugh, Nolan Bard, Michael Bowling
Abstract: Extensive games are often used to model the interactions of multiple agents within an environment. Much recent work has focused on increasing the size of an extensive game that can be feasibly solved. Despite these improvements, many interesting games are still too large for such techniques. A common approach for computing strategies in these large games is to first employ an abstraction technique to reduce the original game to an abstract game that is of a manageable size. This abstract game is then solved and the resulting strategy is played in the original game. Most top programs in recent AAAI Computer Poker Competitions use this approach. The trend in this competition has been that strategies found in larger abstract games tend to beat strategies found in smaller abstract games. These larger abstract games have more expressive strategy spaces and therefore contain better strategies. In this paper we present a new method for computing strategies in large games. This method allows us to compute more expressive strategies without increasing the size of abstract games that we are required to solve. We demonstrate the power of the approach experimentally in both small and large games, while also providing a theoretical justification for the resulting improvement. 1
2 0.74775237 88 nips-2009-Extending Phase Mechanism to Differential Motion Opponency for Motion Pop-out
Author: Yicong Meng, Bertram E. Shi
Abstract: We extend the concept of phase tuning, a ubiquitous mechanism among sensory neurons including motion and disparity selective neurons, to the motion contrast detection. We demonstrate that the motion contrast can be detected by phase shifts between motion neuronal responses in different spatial regions. By constructing the differential motion opponency in response to motions in two different spatial regions, varying motion contrasts can be detected, where similar motion is detected by zero phase shifts and differences in motion by non-zero phase shifts. The model can exhibit either enhancement or suppression of responses by either different or similar motion in the surrounding. A primary advantage of the model is that the responses are selective to relative motion instead of absolute motion, which could model neurons found in neurophysiological experiments responsible for motion pop-out detection. 1 In trod u ction Motion discontinuity or motion contrast is an important cue for the pop-out of salient moving objects from contextual backgrounds. Although the neural mechanism underlying the motion pop-out detection is still unknown, the center-surround receptive field (RF) organization is considered as a physiological basis responsible for the pop-out detection. The center-surround RF structure is simple and ubiquitous in cortical cells especially in neurons processing motion and color information. Nakayama and Loomis [1] have predicted the existence of motion selective neurons with antagonistic center-surround receptive field organization in 1974. Recent physiological experiments [2][3] show that neurons with center-surround RFs have been found in both middle temporal (MT) and medial superior temporal (MST) areas related to motion processing. This antagonistic mechanism has been suggested to detect motion segmentation [4], figure/ground segregation [5] and the differentiation of object motion from ego-motion [6]. There are many related works [7]-[12] on motion pop-out detection. Some works [7]-[9] are based on spatio-temporal filtering outputs, but motion neurons are not fully interacted by either only inhibiting similar motion [7] or only enhancing opposite motion [8]. Heeger, et al. [7] proposed a center-surround operator to eliminate the response dependence upon rotational motions. But the Heeger's model only shows a complete center-surround interaction for moving directions. With respect to the surrounding speed effects, the neuronal responses are suppressed by the same speed with the center motion but not enhanced by other speeds. Similar problem existed in [8], which only modeled the suppression of neuronal responses in the classical receptive field (CRF) by similar motions in surrounding regions. Physiological experiments [10][11] show that many neurons in visual cortex are sensitive to the motion contrast rather than depend upon the absolute direction and speed of the object motion. Although pooling over motion neurons tuned to different velocities can eliminate the dependence upon absolute velocities, it is computationally inefficient and still can't give full interactions of both suppression and enhancement by similar and opposite surrounding motions. The model proposed by Dellen, et al. [12] computed differential motion responses directly from complex cells in V1 and didn't utilize responses from direction selective neurons. In this paper, we propose an opponency model which directly responds to differential motions by utilizing the phase shift mechanism. Phase tuning is a ubiquitous mechanism in sensory information processing, including motion, disparity and depth detection. Disparity selective neurons in the visual cortex have been found to detect disparities by adjusting the phase shift between the receptive field organizations in the left and right eyes [13][14]. Motion sensitive cells have been modeled in the similar way as the disparity energy neurons and detect image motions by utilizing the phase shift between the real and imaginary parts of temporal complex valued responses, which are comparable to images to the left and right eyes [15]. Therefore, the differential motion can be modeled by exploring the similarity between images from different spatial regions and from different eyes. The remainder of this paper is organized as following. Section 2 illustrates the phase shift motion energy neurons which estimate image velocities by the phase tuning in the imaginary path of the temporal receptive field responses. In section 3, we extend the concept of phase tuning to the construction of differential motion opponency. The phase difference determines the preferred velocity difference between adjacent areas in retinal images. Section 4 investigates properties of motion pop-out detection by the proposed motion opponency model. Finally, in section 5, we relate our proposed model to the neural mechanism of motion integration and motion segmentation in motion related areas and suggest a possible interpretation for adaptive center-surround interactions observed in biological experiments. 2 Phase Shift Motion Energy Neurons Adelson and Bergen [16] proposed the motion energy model for visual motion perception by measuring spatio-temporal orientations of image sequences in space and time. The motion energy model posits that the responses of direction-selective V1 complex cells can be computed by a combination of two linear spatio-temporal filtering stages, followed by squaring and summation. The motion energy model was extended in [15] to be phase tuned by splitting the complex valued temporal responses into real and imaginary paths and adding a phase shift on the imaginary path. Figure 1(a) demonstrates the schematic diagram of the phase shift motion energy model. Here we assume an input image sequence in two-dimensional space (x, y) and time t. The separable spatio-temporal receptive field ensures the cascade implementation of RF with spatial and temporal filters. Due to the requirement of the causal temporal RF, the phase shift motion energy model didn’t adopt the Gabor filter like the spatial RF. The phase shift spatio-temporal RF is modeled with a complex valued function f ( x, y, t ) = g ( x, y ) ⋅ h ( t , Φ ) , where the spatial and temporal RFs are denoted by g ( x, y ) and h ( t , Φ ) respectively, g ( x, y ) = N ( x, y | 0, C ) exp ( jΩ x x + jΩ y y ) h ( t , Φ ) = hreal ( t ) + exp ( jΦ ) himag ( t ) (1) and C is the covariance matrix of the spatial Gaussian envelope and Φ is the phase tuning of the motion energy neuron. The real and imaginary profiles of the temporal receptive field are Gamma modulated sinusoidal functions with quadrature phases, hreal ( t ) = G ( t | α ,τ ) cos ( Ωt t ) (2) himag ( t ) = G ( t | α ,τ ) sin ( Ωt t ) The envelopes for complex exponentials are functions of Gaussian and Gamma distributions, N ( x, y | 0, C ) = ⎛ x2 y2 exp ⎜ − 2 − 2 ⎜ 2σ x 2σ y 2πσ xσ y ⎝ 1 ⎞ ⎟ ⎟ ⎠ (3) hreal (t ) g ( x, y ) himag (t ) g ( x, y ) (·)2 (·)2 M M M (·)2 (·)2 M M M 2 (·) 2 (·) Vreal V (Φ ) e jΦ Vimag (a) Ev ( Φ max ) (·)2 wc ( x, y ) e jΦmin Ev ( (b) M 0 ) w ( x, y ) c M Ev ( Φ min ) M EΔv ( Θ ) ∫∫∫ K x , y ,Φ e j0 e jΦmin ws ( x, y ) Ks c e ∫∫∫ jΘ ws ( x, y ) e j0 x , y ,Φ ws ( x, y ) wc ( x, y ) M e jΦ max e jΦmax (·)2 M (·)2 M M (·)2 M 2 (·) M M (·)2 (c) Figure 1. (a) shows the diagram of the phase shift motion energy model adapted from [15]. (b) draws the spatiotemporal representation of the phase shift motion energy neuron with the real and imaginary receptive field demonstrated by the two left pictures. (c) illustrates the construction of differential motion opponency with a phase difference Θ from two populations of phase shift motion energy neurons in two spatial areas c and s. To avoid clutter, the space location (x, y) is not explicitly shown in phase tuned motion energies. G (t | α ,τ ) = 1 ⎛ t t α −1 exp ⎜ − Γ(α )τ α ⎝ τ ⎞ ⎟ u (t ) ⎠ (4) where Γ (α ) is the gamma function and u ( t ) is the unit step function. The parameters α and τ determine the temporal RF size. As derived in [15], the motion energy at location (x, y) can be computed by E v ( x, y, Φ ) = S + P cos ( Ψ − Φ ) (5) where S = Vreal 2 + Vimag 2 * P = 2 VrealVimag ( * Ψ = arg VrealVimag (6) ) and complex valued responses in real and imaginary paths are obtained as, Vreal ( x, y, t ) = ∫∫∫ g (ξ , ζ ) h (η ) I ( x − ξ , y − ζ , t − η ) dξ dζ dη real ξ ,ζ ,η Vimag ( x, y, t ) = ∫∫∫ g (ξ , ζ ) h (η ) I ( x − ξ , y − ζ , t − η ) dξ dζ dη ξ ζ η (7) imag , , The superscript * represents the complex conjugation and the phase shift parameter Φ controls the spatio-temporal orientation tuning. To avoid clutter, the spatial location variables x and y for S, P, Ψ, Vreal and Vimag are not explicitly shown in Eq. (5) and (6). Figure 1(b) demonstrates the even and odd profiles of the spatio-temporal RF tuned to a particular phase shift. Θ 0 Θ 0 (a) (b) Figure 2. Two types of differential motion opponency constructions of (a) center-surrounding interaction and (b) left-right interaction. Among cells in area MT with surrounding modulations, 25% of cells are with the antagonistic RF structure in the top row and another 50% of cells have the integrative RF structure as shown in the bottom row. 3 Extending Phase Op p on ency Mechanism to D i f f e r e nt i a l Motion Based on the above phase shift motion energy model, the local image velocity at each spatial location can be represented by a phase shift which leads to the peak response across a population of motion energy neurons. Across regions of different motions, there are clear discontinuities on the estimated velocity map. The motion discontinuities can be detected by edge detectors on the velocity map to segment different motions. However, this algorithm for motion discontinuities detection can’t discriminate between the object motion and uniform motions in contextual backgrounds. Here we propose a phase mechanism to detect differential motions inspired by the disparity energy model and adopt the center-surround inhibition mechanism to pop out the object motion from contextual background motions. The motion differences between different spatial locations can be modeled in the similar way as the disparity model. The motion energies from two neighboring locations are considered as the retinal images to the left and right eyes. Thus, we can construct a differential motion opponency by placing two populations of phase shift motion energy neurons at different spatial locations and the energy EΔv ( Θ ) of the opponency is the squared modulus of the averaged phase shift motion energies over space and phase, E Δv ( Θ ) = ∫∫∫ E ( x, y, Φ ) ⋅ w ( x, y, Φ | Θ ) dxdyd Φ v 2 (8) where w ( x, y, Θ ) is the profile for differential motion opponency and Δv is the velocity difference between the two spatial regions defined by the kernel w ( x, y, Θ ) . Since w ( x, y, Θ ) is intended to implement the functional role of spatial interactions, it is desired to be a separable function in space and phase domain and can be modeled by phase tuned summation of two spatial kernels, w ( x, y, Φ | Θ ) = wc ( x, y ) e jΦ + e jΘ+ jΦ ws ( x, y ) (9) where wc ( x, y ) and ws ( x, y ) are Gaussian kernels of different spatial sizes σ c and σ s , and Θ is the phase difference representing velocity difference between two spatial regions c and s. Substituting Eq. (9) into Eq. (8), the differential motion energy can be reformulated as EΔv ( Θ ) = K c + e jΘ K s 2 (10) 3 3 3 2 2 2 1 1 1 0 0 -1 -1 -2 -2 -2 -3 -3 -3 -3 -3 1 Right Velocity Right Velocity 0.98 0.96 0.94 0.92 0 0.9 0.88 -1 0.86 0.84 0.82 -2 -1 0 1 Left Velocity 2 3 -2 -1 0 1 Left Velocity 2 3 0.8 (a) (b) Figure 3. (a) Phase map and (b) peak magnitude map are obtained from stimuli of two patches of random dots moving with different velocities. The two patches of stimuli are statistically independent but share the same spatial properties: dot size of 2 pixels, dot density of 10% and dot coherence level of 100%. The phase tuned population of motion energy neurons are applied to each patch of random dots with RF parameters: Ωt = 2π/8, Ωt = 2π/16, σx = 5 and τ = 5.5. For each combination of velocities from left and right patches, averaged phase shifts over space and time are computed and so do the magnitudes of peak responses. The unit for velocities is pixels per frame. where Kc = ∫∫∫ E ( x, y, Φ ) exp ( jΦ ) w ( x, y ) dxdyd Φ v,c c x , y ,Φ Ks = ∫∫∫ E ( x, y, Φ ) exp ( jΦ ) w ( x, y ) dxdyd Φ v,s (11) s x, y ,Φ Ev ,c ( x, y, Φ ) and Ev , s ( x, y, Φ ) are phase shift motion energies at location (x, y) and with phase shift Φ. Utilizing the results in Eq. (5) and (6), Eq. (10) and (11) generate similar results, E Δv ( Θ ) = Sopp + Popp cos ( Θopp − Θ ) (12) where Sopp = K c 2 + Ks Popp = 2 K c K s* 2 (13) Θopp = arg ( K c K s* ) According to above derivations, by varying the phase shift Θ between –π and π, the relative motion energy of the differential motion opponency can be modeled as population responses across a population of phase tuned motion opponencies. The response is completely specified by three parameters Sopp , Popp and Θopp . The schematic diagram of this opponency is illustrated in Figure 1(c). The differential motion opponency is constituted by three stages. At the first stage, a population of phase shift motion energy neurons is applied to be selective to different velocities. At the second stage, motion energies from the first stage are weighted by kernels tuned to different spatial locations and phase shifts respectively for both spatial regions and two single differential motion signals in region c and region s are achieved by integrating responses from these two regions over space and phase tuning. Finally, the differential motion energy is computed by the squared modulus of the summation of the integrated motion signal in region c and phase shifted motion signal in region s. The subscripts c and s represent two interacted spatial regions which are not limited to the center and surround regions. The opponency could also be constructed by the neighboring left and right Inhibitive interaction, Θ = π/2 Excitatory interaction, Θ =0 Inhibitory 2 1.6 Responses 1.6 Responses Excitatory 2 1.2 0.8 1.2 0.8 0.4 0.4 0 0 pi/2 pi 3pi/2 Surrouding Direction 0 0 2pi (a) Model by Petkov et al. [8] pi/2 pi 3pi/2 Surrouding Direction (b) Model by Heeger et al. [7] Inhibitory 2 2pi Inhibitory 2 1.6 1.6 Responses Responses 1.2 0.8 1.2 0.8 0.4 0.4 0 0 0 pi/2 pi Surrouding Direction 3pi/2 2pi 0 pi/2 pi Surrouding Direction 3pi/2 2pi (c) (d) Figure 4. Demonstrations of center-surround differential motion opponency, where (a) show the excitation of opposite directions outside the CRF and (b) show the inhibition by surrounding motions in same directions. The center-surround inhibition models by Petkov, et al. [8] and Heeger, et al. [7] are shown in (c) and (d). Responses above 1 indicate enhancement and responses below 1 indicate suppressions. spatial regions. Figure 2 shows two types of structures for the differential motion opponency. In [17], the authors demonstrates that among cells in area MT with surrounding modulations, 25% of cells are with the antagonistic RF structure as shown in Figure 2(a) and another 50% of cells have the integrative RF structure as shown in Figure 2(b). The velocity difference tuning of the opponency is determined by the phase shift parameter Θ combined with parameters of spatial and temporal frequencies for motion energy neurons. The larger phase shift magnitude prefers the bigger velocity difference. This phase tuning of velocity difference is consistent with the phase tuning of motion energy neurons. Figure 3 shows the phase map obtained by using random dots stimuli with different velocities on two spatial patches (left and right patches with sizes of 128 pixels 128 pixels). Along the diagonal line, velocities from left and right patches are equal to each other and therefore phase estimates are zeros along this line. Deviated from the diagonal line to upper-left and lower-right, the phase magnitudes increase while positive phases indicate larger left velocities and negative phases indicate larger right velocities. The phase tuning can give a good classification of velocity differences. 4 V a l i d a t i o n o f D i f f e r e n t i a l M o t i o n O pp o n e n c y Out derivation and analysis above show that the phase shift between two neighboring spatial regions is a good indicator for motion difference between these two regions. In this section, we validate the proposed differential motion opponency by two sets of experiments, which show effects of both surrounding directions and speeds on the center motion. Inhibitory 2 1.6 1.2 1.2 Responses 1.6 Responses Inhibitory 2 0.8 0.4 0.4 0 -2 0.8 0 -1.5 -1 -0.5 0 0.5 Center Speed 1 1.5 2 -2 -1.5 -1 -0.5 0 0.5 Center Speed 1 1.5 2 (a) (b) Figure 5. The insensitivity of the proposed opponency model to absolute center and surrounding velocities is demonstrated in (a), where responses are enhanced for all center velocities from -2 to 2 pixels per frame. In (b), the model by Heeger, et al. [7] only shows enhancement when the center speed matches the preferred speed of 1.2 pixel per frame. Similarly, responses above 1 indicate enhancement and below 1 indicate suppressions. In both curves, the velocity differences between center and surrounding regions are maintained as a constant of 3 pixels per frame. Physiological experiments [2][3] have demonstrated that the neuronal activities in the classical receptive field are suppressed by responses outside the CRF to stimuli with similar motions including both directions and speeds on the center and surrounding regions. On the contrary, visual stimuli of opposite directions or quite different speeds outside the CRF enhance the responses in the CRF. In their experiments, they used a set of stimuli of random dots moving at different velocities, where there are small patches of moving random dots on the center. We tested the properties of the proposed opponency model for motion difference measurement by using similar random dots stimuli. The random dots on background move with different speeds and in different direction but have the same statistical parameters: dot size of 2 pixels, dot density of 10% and motion coherence level of 100%. The small random dots patches are placed on the center of background stimuli to stimulate the neurons in the CRF. These small patches share the same statistical parameters with background random dots but move with a constant velocity of 1 pixel per frame. Figure 4 shows results for the enhanced and suppressed responses in the CRF with varying surrounding directions. The phase shift motion energy neurons had the same spatial and temporal frequencies and the same receptive field sizes, and were selective to vertical orientations. The preferred spatial frequency was 2π/16 radian per pixel and the temporal frequency was 2π/16 radian per frame. The sizes of RF in horizontal and vertical directions were respectively 5 pixels and 10 pixels, corresponding to a spatial bandwidth of 1.96 octaves. The time constant τ was 5.5 frames which resulted in a temporal bandwidth of 1.96 octaves. As shown in Figure 4 (a) and (b), the surrounding motion of opposite direction gives the largest response to the motion in the CRF for the inhibitory interaction and the smallest response for the excitatory interaction. Results demonstrated in Figure 4 are consistent with physiological results reported in [3]. In Born’s paper, inhibitory cells show response enhancement and excitatory cells show response suppression when surrounding motions are in opposite directions. The 3-dB bandwidth for the surrounding moving direction is about 135 degrees for the physiological experiments while the bandwidth is about 180 degrees for the simulation results in our proposed model. Models proposed by Petkov, et al. [8] and Heeger, et al. [7] also show clear inhibition between opposite motions. The Petkov’s model achieves the surrounding suppression for each point in ( x, y, t ) space by the subtraction between responses from that point and its surroundings and followed by a half-wave rectification, + % Ev ,θ ( x, y, t ) = Ev ,θ ( x, y, t ) − α ⋅ Sv ,θ ( x, y, t ) (14) where Ev ,θ ( x, y, t ) is the motion energy at location (x,y) and time t for a given preferred speed v and orientation θ, Sv ,θ ( x, y, t ) is the average motion energy in the surrounding of point (x, y, t), % Ev ,θ ( x, y, t ) is the suppressed motion energy and the factor α controls the inhibition strength. The inhibition term is computed by weighted motion energy Sv ,θ ( x, y, t ) = Ev ,θ ( x, y, t ) ∗ wv ,θ ( x, y, t ) (15) where wv ,θ ( x, y, t ) is the surround weighting function. The Heeger’s model constructs the center-surround motion opponent by computing the weighted sum of responses from motion selective cells, Rv ,θ ( t ) = ∑ β ( x, y ) ⎡ Ev ,θ ( x, y, t ) − E− v ,θ ( x, y, t ) ⎤ ⎣ ⎦ (16) x, y where β ( x, y ) is a center-surround weighting function and the motion energy at each point should be normalized across all cells with different tuning properties. As shown in Figure 4 (c) and (d) for results of Petkov’s and Heeger’s models, we replace the conventional frequency tuned motion energy neuron with our proposed phase tuned neuron. The model by Petkov, et al. [8] is generally suppressive and only reproduces less suppression for opposite motions, which is inconsistent with results from [3]. The model by Heeger, et al. [7] has similar properties with our proposed model with respect to both excitatory and inhibitory interactions. To evaluate the sensitivity of the proposed opponency model to velocity differences, we did simulations by using similar stimuli with the above experiment in Figure 4 but maintaining a constant velocity difference of 3 pixels per frame between the center and surrounding random dot patches. As shown in Figure 5, by varying the velocities of random dots on the center region, we found that responses by the proposed model are always enhanced independent upon absolute velocities of center stimuli, but responses by the Heeger’s model achieve the enhancement at a center velocity of 1.2 pixels per frame and maintain suppressed at other speeds. 5 D i s c u s s i on We proposed a new biologically plausible model of the differential motion opponency to model the spatial interaction property of motion energy neurons. The proposed opponency model is motivated by the phase tuning mechanism of disparity energy neurons which infers the disparity information from the phase difference between complex valued responses to left and right retinal images. Hence, the two neighboring spatial areas can be considered as left and right images and the motion difference between these two spatial regions is detected by the phase difference between the complex valued responses at these two regions. Our experimental results demonstrate a consistent conclusion with physiological experiments that motions of opposite directions and different speeds outside the CRF can show both inhibitive and excitatory effects on the CRF responses. The inhibitive interaction helps to segment the moving object from backgrounds when fed back to low-level features such as edges, orientations and color information. Except providing a unifying phase mechanism in understanding neurons with different functional roles at different brain areas, the proposed opponency model could possibly provide a way to understand the motion integration and motion segmentation. Integration and segmentation are two opposite motion perception tasks but co-exist to constitute two fundamental types of motion processing. Segmentation is achieved by discriminating motion signals from different objects, which is thought to be due to the antagonistic interaction between center and surrounding RFs. Integration is obtained by utilizing the enhancing function of surrounding areas to CRF areas. Both types of processing have been found in motion related areas including area MT and MST. Tadin, et al. [18] have found that motion segmentation dominants at high stimulus contrast and gives the way to motion integration at low stimulus contrast. Huang, et al. [19] suggests that the surrounding modulation is adaptive according to the visual stimulus such as contrasts and noise levels. Since our proposed opponency model determines the functional role of neurons by only the phase shift parameter, this makes the proposed model to be an ideal candidate model for the adaptive surrounding modulation with the phase tuning between two spatial regions. References [1]. K. Nakayama and J. M. Loomis, “Optical velocity patterns, velocity-sensitive neurons, and space perception: A hypothesis,” Perception, vol. 3, 63-80, 1974. [2]. K. Tanaka, K. Hikosaka, H. Saito, M. Yukie, Y. Fukada and E. Iwai, “Analysis of local and wide-field movements in the superior temporal visual areas of the macaque monkey,” Journal of Neuroscience, vol. 6, pp. 134-144, 1986. [3]. R. T. Born and R. B. H. Tootell, “Segregation of global and local motion processing in primate middle temporal visual area,” Nature, vol. 357, pp. 497-499, 1992. [4]. J. Allman, F. Miezin and E. McGuinness, “Stimulus specific responses from beyond the classical receptive field: Neurophysiological mechanisms for local-global comparisions in visual neurons,” Annual Review Neuroscience, vol. 8, pp. 407-430, 1985. [5]. V. A. F. Lamme, “The neurophysiology of figure-ground segregation in primary visual cortex,” Journal of Neuroscience, vol. 15, pp. 1605-1615, 1995. [6]. D. C. Bradley and R. A. Andersen, “Center-surround antagonism based on disparity in primate area MT,” Journal of Neuroscience, vol. 18, pp. 7552-65, 1998. [7]. D. J. Heeger, A. D. Jepson and E. P. Simoncelli, “Recovering observer translation with center-surround operators,” Proc IEEE Workshop on Visual Motion, pp. 95-100, Oct 1991. [8]. N. Petkov and E. Subramanian, “Motion detection, noise reduction, texture suppression, and contour enhancement by spatiotemporal Gabor filters with surround inhibition,” Biological Cybernetics, vol. 97, pp. 423-439, 2007. [9]. M. Escobar and P. Kornprobst, “Action recognition with a Bio-inspired feedforward motion processing model: the richness of center-surround interactions,” ECCV '08: Proceedings of the 10th European Conference on Computer Vision, pp. 186-199, Marseille, France, 2008. [10]. B. J. Frost and K. Nakayama, “Single visual neurons code opposing motion independent of direction,” Science, vol. 200, pp. 744-745, 1983. [11]. A. Cao and P. H. Schiller, “Neural responses to relative speed in the primary visual cortex of rhesus monkey,” Visual Neuroscience, vol. 20, pp. 77-84, 2003. [12]. B. K. Dellen, J. W. Clark and R. Wessel, “Computing relative motion with complex cells,” Visual Neuroscience, vol. 22, pp. 225-236, 2005. [13]. I. Ohzawa, G. C. Deangelis and R. D. Freeman, “Encoding of binocular disparity by complex cells in the cat’s visual cortex,” Journal of Neurophysiology, vol. 77, pp. 2879-2909, 1997. [14]. D. J. Fleet, H. Wagner and D. J. Heeger, “Neural Encoding of binocular disparity: energy model, position shifts and phase shifts,” Vision Research, vol. 26, pp. 1839-1857, 1996. [15]. Y. C. Meng and B. E. Shi, “Normalized Phase Shift Motion Energy Neuron Populations for Image Velocity Estimation,” International Joint Conference on Neural Network, Atlanta, GA, June 14-19, 2009. [16]. E. H. Adelson and J. R. Bergen, “Spatiotemporal energy models for the perception of motion,” J. Opt. Soc. Am. A Opt. Image Sci. Vis., vol. 2, pp. 284-299, 1985. [17]. D. K. Xiao, S. Raiguel, V. Marcar, J. Koenderink and G. A. Orban, “The spatial distribution of the antagonistic surround of MT/V5,” Cereb Cortex, vol. 7, pp. 662-677, 1997. [18]. D. Tadin, J. S. Lappin, L. A. Gilroy and R. Blake, “Perceptual consequences of centre-surround antagonism in visual motion processing,” Nature, vol. 424, pp. 312-315, 2003. [19]. X. Huang, T. D. Albright and G. R. Stoner, “Adaptive surround modulation in cortical area MT,” Neuron, vol. 53, pp. 761-770, 2007.
3 0.64980733 138 nips-2009-Learning with Compressible Priors
Author: Volkan Cevher
Abstract: We describe a set of probability distributions, dubbed compressible priors, whose independent and identically distributed (iid) realizations result in p-compressible signals. A signal x ∈ RN is called p-compressible with magnitude R if its sorted coefficients exhibit a power-law decay as |x|(i) R · i−d , where the decay rate d is equal to 1/p. p-compressible signals live close to K-sparse signals (K N) in the r -norm (r > p) since their best K-sparse approximation error decreases with O R · K 1/r−1/p . We show that the membership of generalized Pareto, Student’s t, log-normal, Fr´ chet, and log-logistic distributions to the set of compresse ible priors depends only on the distribution parameters and is independent of N . In contrast, we demonstrate that the membership of the generalized Gaussian distribution (GGD) depends both on the signal dimension and the GGD parameters: the expected decay rate of N -sample iid realizations from the GGD with the shape parameter q is given by 1/ [q log (N/q)]. As stylized examples, we show via experiments that the wavelet coefficients of natural images are 1.67-compressible whereas their pixel gradients are 0.95 log (N/0.95)-compressible, on the average. We also leverage the connections between compressible priors and sparse signals to develop new iterative re-weighted sparse signal recovery algorithms that outperform the standard 1 -norm minimization. Finally, we describe how to learn the hyperparameters of compressible priors in underdetermined regression problems by exploiting the geometry of their order statistics during signal recovery. 1
4 0.57894319 147 nips-2009-Matrix Completion from Noisy Entries
Author: Raghunandan Keshavan, Andrea Montanari, Sewoong Oh
Abstract: Given a matrix M of low-rank, we consider the problem of reconstructing it from noisy observations of a small, random subset of its entries. The problem arises in a variety of applications, from collaborative filtering (the ‘Netflix problem’) to structure-from-motion and positioning. We study a low complexity algorithm introduced in [1], based on a combination of spectral techniques and manifold optimization, that we call here O PT S PACE. We prove performance guarantees that are order-optimal in a number of circumstances. 1
5 0.52759391 156 nips-2009-Monte Carlo Sampling for Regret Minimization in Extensive Games
Author: Marc Lanctot, Kevin Waugh, Martin Zinkevich, Michael Bowling
Abstract: Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR) of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR on expectation. Then, we introduce two sampling schemes: outcome sampling and external sampling, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new tighter bound on the regret for the original CFR algorithm and relate this new bound to MCCFR’s bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games. 1
6 0.48536807 221 nips-2009-Solving Stochastic Games
7 0.47762132 15 nips-2009-A Rate Distortion Approach for Semi-Supervised Conditional Random Fields
8 0.46917048 181 nips-2009-Online Learning of Assignments
9 0.46696886 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization
10 0.46501005 45 nips-2009-Beyond Convexity: Online Submodular Minimization
11 0.46465498 240 nips-2009-Sufficient Conditions for Agnostic Active Learnable
12 0.45301527 144 nips-2009-Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness
13 0.44718871 161 nips-2009-Nash Equilibria of Static Prediction Games
14 0.44589323 12 nips-2009-A Generalized Natural Actor-Critic Algorithm
15 0.44487154 239 nips-2009-Submodularity Cuts and Applications
16 0.4336257 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations
17 0.43192202 14 nips-2009-A Parameter-free Hedging Algorithm
18 0.4304578 178 nips-2009-On Stochastic and Worst-case Models for Investing
19 0.42839551 91 nips-2009-Fast, smooth and adaptive regression in metric spaces
20 0.42835838 24 nips-2009-Adapting to the Shifting Intent of Search Queries