nips nips2011 nips2011-196 knowledge-graph by maker-knowledge-mining

196 nips-2011-On Strategy Stitching in Large Extensive Form Multiplayer Games


Source: pdf

Author: Richard G. Gibson, Duane Szafron

Abstract: Computing a good strategy in a large extensive form game often demands an extraordinary amount of computer memory, necessitating the use of abstraction to reduce the game size. Typically, strategies from abstract games perform better in the real game as the granularity of abstraction is increased. This paper investigates two techniques for stitching a base strategy in a coarse abstraction of the full game tree, to expert strategies in fine abstractions of smaller subtrees. We provide a general framework for creating static experts, an approach that generalizes some previous strategy stitching efforts. In addition, we show that static experts can create strong agents for both 2-player and 3-player Leduc and Limit Texas Hold’em poker, and that a specific class of static experts can be preferred among a number of alternatives. Furthermore, we describe a poker agent that used static experts and won the 3-player events of the 2010 Annual Computer Poker Competition.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Computing a good strategy in a large extensive form game often demands an extraordinary amount of computer memory, necessitating the use of abstraction to reduce the game size. [sent-2, score-1.059]

2 Typically, strategies from abstract games perform better in the real game as the granularity of abstraction is increased. [sent-3, score-0.827]

3 This paper investigates two techniques for stitching a base strategy in a coarse abstraction of the full game tree, to expert strategies in fine abstractions of smaller subtrees. [sent-4, score-1.605]

4 We provide a general framework for creating static experts, an approach that generalizes some previous strategy stitching efforts. [sent-5, score-0.601]

5 In addition, we show that static experts can create strong agents for both 2-player and 3-player Leduc and Limit Texas Hold’em poker, and that a specific class of static experts can be preferred among a number of alternatives. [sent-6, score-1.142]

6 Furthermore, we describe a poker agent that used static experts and won the 3-player events of the 2010 Annual Computer Poker Competition. [sent-7, score-0.762]

7 In addition, by computing the base strategy and the experts separately, we may lose “cohesion” among the different components. [sent-16, score-0.654]

8 We investigate stitched strategies in extensive form games, focusing on the trade-offs between the sizes of the abstractions versus the assumptions made and the cohesion among the computed strategies. [sent-17, score-0.635]

9 This paper generalizes previous strategy stitching efforts [1, 2, 11] under a more general static expert framework. [sent-19, score-0.808]

10 We use poker as a testbed to demonstrate that, despite recent mixed results, static experts can create much stronger overall agents than the base strategy alone. [sent-20, score-1.213]

11 Furthermore, we show that under a fixed memory limitation, a specific class of static experts is preferred to several alternatives. [sent-21, score-0.588]

12 As a final validation of these results, we describe entries to the 2010 Annual Computer Poker Competition (ACPC) that used static experts to win the 3-player events. [sent-22, score-0.542]

13 Ii is the information partition of player i; a set I ∈ Ii is an information set of player i. [sent-40, score-0.749]

14 A strategy for player i, σi , is a function such that for each information set I ∈ Ii , σi (I) is a probability distribution over A(I). [sent-44, score-0.567]
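
To make this definition concrete, a behavioral strategy can be stored as a map from information-set labels to action distributions. The sketch below is illustrative only; the labels and actions are hypothetical, not taken from the paper.

```python
import random

# A behavioral strategy sigma_i: each information set maps to a probability
# distribution over the actions A(I) available there.
strategy = {
    "J:":   {"call": 0.7, "raise": 0.3},   # hypothetical info-set labels
    "J:cr": {"fold": 0.5, "call": 0.5},
}

def sample_action(sigma, info_set, rng):
    """Sample an action at info_set according to the strategy sigma."""
    actions, probs = zip(*sigma[info_set].items())
    return rng.choices(actions, weights=probs, k=1)[0]

print(sample_action(strategy, "J:", random.Random(0)))
```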

15 A strategy profile σ consists of a strategy σi for each player i ∈ N . [sent-47, score-0.775]

16 We let σ−i refer to all the strategies in σ except σi , and denote ui (σ) to be the expected utility for player i given that all players play according to σ. [sent-48, score-0.656]

17 In a 2-player zero-sum game, a best response to a player 1 strategy σ1 is a player 2 strategy σ2^BR = argmax_{σ2} u2(σ1, σ2) (similarly for a player 2 strategy σ2). [sent-49, score-1.701]
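
A best response maximizes the responder's expected utility against the fixed opponent strategy. The sketch below finds one by brute force over a small finite candidate set; the utility function and candidate strategies are placeholders, not anything from the paper.

```python
def best_response(u2, sigma1, candidates):
    """Return the player-2 strategy in `candidates` maximizing u2(sigma1, .)."""
    return max(candidates, key=lambda sigma2: u2(sigma1, sigma2))

# Toy placeholder: utilities of two fixed player-2 strategies against sigma1.
u2 = lambda s1, s2: {"always_call": -0.1, "always_fold": -0.5}[s2]
print(best_response(u2, "sigma1", ["always_call", "always_fold"]))  # always_call
```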

18 Outside of 2-player zero-sum games, the worst-case scenario for player i would be for all other players to minimize player i’s utility instead of maximizing their own. [sent-52, score-0.829]

19 The extensive form game representation of many real-world problems is too large to feasibly compute a strategy directly. [sent-59, score-0.563]

20 In the unabstracted game, player 1 cannot distinguish between whether chance generated b or c and player 2 cannot distinguish between a and b. [sent-63, score-0.776]

21 (b) An example of a game Γ′ derived from the unabstracted game Γ in (a) for a dynamic expert strategy. [sent-65, score-0.808]

22 Here, the abstraction from (a) is used as the base abstraction, and the null abstraction is employed on the subtree with G1,1 = ∅ and G2,1 = {al, bl, cl} (bold states). [sent-66, score-0.762]

23 ([12]) An abstraction for player i is a pair αi = (αi^I, αi^A), where αi^I is a partition of Hi defining a set of abstract information sets coarser than Ii. [sent-69, score-0.639]

24 The null abstraction for player i is φi = (Ii, A). [sent-73, score-0.639]

25 Finally, for any abstraction α, the abstract game, Γα, is the extensive game obtained from Γ by replacing Ii with αi^I and A(h) with αi^A(h) when P(h) = i, for all i ∈ N. [sent-75, score-0.604]
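
In practice an abstraction is implemented as a bucketing function that merges information sets. The sketch below mimics the spirit of the "JQ-K" Leduc abstraction described later (Jack and Queen indistinguishable pre-flop); it is an illustration, not the authors' code.

```python
def jq_k_bucket(private_card):
    """Pre-flop bucketing in the spirit of the 'JQ-K' abstraction: Jack and
    Queen share one abstract bucket, King gets its own."""
    return 0 if private_card in ("J", "Q") else 1

# Histories differing only in J vs Q fall into the same abstract information
# set, so the abstract game has fewer information sets than the real game.
print(jq_k_bucket("J"), jq_k_bucket("Q"), jq_k_bucket("K"))  # 0 0 1
```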

26 By reducing the number of information sets, computing strategies in an abstract game with an algorithm such as CFR requires less memory than computing strategies in the real game. [sent-77, score-0.571]

27 3 Strategy Stitching To achieve abstractions with finer granularity, a natural approach is to break the game up into subtrees, abstract each of the subtrees, and compute a strategy for each abstract subtree independently. [sent-81, score-0.737]

28 Gi = {Gi,0, . . . , Gi,p} is a grafting partition for player i if Gi is a partition of Hi (possibly containing empty parts) and ∀I ∈ Ii, ∃j ∈ {0, 1, . . . , p} such that I ⊆ Gi,j. [sent-91, score-0.553]
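
One way to make this definition concrete: represent the partition by a function mapping each history to its part index, and check that no information set straddles two parts. A toy sketch with hypothetical histories and information sets:

```python
def is_grafting_partition(part_of, info_sets):
    """Check the condition that every information set I lies entirely inside
    a single part G_{i,j}; part_of maps a history to its part index j."""
    return all(len({part_of(h) for h in I}) == 1 for I in info_sets)

# Toy example: histories are betting strings; part 1 is "player 1 raised first".
part_of = lambda h: 1 if h.startswith("r") else 0
ok  = [{"rc", "rr"}, {"cc", "cr"}]   # each information set sits in one part
bad = [{"rc", "cc"}]                 # this one straddles parts 1 and 0
print(is_grafting_partition(part_of, ok), is_grafting_partition(part_of, bad))
```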

29 Then, compute a strategy, or static expert, for each subtree using any strategy computation technique, such as CFR. [sent-99, score-0.537]
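
CFR derives each player's strategy from accumulated counterfactual regrets via regret matching. The snippet below shows only that per-information-set regret-matching step, as a generic textbook sketch rather than the authors' implementation.

```python
def regret_matching(cumulative_regret):
    """Turn accumulated regrets at one information set into a strategy:
    play each action in proportion to its positive regret."""
    positive = {a: max(r, 0.0) for a, r in cumulative_regret.items()}
    total = sum(positive.values())
    if total == 0.0:
        return {a: 1.0 / len(positive) for a in positive}  # uniform fallback
    return {a: r / total for a, r in positive.items()}

print(regret_matching({"fold": -1.2, "call": 0.8, "raise": 2.4}))
# {'fold': 0.0, 'call': 0.25, 'raise': 0.75}
```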

30 If player 1 takes action r, player 2 no longer controls his or her decisions. [sent-102, score-0.758]

31 On the other hand, in (b), S = N = {1, 2}, G1,j = ∅, and hence all of player 1’s actions are seeded by the base strategy σ1 . [sent-105, score-0.822]

32 For each i ∈ S, let σi be a strategy for player i and Gi = {Gi,0, Gi,1, . . . , Gi,p} be a grafting partition for player i. [sent-108, score-0.567]

33 For each j ∈ {1, . . . , p}, define Γj to be an extensive game derived from the original game Γ where, for all i ∈ S and h ∈ Hi \ Gi,j, we set P(h) = C and fC(a|h) = σi(h, a). [sent-115, score-0.602]

34 That is, each player i ∈ S only controls actions for histories in Gi,j and is forced to play according to σi elsewhere. [sent-116, score-0.538]

35 Let the static expert of {Gi,j | i ∈ S}, σ^j, be a strategy profile of the game Γj. [sent-117, score-0.943]

36 Finally, define the static expert strategy for player i, σi^S, as σi^S(h, a) := σi(h, a) if h ∈ Gi,0, and σi^S(h, a) := σi^j(h, a) if h ∈ Gi,j. [sent-118, score-1.055]
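
Operationally, the stitched strategy is a dispatch over the grafting partition: histories in Gi,0 are answered by the base strategy and histories in Gi,j by the j-th static expert. A minimal sketch with a hypothetical partition function:

```python
def static_expert_strategy(h, a, base, experts, part_of):
    """sigma_i^S(h, a): the base strategy answers on part 0 of the grafting
    partition, the j-th static expert answers on part j."""
    j = part_of(h)                        # index of the part containing h
    sigma = base if j == 0 else experts[j]
    return sigma(h, a)

# Hypothetical example: expert 1 covers histories where player 1 raised first.
part_of = lambda h: 1 if h.startswith("r") else 0
base    = lambda h, a: {"call": 0.9, "raise": 0.1}[a]
expert1 = lambda h, a: {"call": 0.4, "raise": 0.6}[a]
print(static_expert_strategy("rc", "raise", base, {1: expert1}, part_of))  # 0.6
```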

37 We call {σi | i ∈ S} the base or seeding strategies and {Gi | i ∈ S} the grafting profile for the static expert strategy σi^S. [sent-119, score-1.152]

38 This may be the only subtree for which a static expert is computed (p = 1), or there could be more subtrees contained in the grafting partition(s) (p > 1). [sent-121, score-0.755]

39 Under a fixed memory limitation, we can employ finer abstractions for the subtrees Γj than we can in the full game Γ. [sent-122, score-0.614]

40 When |S| = 1, the static expert approach is identical to strategy grafting [11, Definition 8], with the exception that each static expert need not be an approximate Nash equilibrium. [sent-124, score-1.316]

41 We relax the definition for static experts because Nash equilibria are difficult to compute in multiplayer games, and may not be the best solution concept outside of 2-player zero-sum games anyways. [sent-125, score-0.767]

42 For example, in Figure 2b, it may not be wise for player 2 to assume that player 1 must follow σ1 . [sent-127, score-0.718]

43 As we will see in Section 6, having more static experts with |S| > 1 can result in a more exploitable static expert strategy. [sent-129, score-1.088]

44 On the other hand, by removing information sets for multiple players, the static expert approach creates smaller subtrees than strategy grafting does. [sent-130, score-0.915]

45 Section 6 shows that despite the risks, the abstraction gains often lead to static experts with S = N being preferred. [sent-132, score-0.791]

46 Regardless of the choice of S, the base strategy lacks “cohesion” with the static experts since its computation is based on its own play at the subtrees rather than the experts’ play. [sent-133, score-1.069]

47 Though the experts are identically seeded, the base strategy may want to play towards the expert subtrees more often to increase utility. [sent-134, score-0.995]

48 Let the dynamic expert strategy for player i, σi′, be a strategy for player i of the game Γ′. [sent-149, score-1.66]

49 The abstraction α0 is denoted as the base abstraction and the dynamic expert σi^0 is denoted as the base strategy. [sent-151, score-1.147]

50 We can view a dynamic expert strategy as a strategy computed in an abstraction with differing granularity dependent on the history of actions taken. [sent-153, score-1.093]
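
Equivalently, the bucketing function itself can depend on the betting history, coarse in the base part of the tree and fine once play enters an expert subtree. A hedged sketch with made-up bucket counts:

```python
def dynamic_bucket(history, hand_strength):
    """Return an abstract bucket whose granularity depends on the betting
    history: coarse (2 buckets) in the base region, fine (10 buckets) once
    the history enters a hypothetical expert subtree (here, after a raise)."""
    n_buckets = 10 if "r" in history else 2
    return min(int(hand_strength * n_buckets), n_buckets - 1)

print(dynamic_bucket("cc", 0.73))  # coarse region: bucket 1 of 2
print(dynamic_bucket("cr", 0.73))  # expert region: bucket 7 of 10
```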

51 Note that our definition is somewhat redundant to the definition of abstraction as we are simply defining a new abstraction for Γ based on the abstractions α0, α1, . . . [sent-154, score-0.732]

52 Under memory constraints, a dynamic expert strategy typically sacrifices abstraction granularity in the base strategy to achieve finer granularity in the experts. [sent-160, score-1.317]

53 The base strategy’s abstraction is reduced to guarantee perfect cohesion between the base and the experts; the base strategy knows about the experts and can calculate its probabilities “dynamically” during strategy computation based on the feedback from the experts. [sent-163, score-1.574]

54 In Section 6, we contrast static and dynamic experts to compare this trade-off between abstraction size and strategy cohesion. [sent-164, score-1.071]

55 4 Texas and Leduc Hold’em A hand of Texas Hold’em poker (or simply Hold’em) begins with each player being dealt two private cards, and two players posting mandatory bets or blinds. [sent-165, score-0.792]

56 Of the players that did not fold, the player with the highest ranked poker hand wins all of the bets. [sent-167, score-0.69]

57 Only one private card is dealt to each player and one community card is dealt on the flop. [sent-173, score-0.626]

58 5 Related Work Our general framework for applying static experts to any extensive form game captures some previous poker-specific strategy stitching approaches. [sent-181, score-1.217]

59 First, the PsOpti family of agents [2], which play 2-player Limit Hold’em, contain a base strategy called the “pre-flop model” and 7 static experts with S = N, or “post-flop models.” [sent-182, score-1.011]

60 Secondly, Abou Risk and Szafron [1] attach 6 static experts with S = N (which they call “heads-up experts”) to a base strategy for playing 3-player Limit Hold’em. [sent-186, score-0.958]

61 However, their results were mixed as the stitched strategy was not always better than the base strategy alone. [sent-188, score-0.662]

62 Nonetheless, our positive results for static experts with S = N in Section 6 provide evidence that the PsOpti approach and heads-up experts are indeed credible. [sent-189, score-0.803]

63 While 2-player poker is well studied, Ganzfried and Sandholm [3, 4] developed algorithms for computing Nash equilibria in multiplayer games and applied them to a small 3-player jam/fold poker game. [sent-196, score-0.665]

64 In fact, strategy stitching is orthogonal to both strategy computation and abstraction improvements, and could be used in conjunction with more sophisticated techniques. [sent-200, score-0.777]

65 CFR is state of the art in terms of memory efficiency for strategy computation, allowing us to employ abstractions with higher granularity than otherwise possible. [sent-202, score-0.559]

66 Even though Leduc is small enough to not necessitate strategy stitching, the Leduc experiments were conducted to evaluate our hypothesis that static experts with S = N can improve play. [sent-205, score-0.75]

67 We refer to a b-expert for player i as an expert constructed for the subtree Gi (b) := {h ∈ Hi | b is a prefix of b(h)} containing all histories where the players initially follow b. [sent-209, score-0.786]
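
Membership in a b-expert's subtree is just a prefix test on the betting sequence b(h). A small sketch:

```python
def in_b_expert_subtree(betting_sequence, b):
    """True iff a history with betting sequence b(h) lies in G_i(b), i.e. the
    players' initial betting actions follow the prefix b."""
    return betting_sequence.startswith(b)

print(in_b_expert_subtree("crc", "cr"))  # True: the cr-expert handles this hand
print(in_b_expert_subtree("rc", "cr"))   # False: the r-expert handles it instead
```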

68 For example, the experts for the games in Figures 1b, 2a, and 2b are l-experts because the game is split after player 1 takes action l. [sent-210, score-1.028]

69 The second and third abstractions are the “JQ-K” and “J-QK” abstractions that, on the pre-flop, cannot distinguish between whether the private card is a Jack or Queen, or whether the private card is a Queen or King respectively. [sent-213, score-0.726]

70 For both 2-player and 3-player, for each of the three base abstractions, and for each player i, we build a base strategy, a dynamic expert strategy, an S = {i} static expert strategy, and two S = N static expert strategies. [sent-216, score-1.984]

71 Recall choosing S = {i} means that during computation of each static expert, we only fix player i’s action probabilities outside of the expert subtree, whereas S = N means that we fix all players outside of the subtree. [sent-217, score-0.998]

72 Thus, the base strategy plays until the first raise occurs, at which point an expert takes over for the remainder of the hand. [sent-219, score-0.6]

73 For 3-player Leduc, we use r, cr, ccr, cccr, ccccr, and cccccr-experts, except the “Pre-flop” static strategies use just the three experts r, cr, and ccr. [sent-221, score-0.681]

74 The null abstraction is employed. Table 1: The size, earnings, and exploitability of the 2-player (2p) Leduc strategies in the JQ-K base abstraction, and the size and earnings of the 3-player (3p) strategies in the J-QK base abstraction. [sent-222, score-1.093]

75 Each strategy is evaluated against all combinations and orderings of opponent strategies where all strategies use different base abstractions, and the scores are averaged together. [sent-255, score-0.729]

76 For example, for each of our 2-player strategy profiles σ in the JQ-K base abstraction, we compute 1/2(u1(σ1, σ2′) + u2(σ1′, σ2)), averaged over all profiles σ′ that use either the null or J-QK base abstraction. [sent-256, score-0.609]
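
The bookkeeping behind this evaluation is a simple average over opponent profiles drawn from the other base abstractions. A sketch with placeholder utility functions (in practice u1 and u2 would come from simulated play):

```python
def cross_evaluate(profile, opponent_profiles, u1, u2):
    """Average 1/2 * (u1(sigma1, sigma2') + u2(sigma1', sigma2)) over all
    opponent profiles sigma' = (sigma1', sigma2')."""
    scores = [0.5 * (u1(profile[0], opp[1]) + u2(opp[0], profile[1]))
              for opp in opponent_profiles]
    return sum(scores) / len(scores)

# Placeholder utilities; a profile here is just a pair of strategy labels.
u1 = lambda s1, s2: 0.1
u2 = lambda s1, s2: -0.1
print(cross_evaluate(("a1", "a2"), [("b1", "b2"), ("c1", "c2")], u1, u2))  # 0.0
```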

77 Firstly, by increasing abstraction granularity, all of the JQ-K strategies employing experts earn more than the base strategy alone. [sent-259, score-1.042]

78 Finally, we see that only using two pre-flop static experts as opposed to all four reduces the number of dangerous assumptions to provide a stronger and less exploitable strategy. [sent-266, score-0.623]

79 Our 2-player abstractions distribute buckets as close to uniformly as possible across the betting rounds while remembering buckets from previous rounds (known as “perfect recall”). [sent-272, score-0.591]

80 For 2-player, our dynamic strategy has just an r-expert, our S = {i} static strategy uses r, cr, ccr, and cccr-experts, and our S = N static strategy employs r and cr-experts. [sent-274, score-1.258]

81 For example, in 3-player, our dynamic strategy’s base abstraction has just 8 river buckets with 7290 river buckets for each expert, whereas our static strategies have 16 river buckets in the base abstraction with up to 194,481 river buckets for the S = N static rcf-expert abstraction. [sent-278, score-2.344]
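
The memory trade-off arises because, under perfect recall, an abstraction must store roughly one entry per sequence of buckets across the betting rounds, i.e. the product of the per-round bucket counts. A back-of-the-envelope sketch with purely illustrative numbers (not the paper's exact abstraction sizes):

```python
from math import prod

def bucket_sequences(buckets_per_round):
    """Number of distinct perfect-recall bucket sequences the abstraction
    must distinguish: the product of the per-round bucket counts."""
    return prod(buckets_per_round)

# Illustrative only: a coarse base abstraction versus a fine expert abstraction.
print(bucket_sequences([8, 8, 8, 8]))      # base:   4096 sequences
print(bucket_sequences([16, 16, 16, 21]))  # expert: 86016 sequences
```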

82 For reference, all of the 2-player base and experts are built from 720 million iterations of CFR, while we run CFR for 100 million and 5 billion iterations for the 3-player base and experts respectively. [sent-279, score-0.96]

83 We evaluate our 2-player strategies by playing 500,000 duplicate hands (players play both sides of the dealt cards) of poker between each pair of strategies. [sent-280, score-0.557]
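
Duplicate play reduces variance by dealing the same cards to both seatings, so card luck largely cancels when a strategy's winnings are summed over the two seats. A small sketch with a placeholder play_hand function:

```python
def duplicate_match(play_hand, deals):
    """Average strategy A's winnings over each deal played twice with the
    seats swapped; `play_hand(deal, seat_of_a)` is a placeholder that
    returns A's winnings for that seating."""
    total = 0.0
    for deal in deals:
        total += play_hand(deal, seat_of_a=0)
        total += play_hand(deal, seat_of_a=1)
    return total / (2 * len(deals))

# Placeholder: A wins the deal's value in seat 0 and loses it in seat 1.
play_hand = lambda deal, seat_of_a: deal if seat_of_a == 0 else -deal
print(duplicate_match(play_hand, [1.0, -2.0, 0.5]))  # 0.0: pure card luck cancels
```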

84 In addition to our base and stitched strategies, we also included a base strategy called “Base. [sent-281, score-0.639]

85 For 3-player, we play 500,000 triplicate hands (each set of dealt cards played 6 times, one for each of the player seatings) between each combination of 3 strategies. [sent-284, score-0.607]

86 We also included two other strategies: “ACPC-09,” the 2009 ACPC 3-player event winner that did not use experts (Abou Risk and Szafron [1] call it “IR16”), and “ACPC-10,” a static expert strategy that won a 3-player event at the 2010 ACPC and is outlined at the end of this section. [sent-285, score-0.957]

87 By forcing one player to fold, the static experts with S = N essentially reduce the size of the game tree from a 3-player to a 2-player game, allowing many more buckets to be used. [sent-330, score-1.269]

88 This result indicates that at least for poker, the gains in abstraction bucketing outweigh the risks of forced action assumptions and lack of cohesion between the base strategy and the experts. [sent-331, score-0.845]

89 While there are one and two opponent static actions assumed by the r and cr-experts respectively, trading these few assumptions for an increase in abstraction granularity is beneficial. [sent-334, score-0.706]

90 In summary, static experts with S = N are preferred to both dynamic and static experts with S = {i} in the experiments we ran. [sent-335, score-1.156]

91 The winning entries in both 3-player events employed static experts with S = N . [sent-337, score-0.542]

92 7 Conclusions We discussed two strategy stitching techniques for extensive games, including static experts that generalize strategy grafting and some previous techniques used in poker. [sent-343, score-1.31]

93 Despite the accompanying potential dangers and lack of cohesion, we have shown static experts with S = N outperform the dynamic and static experts with S = {i} that we considered, especially when memory limitations are present. [sent-344, score-1.202]

94 However, additional static experts with several forced actions can lead to a more exploitable strategy. [sent-345, score-0.671]

95 Using static experts with S = N is currently our preferred method for creating multiplayer poker strategies and would be our first option for playing other large extensive games. [sent-346, score-0.82]

96 Future work includes finding a way to create more cohesion between the base strategy and static experts. [sent-347, score-0.796]

97 One possibility is to rebuild the base strategy after the experts have been created so that the base strategy’s play is more unified with the experts. [sent-348, score-0.886]

98 In addition, we have yet to experiment with 3-player “hybrid” static experts where |S| = 2. [sent-349, score-0.542]

99 One possibility is to use a dynamic expert strategy as a base strategy of a static expert strategy. [sent-351, score-1.368]

100 In addition, static experts could themselves be dynamic expert strategies for the appropriate subtrees. [sent-352, score-0.96]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('player', 0.359), ('static', 0.281), ('experts', 0.261), ('abstraction', 0.249), ('game', 0.247), ('abstractions', 0.234), ('poker', 0.22), ('strategy', 0.208), ('expert', 0.207), ('base', 0.185), ('leduc', 0.173), ('cfr', 0.173), ('strategies', 0.139), ('grafting', 0.132), ('buckets', 0.121), ('games', 0.121), ('op', 0.12), ('stitching', 0.112), ('players', 0.111), ('extensive', 0.108), ('exploitability', 0.104), ('cohesion', 0.093), ('hands', 0.091), ('subtrees', 0.087), ('waugh', 0.081), ('dynamic', 0.072), ('granularity', 0.071), ('em', 0.07), ('multiplayer', 0.069), ('private', 0.065), ('card', 0.064), ('stitched', 0.061), ('histories', 0.061), ('earnings', 0.061), ('exploitable', 0.058), ('opponent', 0.058), ('hold', 0.055), ('cards', 0.05), ('imperfect', 0.05), ('subtree', 0.048), ('alberta', 0.048), ('play', 0.047), ('actions', 0.047), ('abou', 0.046), ('bucketing', 0.046), ('gilpin', 0.046), ('nash', 0.046), ('szafron', 0.046), ('river', 0.046), ('memory', 0.046), ('pro', 0.044), ('betting', 0.041), ('bowling', 0.04), ('action', 0.04), ('ner', 0.039), ('bucket', 0.037), ('rounds', 0.037), ('dealt', 0.037), ('gi', 0.037), ('texas', 0.037), ('percentile', 0.035), ('fc', 0.035), ('hi', 0.035), ('acpc', 0.035), ('ccr', 0.035), ('equilibria', 0.035), ('ganzfried', 0.035), ('johanson', 0.035), ('opponents', 0.035), ('psopti', 0.035), ('rcf', 0.035), ('unabstracted', 0.035), ('million', 0.034), ('equilibrium', 0.033), ('null', 0.031), ('partition', 0.031), ('history', 0.031), ('sandholm', 0.03), ('ii', 0.03), ('les', 0.029), ('create', 0.029), ('agents', 0.029), ('cr', 0.029), ('pre', 0.027), ('fold', 0.026), ('bold', 0.025), ('coarse', 0.024), ('forced', 0.024), ('aamas', 0.024), ('billings', 0.023), ('burch', 0.023), ('dangerous', 0.023), ('osborne', 0.023), ('rrf', 0.023), ('schnizlein', 0.023), ('seeded', 0.023), ('strategically', 0.023), ('triplicate', 0.023), ('chance', 0.023), ('playing', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 196 nips-2011-On Strategy Stitching in Large Extensive Form Multiplayer Games

Author: Richard G. Gibson, Duane Szafron

Abstract: Computing a good strategy in a large extensive form game often demands an extraordinary amount of computer memory, necessitating the use of abstraction to reduce the game size. Typically, strategies from abstract games perform better in the real game as the granularity of abstraction is increased. This paper investigates two techniques for stitching a base strategy in a coarse abstraction of the full game tree, to expert strategies in fine abstractions of smaller subtrees. We provide a general framework for creating static experts, an approach that generalizes some previous strategy stitching efforts. In addition, we show that static experts can create strong agents for both 2-player and 3-player Leduc and Limit Texas Hold’em poker, and that a specific class of static experts can be preferred among a number of alternatives. Furthermore, we describe a poker agent that used static experts and won the 3-player events of the 2010 Annual Computer Poker Competition.

2 0.23407438 218 nips-2011-Predicting Dynamic Difficulty

Author: Olana Missura, Thomas Gärtner

Abstract: Motivated by applications in electronic games as well as teaching systems, we investigate the problem of dynamic difficulty adjustment. The task here is to repeatedly find a game difficulty setting that is neither ‘too easy’ and bores the player, nor ‘too difficult’ and overburdens the player. The contributions of this paper are (i) the formulation of difficulty adjustment as an online learning problem on partially ordered sets, (ii) an exponential update algorithm for dynamic difficulty adjustment, (iii) a bound on the number of wrong difficulty settings relative to the best static setting chosen in hindsight, and (iv) an empirical investigation of the algorithm when playing against adversaries. 1

3 0.14775227 220 nips-2011-Prediction strategies without loss

Author: Michael Kapralov, Rina Panigrahy

Abstract: Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say ‘predict 0’ or ‘predict 1’, and our payoff is +1 if the prediction is correct and −1 otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right predictions so far. In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting 0 or always predicting 1. For a sequence of length T our algorithm has regret 14 T and loss √ 2 2 T e− T in expectation for all strings. We show that the tradeoff between loss and regret is optimal up to constant factors. Our techniques extend to the general setting of N experts, where the related problem of trading off regret to the best expert for regret to the ’special’ expert has been studied by Even-Dar et al. (COLT’07). We obtain essentially zero loss with respect to the special expert and optimal loss/regret tradeoff, improving upon the results of Even-Dar et al and settling the main question left open in their paper. The strong loss bounds of the algorithm have some surprising consequences. First, we obtain a parameter free algorithm for the experts problem that has optimal regret bounds with respect to k-shifting optima, i.e. bounds with respect to the optimum that is allowed to change arms multiple times. Moreover, for any window of size n the regret of our algorithm to any expert never exceeds O( n(log N + log T )), where N is the number of experts and T is the time horizon, while maintaining the essentially zero loss property. 1

4 0.095484614 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

Author: Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari

Abstract: Learning theory has largely focused on two main learning scenarios: the classical statistical setting where instances are drawn i.i.d. from a fixed distribution, and the adversarial scenario wherein, at every time step, an adversarially chosen instance is revealed to the player. It can be argued that in the real world neither of these assumptions is reasonable. We define the minimax value of a game where the adversary is restricted in his moves, capturing stochastic and non-stochastic assumptions on data. Building on the sequential symmetrization approach, we define a notion of distribution-dependent Rademacher complexity for the spectrum of problems ranging from i.i.d. to worst-case. The bounds let us immediately deduce variation-type bounds. We study a smoothed online learning scenario and show that exponentially small amount of noise can make function classes with infinite Littlestone dimension learnable. 1

5 0.086932689 300 nips-2011-Variance Reduction in Monte-Carlo Tree Search

Author: Joel Veness, Marc Lanctot, Michael Bowling

Abstract: Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments. The stochastic nature of the Monte-Carlo simulations introduces errors in the value estimates, both in terms of bias and variance. Whilst reducing bias (typically through the addition of domain knowledge) has been studied in the MCTS literature, comparatively little effort has focused on reducing variance. This is somewhat surprising, since variance reduction techniques are a well-studied area in classical statistics. In this paper, we examine the application of some standard techniques for variance reduction in MCTS, including common random numbers, antithetic variates and control variates. We demonstrate how these techniques can be applied to MCTS and explore their efficacy on three different stochastic, single-agent settings: Pig, Can’t Stop and Dominion. 1

6 0.065654203 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations

7 0.063628688 136 nips-2011-Inverting Grice's Maxims to Learn Rules from Natural Language Extractions

8 0.059593979 40 nips-2011-Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints

9 0.057380538 80 nips-2011-Efficient Online Learning via Randomized Rounding

10 0.055643879 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits

11 0.05077007 163 nips-2011-MAP Inference for Bayesian Inverse Reinforcement Learning

12 0.045973346 65 nips-2011-Convergent Fitted Value Iteration with Linear Function Approximation

13 0.04035842 69 nips-2011-Differentially Private M-Estimators

14 0.040207729 249 nips-2011-Sequence learning with hidden units in spiking neural networks

15 0.039937597 88 nips-2011-Environmental statistics and the trade-off between model-based and TD learning in humans

16 0.038077787 190 nips-2011-Nonlinear Inverse Reinforcement Learning with Gaussian Processes

17 0.037066668 202 nips-2011-On the Universality of Online Mirror Descent

18 0.037049092 97 nips-2011-Finite Time Analysis of Stratified Sampling for Monte Carlo

19 0.037047599 79 nips-2011-Efficient Offline Communication Policies for Factored Multiagent POMDPs

20 0.036781464 245 nips-2011-Selecting the State-Representation in Reinforcement Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.097), (1, -0.1), (2, 0.013), (3, 0.031), (4, 0.053), (5, -0.017), (6, -0.002), (7, -0.008), (8, 0.03), (9, -0.022), (10, 0.024), (11, -0.046), (12, -0.009), (13, 0.001), (14, 0.039), (15, -0.039), (16, 0.031), (17, 0.055), (18, 0.053), (19, -0.026), (20, 0.102), (21, -0.032), (22, -0.098), (23, 0.001), (24, 0.009), (25, 0.018), (26, -0.136), (27, -0.077), (28, 0.042), (29, 0.025), (30, -0.068), (31, -0.082), (32, -0.081), (33, 0.092), (34, 0.125), (35, -0.128), (36, 0.2), (37, -0.119), (38, -0.007), (39, 0.008), (40, -0.293), (41, 0.062), (42, -0.16), (43, -0.015), (44, -0.063), (45, 0.039), (46, -0.022), (47, 0.053), (48, -0.187), (49, -0.004)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9866789 196 nips-2011-On Strategy Stitching in Large Extensive Form Multiplayer Games

Author: Richard G. Gibson, Duane Szafron

Abstract: Computing a good strategy in a large extensive form game often demands an extraordinary amount of computer memory, necessitating the use of abstraction to reduce the game size. Typically, strategies from abstract games perform better in the real game as the granularity of abstraction is increased. This paper investigates two techniques for stitching a base strategy in a coarse abstraction of the full game tree, to expert strategies in fine abstractions of smaller subtrees. We provide a general framework for creating static experts, an approach that generalizes some previous strategy stitching efforts. In addition, we show that static experts can create strong agents for both 2-player and 3-player Leduc and Limit Texas Hold’em poker, and that a specific class of static experts can be preferred among a number of alternatives. Furthermore, we describe a poker agent that used static experts and won the 3-player events of the 2010 Annual Computer Poker Competition.

2 0.8419736 218 nips-2011-Predicting Dynamic Difficulty

Author: Olana Missura, Thomas Gärtner

Abstract: Motivated by applications in electronic games as well as teaching systems, we investigate the problem of dynamic difficulty adjustment. The task here is to repeatedly find a game difficulty setting that is neither ‘too easy’ and bores the player, nor ‘too difficult’ and overburdens the player. The contributions of this paper are (i) the formulation of difficulty adjustment as an online learning problem on partially ordered sets, (ii) an exponential update algorithm for dynamic difficulty adjustment, (iii) a bound on the number of wrong difficulty settings relative to the best static setting chosen in hindsight, and (iv) an empirical investigation of the algorithm when playing against adversaries. 1

3 0.71718735 136 nips-2011-Inverting Grice's Maxims to Learn Rules from Natural Language Extractions

Author: Mohammad S. Sorower, Janardhan R. Doppa, Walker Orr, Prasad Tadepalli, Thomas G. Dietterich, Xiaoli Z. Fern

Abstract: We consider the problem of learning rules from natural language text sources. These sources, such as news articles and web texts, are created by a writer to communicate information to a reader, where the writer and reader share substantial domain knowledge. Consequently, the texts tend to be concise and mention the minimum information necessary for the reader to draw the correct conclusions. We study the problem of learning domain knowledge from such concise texts, which is an instance of the general problem of learning in the presence of missing data. However, unlike standard approaches to missing data, in this setting we know that facts are more likely to be missing from the text in cases where the reader can infer them from the facts that are mentioned combined with the domain knowledge. Hence, we can explicitly model this “missingness” process and invert it via probabilistic inference to learn the underlying domain knowledge. This paper introduces a mention model that models the probability of facts being mentioned in the text based on what other facts have already been mentioned and domain knowledge in the form of Horn clause rules. Learning must simultaneously search the space of rules and learn the parameters of the mention model. We accomplish this via an application of Expectation Maximization within a Markov Logic framework. An experimental evaluation on synthetic and natural text data shows that the method can learn accurate rules and apply them to new texts to make correct inferences. Experiments also show that the method out-performs the standard EM approach that assumes mentions are missing at random. 1

4 0.38733557 300 nips-2011-Variance Reduction in Monte-Carlo Tree Search

Author: Joel Veness, Marc Lanctot, Michael Bowling

Abstract: Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments. The stochastic nature of the Monte-Carlo simulations introduces errors in the value estimates, both in terms of bias and variance. Whilst reducing bias (typically through the addition of domain knowledge) has been studied in the MCTS literature, comparatively little effort has focused on reducing variance. This is somewhat surprising, since variance reduction techniques are a well-studied area in classical statistics. In this paper, we examine the application of some standard techniques for variance reduction in MCTS, including common random numbers, antithetic variates and control variates. We demonstrate how these techniques can be applied to MCTS and explore their efficacy on three different stochastic, single-agent settings: Pig, Can’t Stop and Dominion. 1

5 0.35971782 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference

Author: Guy Broeck

Abstract: Probabilistic logics are receiving a lot of attention today because of their expressive power for knowledge representation and learning. However, this expressivity is detrimental to the tractability of inference, when done at the propositional level. To solve this problem, various lifted inference algorithms have been proposed that reason at the first-order level, about groups of objects as a whole. Despite the existence of various lifted inference approaches, there are currently no completeness results about these algorithms. The key contribution of this paper is that we introduce a formal definition of lifted inference that allows us to reason about the completeness of lifted inference algorithms relative to a particular class of probabilistic models. We then show how to obtain a completeness result using a first-order knowledge compilation approach for theories of formulae containing up to two logical variables.
An atom is ground if it does not contain any variables. A literal is an atom a or its negation ¬a. A clause is a disjunction l1 ∨ ... ∨ lk of literals. If k = 1, it is a unit clause. An expression is an atom, literal or clause. The pred(a) function maps an atom to its predicate and the vars(e) function maps an expression to its logical variables. A theory in conjunctive normal form (CNF) is a conjunction of clauses. We often represent theories by their set of clauses and clauses by their set of literals. Furthermore, we will assume that all logical variables are universally quantified. In addition, we associate a set of constraints with each clause or atom, either of the form X = t, where X is a logical variable and t is a constant or variable, or of the form X ∈ D, where D is a domain, or the negation of these constraints. These define a finite domain for each logical variable. Abusing notation, we will use constraints of the form X = t to denote a substitution of X by t. The function atom(e) maps an expression e to its atoms, now associating the constraints on e with each atom individually. To add the constraint c to an expression e, we use the notation e ∧ c. Two atoms unify if there is a substitution which makes them identical and if the conjunction of the constraints on both atoms with the substitution is satisfiable. Two expressions e1 and e2 are independent, written e1 ⊥ e2 , if no atom a1 ∈ atom(e1 ) unifies with an atom a2 ∈ atom(e2 ). ⊥ We adopt the Weighted First-Order Model Counting (WFOMC) [10] formalism to represent probabilistic logic models, building on the notion of a Herbrand interpretation. Herbrand interpretations are subsets of the Herbrand base HB (T ), which consists of all ground atoms that can be constructed with the available predicates and constant symbols in T . The atoms in a Herbrand interpretation are assumed to be true. All other atoms in HB (T ) are assumed to be false. An interpretation I satisfies a theory T , written as I |= T , if it satisfies all the clauses c ∈ T . The WFOMC problem is defined on a weighted logic theory T , which is a logic theory augmented with a positive weight function w and a negative weight function w, which assign a weight to each predicate. The WFOMC problem involves computing wmc(T, w, w) = w(pred(a)) I|=T a∈I 3 3.1 w(pred(a)). (1) a∈HB(T )\I First-order knowledge compilation for lifted probabilistic inference Lifted probabilistic inference A first-order probabilistic model defines a probability distribution P over the set of Herbrand interpretations H. Probabilistic inference in these models is concerned with computing the posterior probability P(q|e) of query q given evidence e, where q and e are logical expressions in general: P(q|e) = h∈H,h|=q∧e P(h) h∈H,h|=e P(h) (2) We propose one notion of lifted inference for first-order probabilistic models, defined in terms of the computational complexity of inference w.r.t. the domains of logical variables. It is clear that other notions of lifted inference are conceivable, especially in the case of approximate inference. Definition 1 (Domain Lifted Probabilistic Inference). A probabilistic inference procedure is domain lifted for a model m, query q and evidence e iff the inference procedure runs in polynomial time in |D1 |, . . . , |Dk | with Di the domain of the logical variable vi ∈ vars(m, q, e). 
Domain lifted inference does not prohibit the algorithm to be exponential in the size of the vocabulary, that is, the number of predicates, arguments and constants, of the probabilistic model, query and evidence. In fact, the definition allows inference to be exponential in the number of constants which occur in arguments of atoms in the theory, query or evidence, as long as it is polynomial in the cardinality of the logical variable domains. This definition of lifted inference stresses the ability to efficiently deal with the domains of the logical variables that arise, regardless of their size, and formalizes what seems to be generally accepted in the lifted inference literature. 2 A class of probabilistic models is a set of probabilistic models expressed in a particular formalism. As examples, consider Markov logic networks (MLN) [12] or parfactors [4], or the weighted FOL theories for WFOMC that we introduced above, when the weights are normalized. Definition 2 (Completeness). Restricting queries to atoms and evidence to a conjunction of literals, a procedure that is domain lifted for all probabilistic models m in a class of models M and for all queries q and evidence e, is called complete for M . 3.2 First-order knowledge compilation First-order knowledge compilation is an approach to lifted probabilistic inference consisting of the following three steps (see Van den Broeck et al. [10] for details): 1. Convert the probabilistic logical model to a weighted CNF. Converting MLNs or parfactors requires adding new atoms to the theory that represent the (truth) value of each factor or formula. set-disjunction 2 friends(X, Y ) ∧ smokes(X) ⇒ smokes(Y ) Smokers ⊆ People decomposable conjunction unit clause leaf (a) MLN Model ∧ smokes(X), X ∈ Smokers smokes(Y ) ∨ ¬ smokes(X) ∨¬ friends(X, Y ) ∨ ¬ f(X, Y ) friends(X, Y ) ∨ f(X, Y ) smokes(X) ∨ f(X, Y ) ¬ smokes(Y ) ∨ f(X, Y ). ∧ f(X, Y ), Y ∈ Smokers ∧ ¬ smokes(Y ), Y ∈ Smokers / ∧ f(X, Y ), X ∈ Smokers, Y ∈ Smokers / / x ∈ Smokers (b) CNF Theory deterministic disjunction Predicate friends smokes f w 1 1 e2 w 1 1 1 y ∈ Smokers / ∨ set-conjunction ∧ f(x, y) (c) Weight Functions ¬ friends(x, y) ∧ friends(x, y) ¬ f(x, y) (d) First-Order d-DNNF Circuit Figure 1: Friends-smokers example (taken from [10]) Example 1. The MLN in Figure 1a assigns a weight to a formula in FOL. Figure 1b represents the same model as a weighted CNF, introducing a new atom f(X, Y ) to encode the truth value of the MLN formula. The probabilistic information is captured by the weight functions in Figure 1c. 2. Compile the logical theory into a First-Order d-DNNF (FO d-DNNF) circuit. Figure 1d shows an example of such a circuit. Leaves represent unit clauses. Inner nodes represent the disjunction or conjunction of their children l and r, but with the constraint that disjunctions must be deterministic (l ∧ r is unsatisfiable) and conjunctions must be decomposable (l ⊥ r). ⊥ 3. Perform WFOMC inference to compute posterior probabilities. In a FO d-DNNF circuit, WFOMC is polynomial in the size of the circuit and the cardinality of the domains. To compile the CNF theory into a FO d-DNNF circuit, Van den Broeck et al. [10] propose a set of compilation rules, which we will refer to as CR 1 . We will now briefly describe these rules. Unit Propagation introduces a decomposable conjunction when the theory contains a unit clause. Independence creates a decomposable conjunction when the theory contains independent subtheories. 
Shannon decomposition applies when the theory contains ground atoms and introduces a deterministic disjunction between two modified theories: one where the ground atom is true, and one where it is false. Shattering splits clauses in the theory until all pairs of atoms represent either a disjoint or identical set of ground atoms. Example 2. In Figure 2a, the first two clauses are made independent from the friends(X, X) clause and split off in a decomposable conjunction by unit propagation. The unit clause becomes a leaf of the FO d-DNNF circuit, while the other operand requires further compilation. 3 dislikes(X, Y ) ∨ friends(X, Y ) fun(X) ∨ ¬ friends(X, Y ) friends(X, Y ) ∨ dislikes(X, Y ) ¬ friends(X, Y ) ∨ likes(X, Y ) friends(X, X) fun(X) ∨ ¬ friends(X, Y ) fun(X) ∨ ¬ friends(Y, X) ∧ FunPeople ⊆ People x ∈ People friends(X, X) friends(X, Y ) ∨ dislikes(X, Y ), X = Y ¬ friends(X, Y ) ∨ likes(X, Y ), X = Y likes(X, X) dislikes(x, Y ) ∨ friends(x, Y ) fun(x) ∨ ¬ friends(x, Y ) fun(X), X ∈ FunPeople ¬ fun(X), X ∈ FunPeople / fun(X) ∨ ¬ friends(X, Y ) fun(X) ∨ ¬ friends(Y, X) (a) Unit propagation of friends(X, X) (b) Independent partial grounding (c) Atom counting of fun(X) Figure 2: Examples of compilation rules. Circles are FO d-DNNF inner nodes. White rectangles show theories before and after applying the rule. All variable domains are People. (taken from [10]) Independent Partial Grounding creates a decomposable conjunction over a set of child circuits, which are identical up to the value of a grounding constant. Since they are structurally identical, only one child circuit is actually compiled. Atom Counting applies when the theory contains an atom with a single logical variable X ∈ D. It explicitly represents the domain D ⊆ D of X for which the atom is true. It compiles the theory into a deterministic disjunction between all possible such domains. Again, these child circuits are identical up to the value of D and only one is compiled. Example 3. The theory in Figure 2b is compiled into a decomposable set-conjunction of theories that are independent and identical up to the value of the x constant. The theory in Figure 2c contains an atom with one logical variable: fun(X). Atom counting compiles it into a deterministic setdisjunction over theories that differ in FunPeople, which is the domain of X for which fun(X) is true. Subsequent steps of unit propagation remove the fun(X) atoms from the theory entirely. 3.3 Completeness We will now characterize those theories where the CR 1 compilation rules cannot be used, and where the inference procedure has to resort to grounding out the theory to propositional logic. For these, first-order knowledge compilation using CR 1 is not yet domain lifted. When a logical theory contains symmetric, anti-symmetric or total relations, such as friends(X, Y ) ⇒ friends(Y, X), parent(X, Y ) ⇒ ¬ parent(Y, X), X = Y, ≤ (X, Y) ∨ ≤ (Y, X), or more general formulas, such as enemies(X, Y ) ⇒ ¬ friend(X, Y ) ∧ ¬ friend(Y, X), (3) (4) (5) (6) none of the CR 1 rules apply. Intuitively, the underlying problem is the presence of either: • Two unifying (not independent) atoms in the same clause which contain the same logical variable in different positions of the argument list. Examples include (the CNF of) Formulas 3, 4 and 5, where the X and Y variable are bound by unifying two atoms from the same clause. • Two logical variables that bind when unifying one pair of atoms but appear in different positions of the argument list of two other unifying atoms. 
Examples include Formula 6, which in CNF is ¬ friend(X, Y ) ∨ ¬ enemies(X, Y ) ¬ friend(Y, X) ∨ ¬ enemies(X, Y ) Here, unifying the enemies(X, Y ) atoms binds the X variables from both clauses, which appear in different positions of the argument lists of the unifying atoms friend(X, Y ) and friend(Y, X). Both of these properties preclude the use of CR 1 rules. Also in the context of other model classes, such as MLNs, probabilistic versions of the above formulas cannot be processed by CR 1 rules. 4 Even though first-order knowledge compilation with CR 1 rules does not have a clear completeness result, we can show some properties of theories to which none of the compilation rules apply. First, we need to distinguish between the arity of an atom and its dimension. A predicate with arity two might have atoms with dimension one, when one of the arguments is ground or both are identical. Definition 3 (Dimension of an Expression). The dimension of an expression e is the number of logical variables it contains: dim(e) = | vars(e)|. Lemma 1 (CR 1 Postconditions). The CR 1 rules remove all atoms from the theory T which have zero or one logical variable arguments, such that afterwards ∀a ∈ atom(T ) : dim(a) > 1. When no CR 1 rule applies, the theory is shattered and contains no independent subtheories. Proof. Ground atoms are removed by the Shannon decomposition operator followed by unit propagation. Atoms with a single logical variable (including unary relations) are removed by the atom counting operator followed by unit propagation. If T contains independent subtheories, the independence operator can be applied. Shattering is always applied when T is not yet shattered. 4 Extending first-order knowledge compilation In this section we introduce a new operator which does apply to the theories from Section 3.3. 4.1 Logical variable properties To formally define the operator we propose, and prove its correctness, we first introduce some mathematical concepts related to the logical variables in a theory (partly after Jha et al. [8]). Definition 4 (Binding Variables). Two logical variables X, Y are directly binding b(X, Y ) if they are bound by unifying a pair of atoms in the theory. The binding relationship b+ (X, Y ) is the transitive closure of the directly binding relation b(X, Y ). Example 4. In the theory ¬ p(W, X) ∨ ¬ q(X) r(Y ) ∨ ¬ q(Y ) ¬ r(Z) ∨ s(Z) the variable pairs (X, Y ) and (Y, Z) are directly binding. The variables X, Y and Z are binding. Variable W does not bind to any other variable. Note that the binding relationship b+ (X, Y ) is an equivalence relation that defines two equivalence classes: {X, Y, Z} and {W }. Lemma 2 (Binding Domains). After shattering, binding logical variables have identical domains. Proof. During shattering (see Section 3.2), when two atoms unify, binding two variables with partially overlapping domains, the atoms’ clauses are split up into clauses where the domain of the variables is identical, and clauses where the domains are disjoint and the atoms no longer unify. Definition 5 (Root Binding Class). A root variable is a variable that appears in all the atoms in its clause. A root binding class is an equivalence class of binding variables where all variables are root. Example 5. In the theory of Example 4, {X, Y, Z} is a root binding class and {W } is not. 4.2 Domain recursion We will now introduce the new domain recursion operator, starting with its preconditions. Definition 6. 
A theory allows for domain recursion when (i) the theory is shattered, (ii) the theory contains no independent subtheories and (iii) there exists a root binding class. From now on, we will denote with C the set of clauses of the theory at hand and with B a root binding class guaranteed to exist if C allows for domain recursion. Lemma 2 states that all variables in B have identical domains. We will denote the domain of these variables with D. The intuition behind the domain recursion operator is that it modifies D by making one element explicit: D = D ∪ {xD } with xD ∈ D . This explicit domain element is introduced by the S PLIT D / function, which splits clauses w.r.t. the new subdomain D and element xD . 5 Definition 7 (S PLIT D). For a clause c and given set of variables Vc ⊆ vars(c) with domain D, let S PLIT D(c, Vc ) = c, if Vc = ∅ S PLIT D(c1 , Vc \ {V }) ∪ S PLIT D(c2 , Vc \ {V }), if Vc = ∅ (7) where c1 = c ∧ (V = xD ) and c2 = c ∧ (V = xD ) ∧ (V ∈ D ) for some V ∈ Vc . For a set of clauses C and set of variables V with domain D: S PLIT D(C, V) = c∈C S PLIT D(c, V ∩ vars(c)). The domain recursion operator creates three sets of clauses: S PLIT D(C, B) = Cx ∪ Cv ∪ Cr , with Cx = {c ∧ (V = xD )|c ∈ C}, (8) V ∈B∩vars(c) Cv = {c ∧ (V = xD ) ∧ (V ∈ D )|c ∈ C}, (9) V ∈B∩vars(c) Cr = S PLIT D(C, B) \ Cx \ Cv . (10) Proposition 3. The conjunction of the domain recursion sets is equivalent to the original theory: c∈Cr c . c∈Cv c ∧ c∈C c ≡ c∈S PLIT D(C,B) c and therefore c∈C c ≡ c∈Cx c ∧ We will now show that these sets are independent and that their conjunction is decomposable. Theorem 4. The theories Cx , Cv and Cr are independent: Cx ⊥ Cv , Cx ⊥ Cr and Cv ⊥ Cr . ⊥ ⊥ ⊥ The proof of Theorem 4 relies on the following Lemma. Lemma 5. If the theory allows for domain recursion, all clauses and atoms contain the same number of variables from B: ∃n, ∀c ∈ C, ∀a ∈ atom(C) : | vars(c) ∩ B | = | vars(a) ∩ B | = n. c Proof. Denote with Cn the clauses in C that contain n logical variables from B and with Cn its compliment in C. If C is nonempty, there is a n > 0 for which Cn is nonempty. Then every atom in Cn contains exactly n variables from B (Definition 5). Since the theory contains no independent c c subtheories, there must be an atom a in Cn which unifies with an atom ac in Cn , or Cn is empty. After shattering, all unifications bind one variable from a to a single variable from ac . Because a contains exactly n variables from B, ac must also contain exactly n (Definition 4), and because B is c a root binding class, the clause of ac also contains exactly n, which contradicts the definition of Cn . c Therefore, Cn is empty, and because the variables in B are root, they also appear in all atoms. Proof of Theorem 4. From Lemma 5, all atoms in C contain the same number of variables from B. In Cx , these variables are all constrained to be equal to xD , while in Cv and Cr at least one variable is constrained to be different from xD . An attempt to unify an atom from Cx with an atom from Cv or Cr therefore creates an unsatisfiable set of constraints. Similarly, atoms from Cv and Cr cannot be unified. Finally, we extend the FO d-DNNF language proposed in Van den Broeck et al. [10] with a new node, the recursive decomposable conjunction ∧ r , and define the domain recursion compilation rule. Definition 8 ( ∧ r ). The FO d-DNNF node ∧ r (nx , nr , D, D , V) represents a decomposable conjunction between the d-DNNF nodes nx , nr and a d-DNNF node isomorphic to the ∧ r node itself. 
In particular, the isomorphic operand is identical to the node itself, except for the size of the domain of the variables in V, which becomes one smaller, going from D to D in the isomorphic operand. We have shown that the conjunction between sets Cx , Cv and Cr is decomposable (Theorem 4) and logically equivalent to the original theory (Proposition 3). Furthermore, Cv is identical to C, up to the constraints on the domain of the variables in B. This leads us to the following definition of domain recursion. Definition 9 (Domain Recursion). The domain recursion compilation rule compiles C into ∧ r (nx , nr , D, D , B), where nx , nr are the compiled circuits for Cx , Cr . The third set Cv is represented by the recursion on D, according to Definition 8. 6 nv Cr ∧r ¬ friends(x, X) ∨ friends(X, x), X = x ¬ friends(X, x) ∨ friends(x, X), X = x P erson ← P erson \ {x} nr nx ∨ Cx ¬ friends(x, x) ∨ friends(x, x) x ∈P erson x =x ¬ friends(x, x) friends(x, x) ∨ ∧ ¬ friends(x, x ) ¬ friends(x , x) ∧ friends(x, x ) friends(x , x) Figure 3: Circuit for the symmetric relation in Equation 3, rooted in a recursive conjunction. Example 6. Figure 3 shows the FO d-DNNF circuit for Equation 3. The theory is split up into three independent theories: Cr and Cx , shown in the Figure 3, and Cv = {¬ friends(X, Y ) ∨ friends(Y, X), X = x, Y = x}. The conjunction of these theories is equivalent to Equation 3. Theory Cv is identical to Equation 3, up to the inequality constraints on X and Y . Theorem 6. Given a function size, which maps domains to their size, the weighted first-order model count of a ∧ r (nx , nr , D, D , V) node is size(D) wmc( ∧ r (nx , nr , D, D , V), size) = wmc(nx , size)size(D) wmc(nr , size ∪{D → s}), s=0 (11) where size ∪{D → s} adds to the size function that the subdomain D has cardinality s. Proof. If C allows for domain recursion, due to Theorem 4, the weighted model count is wmc(C, size) = 1, if size(D) = 0 wmc(Cx ) · wmc(Cv , size ) · wmc(Cr , size ) if size(D) > 0 (12) where size = size ∪{D → size(D) − 1}. Theorem 7. The Independent Partial Grounding compilation rule is a special case of the domain recursion rule, where ∀c ∈ C : | vars(c) ∩ B | = 1 (and therefore Cr = ∅). 4.3 Completeness In this section, we introduce a class of models for which first-order knowledge compilation with domain recursion is complete. Definition 10 (k-WFOMC). The class of k-WFOMC consist of WFOMC theories with clauses that have up to k logical variables. A first completeness result is for 2-WFOMC, using the set of knowledge compilation rules CR 2 , which are the rules in CR 1 extended with domain recursion. Theorem 8 (Completeness for 2-WFOMC). First-order knowledge compilation using the CR 2 compilation rules is a complete domain lifted probabilistic inference algorithm for 2-WFOMC. Proof. From Lemma 1, after applying the CR 1 rules, the theory contains only atoms with dimension larger than or equal to two. From Definition 10, each clause has dimension smaller than or equal to two. Therefore, each logical variable in the theory is a root variable and according to Definition 5, every equivalence class of binding variables is a root binding class. Because of Lemma 1, the theory allows for domain recursion, which requires further compilation of two theories: Cx and Cr into nx and nr . Both have dimension smaller than 2 and can be lifted by CR 1 compilation rules. The properties of 2-WFOMC are a sufficient but not necessary condition for first-order knowledge compilation to be domain lifted. 
We can obtain a similar result for MLNs or parfactors by reducing them to a WFOMC problem. If an MLN contains only formulae with up to k logical variables, then its WFOMC representation will be in k-WFOMC.

This result for 2-WFOMC is not trivial. Van den Broeck et al. [10] showed in their experiments that counting first-order variable elimination (C-FOVE) [6] fails to lift the “Friends Smoker Drinker” problem, which is in 2-WFOMC. We will show in the next section that the CR1 rules fail to lift the theory in Figure 4a, which is in 2-WFOMC. Note that there are also useful theories that are not in 2-WFOMC, such as those containing the transitive relation friends(X, Y) ∧ friends(Y, Z) ⇒ friends(X, Z).

5 Empirical evaluation

To complement the theoretical results of the previous section, we extended the WFOMC implementation (http://dtai.cs.kuleuven.be/wfomc/) with the domain recursion rule. We performed experiments with the theory in Figure 4a, which is a version of the friends and smokers model [11] extended with the symmetric relation of Equation 3. We evaluate performance by querying P(smokes(bob)) with increasing domain size, comparing our approach to the existing WFOMC implementation and to its propositional counterpart, which first grounds the theory and then compiles it with the c2d compiler [13] to a propositional d-DNNF circuit. We did not compare to C-FOVE [6] because it cannot perform lifted inference on this model.

[Figure 4: Symmetric friends and smokers experiment, comparing propositional knowledge compilation (c2d) to WFOMC using compilation rules CR1 and CR2 (which includes domain recursion). (a) MLN model: the soft formula 2 smokes(X) ∧ friends(X, Y) ⇒ smokes(Y) and the hard formula friends(X, Y) ⇒ friends(Y, X). (b) Evaluation runtime in seconds (log scale, 0.01-10000 s) for c2d, WFOMC-CR1 and WFOMC-CR2 as the number of people grows from 10 to 80.]

Propositional inference quickly becomes intractable when there are more than 20 people. The lifted inference algorithms scale much better. The CR1 rules can exploit some regularities in the model; for example, they eliminate all the smokes(X) atoms from the theory. They do, however, resort to grounding at a later stage of the compilation process. With the domain recursion rule, there is no need for grounding. This advantage is clear in the experiments: our approach has an almost constant inference time across this range of domain sizes. Note that the runtimes for c2d include compilation and evaluation of the circuit, whereas the WFOMC runtimes only represent evaluation of the FO d-DNNF; after all, propositional compilation depends on the domain size, but first-order compilation does not. First-order compilation takes a constant two seconds for both rule sets.
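To see why the propositional baseline in Figure 4b degrades so quickly, here is a small illustrative sketch (our own, not the c2d pipeline used in the experiments) that computes P(smokes(bob)) by exhaustively enumerating the grounded worlds of the Figure 4a model. The function name and atom encoding are hypothetical, but the semantics follow the standard MLN definition [12].

```python
from itertools import product
import math

def ground_mln_query(n_people, weight=2.0):
    """P(smokes(bob)) in the grounded Figure 4a MLN, by exhaustive enumeration.
    Person 0 plays the role of bob. Worlds that violate the hard symmetry constraint
    get weight zero; every other world gets weight
    exp(weight * #satisfied groundings of the soft implication)."""
    people = range(n_people)
    friend_atoms = [(a, b) for a in people for b in people]
    total = query = 0.0
    for s_bits in product([False, True], repeat=n_people):
        smokes = dict(zip(people, s_bits))
        for f_bits in product([False, True], repeat=len(friend_atoms)):
            friends = dict(zip(friend_atoms, f_bits))
            # hard constraint: friends(X,Y) => friends(Y,X)
            if any(friends[(a, b)] and not friends[(b, a)] for a in people for b in people):
                continue
            # soft formula, weight 2: smokes(X) ∧ friends(X,Y) => smokes(Y)
            satisfied = sum(
                1 for a in people for b in people
                if not (smokes[a] and friends[(a, b)]) or smokes[b]
            )
            w = math.exp(weight * satisfied)
            total += w
            if smokes[0]:
                query += w
    return query / total

# Already 2**(3 + 9) = 4096 ground worlds for 3 people; in general there are 2**(n + n*n).
print(ground_mln_query(3))
```

The number of ground worlds grows as 2^(n + n²), which is consistent with the observation above that propositional inference becomes intractable beyond roughly 20 people, whereas the lifted evaluation of Equation 11 only loops over domain sizes.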
6 Conclusions

We proposed a definition of complete domain lifted probabilistic inference w.r.t. classes of probabilistic logic models. This definition considers algorithms to be lifted if they are polynomial in the size of the logical variable domains. Existing first-order knowledge compilation turns out not to admit an intuitive completeness result. Therefore, we generalized the existing Independent Partial Grounding compilation rule to the domain recursion rule. With this one extra rule, we showed that first-order knowledge compilation is complete for a significant class of probabilistic logic models, namely those whose WFOMC representation has up to two logical variables per clause.

Acknowledgments

The author would like to thank Luc De Raedt, Jesse Davis and the anonymous reviewers for valuable feedback. This work was supported by the Research Foundation-Flanders (FWO-Vlaanderen).

References

[1] Lise Getoor and Ben Taskar, editors. An Introduction to Statistical Relational Learning. MIT Press, 2007.
[2] Luc De Raedt, Paolo Frasconi, Kristian Kersting, and Stephen Muggleton, editors. Probabilistic Inductive Logic Programming: Theory and Applications. Springer-Verlag, Berlin, Heidelberg, 2008.
[3] Daan Fierens, Guy Van den Broeck, Ingo Thon, Bernd Gutmann, and Luc De Raedt. Inference in probabilistic logic programs using weighted CNF's. In Proceedings of UAI, pages 256–265, 2011.
[4] David Poole. First-order probabilistic inference. In Proceedings of IJCAI, pages 985–991, 2003.
[5] Rodrigo de Salvo Braz, Eyal Amir, and Dan Roth. Lifted first-order probabilistic inference. In Proceedings of IJCAI, pages 1319–1325, 2005.
[6] Brian Milch, Luke S. Zettlemoyer, Kristian Kersting, Michael Haimes, and Leslie Pack Kaelbling. Lifted probabilistic inference with counting formulas. In Proceedings of AAAI, pages 1062–1068, 2008.
[7] Vibhav Gogate and Pedro Domingos. Exploiting logical structure in lifted probabilistic inference. In Proceedings of StarAI, 2010.
[8] Abhay Jha, Vibhav Gogate, Alexandra Meliou, and Dan Suciu. Lifted inference seen from the other side: The tractable features. In Proceedings of NIPS, 2010.
[9] Vibhav Gogate and Pedro Domingos. Probabilistic theorem proving. In Proceedings of UAI, pages 256–265, 2011.
[10] Guy Van den Broeck, Nima Taghipour, Wannes Meert, Jesse Davis, and Luc De Raedt. Lifted probabilistic inference by first-order knowledge compilation. In Proceedings of IJCAI, pages 2178–2185, 2011.
[11] Parag Singla and Pedro Domingos. Lifted first-order belief propagation. In Proceedings of AAAI, pages 1094–1099, 2008.
[12] Matthew Richardson and Pedro Domingos. Markov logic networks. Machine Learning, 62(1):107–136, 2006.
[13] Adnan Darwiche. New advances in compiling CNF to decomposable negation normal form. In Proceedings of ECAI, pages 328–332, 2004.

6 0.33226636 65 nips-2011-Convergent Fitted Value Iteration with Linear Function Approximation

7 0.32489166 3 nips-2011-A Collaborative Mechanism for Crowdsourcing Prediction Problems

8 0.32003716 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

9 0.28853753 220 nips-2011-Prediction strategies without loss

10 0.26391152 108 nips-2011-Greedy Algorithms for Structurally Constrained High Dimensional Problems

11 0.2624675 202 nips-2011-On the Universality of Online Mirror Descent

12 0.26210958 27 nips-2011-Advice Refinement in Knowledge-Based SVMs

13 0.26141155 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation

14 0.25760964 69 nips-2011-Differentially Private M-Estimators

15 0.24897996 40 nips-2011-Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints

16 0.24854925 292 nips-2011-Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories

17 0.23690681 256 nips-2011-Solving Decision Problems with Limited Information

18 0.2325706 122 nips-2011-How Do Humans Teach: On Curriculum Learning and Teaching Dimension

19 0.22405796 6 nips-2011-A Global Structural EM Algorithm for a Model of Cancer Progression

20 0.2194225 34 nips-2011-An Unsupervised Decontamination Procedure For Improving The Reliability Of Human Judgments


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.014), (4, 0.025), (20, 0.026), (23, 0.368), (26, 0.019), (31, 0.076), (33, 0.027), (43, 0.025), (45, 0.112), (57, 0.039), (65, 0.014), (74, 0.123), (83, 0.017), (99, 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.81176984 196 nips-2011-On Strategy Stitching in Large Extensive Form Multiplayer Games

Author: Richard G. Gibson, Duane Szafron

Abstract: Computing a good strategy in a large extensive form game often demands an extraordinary amount of computer memory, necessitating the use of abstraction to reduce the game size. Typically, strategies from abstract games perform better in the real game as the granularity of abstraction is increased. This paper investigates two techniques for stitching a base strategy in a coarse abstraction of the full game tree, to expert strategies in fine abstractions of smaller subtrees. We provide a general framework for creating static experts, an approach that generalizes some previous strategy stitching efforts. In addition, we show that static experts can create strong agents for both 2-player and 3-player Leduc and Limit Texas Hold’em poker, and that a specific class of static experts can be preferred among a number of alternatives. Furthermore, we describe a poker agent that used static experts and won the 3-player events of the 2010 Annual Computer Poker Competition.

2 0.68986803 62 nips-2011-Continuous-Time Regression Models for Longitudinal Networks

Author: Duy Q. Vu, David Hunter, Padhraic Smyth, Arthur U. Asuncion

Abstract: The development of statistical models for continuous-time longitudinal network data is of increasing interest in machine learning and social science. Leveraging ideas from survival and event history analysis, we introduce a continuous-time regression modeling framework for network event data that can incorporate both time-dependent network statistics and time-varying regression coefficients. We also develop an efficient inference scheme that allows our approach to scale to large networks. On synthetic and real-world data, empirical results demonstrate that the proposed inference approach can accurately estimate the coefficients of the regression model, which is useful for interpreting the evolution of the network; furthermore, the learned model has systematically better predictive performance compared to standard baseline methods.

3 0.54246038 16 nips-2011-A reinterpretation of the policy oscillation phenomenon in approximate policy iteration

Author: Paul Wagner

Abstract: A majority of approximate dynamic programming approaches to the reinforcement learning problem can be categorized into greedy value function methods and value-based policy gradient methods. The former approach, although fast, is well known to be susceptible to the policy oscillation phenomenon. We take a fresh view to this phenomenon by casting a considerable subset of the former approach as a limiting special case of the latter. We explain the phenomenon in terms of this view and illustrate the underlying mechanism with artificial examples. We also use it to derive the constrained natural actor-critic algorithm that can interpolate between the aforementioned approaches. In addition, it has been suggested in the literature that the oscillation phenomenon might be subtly connected to the grossly suboptimal performance in the Tetris benchmark problem of all attempted approximate dynamic programming methods. We report empirical evidence against such a connection and in favor of an alternative explanation. Finally, we report scores in the Tetris problem that improve on existing dynamic programming based results. 1

4 0.46056074 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies

Author: Viren Jain, Srinivas C. Turaga, K Briggman, Moritz N. Helmstaedter, Winfried Denk, H. S. Seung

Abstract: An agglomerative clustering algorithm merges the most similar pair of clusters at every iteration. The function that evaluates similarity is traditionally hand-designed, but there has been recent interest in supervised or semisupervised settings in which ground-truth clustered data is available for training. Here we show how to train a similarity function by regarding it as the action-value function of a reinforcement learning problem. We apply this general method to segment images by clustering superpixels, an application that we call Learning to Agglomerate Superpixel Hierarchies (LASH). When applied to a challenging dataset of brain images from serial electron microscopy, LASH dramatically improved segmentation accuracy when clustering supervoxels generated by state-of-the-art boundary detection algorithms. The naive strategy of directly training only supervoxel similarities and applying single linkage clustering produced less improvement.

5 0.45840895 68 nips-2011-Demixed Principal Component Analysis

Author: Wieland Brendel, Ranulfo Romo, Christian K. Machens

Abstract: In many experiments, the data points collected live in high-dimensional observation spaces, yet can be assigned a set of labels or parameters. In electrophysiological recordings, for instance, the responses of populations of neurons generally depend on mixtures of experimentally controlled parameters. The heterogeneity and diversity of these parameter dependencies can make visualization and interpretation of such data extremely difficult. Standard dimensionality reduction techniques such as principal component analysis (PCA) can provide a succinct and complete description of the data, but the description is constructed independent of the relevant task variables and is often hard to interpret. Here, we start with the assumption that a particularly informative description is one that reveals the dependency of the high-dimensional data on the individual parameters. We show how to modify the loss function of PCA so that the principal components seek to capture both the maximum amount of variance about the data, while also depending on a minimum number of parameters. We call this method demixed principal component analysis (dPCA) as the principal components here segregate the parameter dependencies. We phrase the problem as a probabilistic graphical model, and present a fast Expectation-Maximization (EM) algorithm. We demonstrate the use of this algorithm for electrophysiological data and show that it serves to demix the parameter-dependence of a neural population response. 1

6 0.43655944 259 nips-2011-Sparse Estimation with Structured Dictionaries

7 0.43266219 218 nips-2011-Predicting Dynamic Difficulty

8 0.43165457 104 nips-2011-Generalized Beta Mixtures of Gaussians

9 0.42147753 276 nips-2011-Structured sparse coding via lateral inhibition

10 0.41708168 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

11 0.41237533 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries

12 0.41025338 283 nips-2011-The Fixed Points of Off-Policy TD

13 0.41002673 258 nips-2011-Sparse Bayesian Multi-Task Learning

14 0.40757018 265 nips-2011-Sparse recovery by thresholded non-negative least squares

15 0.40744963 186 nips-2011-Noise Thresholds for Spectral Clustering

16 0.40592271 79 nips-2011-Efficient Offline Communication Policies for Factored Multiagent POMDPs

17 0.40483388 169 nips-2011-Maximum Margin Multi-Label Structured Prediction

18 0.40465951 197 nips-2011-On Tracking The Partition Function

19 0.40451941 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations

20 0.40406218 271 nips-2011-Statistical Tests for Optimization Efficiency