nips-2004-171: Solitaire: Man Versus Machine
Source: pdf
Authors: Xiang Yan, Persi Diaconis, Paat Rusmevichientong, Benjamin Van Roy
Abstract: In this paper, we use the rollout method for policy improvement to analyze a version of Klondike solitaire. This version, sometimes called thoughtful solitaire, has all cards revealed to the player, but then follows the usual Klondike rules. A strategy that we establish, using iterated rollouts, wins about twice as many games on average as an expert human player does.
1 Introduction

For discounted or average reward Markov decision problems with n states and two possible actions per state, the tightest known worst-case upper bound in terms of n on the number of iterations taken to find an optimal policy is O(2^n / n) [9]. In particular, a policy is represented by a table with one action per state, and each iteration improves the policy by updating each entry of this table. In such large problems, one might resort to a suboptimal heuristic policy, taking the form of an algorithm that accepts a state as input and generates an action as output.

An interesting recent development in dynamic programming is the rollout method. Pioneered by Tesauro and Galperin [13, 2], the rollout method leverages the policy improvement concept to amplify the performance of any given heuristic. Unlike the conventional policy improvement algorithm, which computes an optimal policy off-line so that it may later be used in decision-making, the rollout method performs its computations on-line, at the time when a decision is to be made. When making a decision, rather than applying the heuristic policy directly, the rollout method computes the action that would result from an iteration of policy improvement applied to the heuristic policy. The way in which actions are generated by the rollout method may itself be considered an alternative heuristic that improves on the original. One might then consider applying the rollout method to this new heuristic, but each such iteration multiplies the computation required per move. For this reason, prior applications of the rollout method have involved only one iteration [3, 4, 5, 6, 8, 11, 12, 13]. A second iteration of the rollout method would have been infeasible, requiring about six orders of magnitude more time per move.

In this paper, we apply the rollout method to a version of solitaire, modeled as a deterministic Markov decision problem with over 52! states. Our study represents an important contribution both to the study of the rollout method and to the study of solitaire.
According to Parlett [10], solitaire came into existence when fortune-telling with cards gained popularity in the eighteenth century. Many variations of solitaire exist today, such as Klondike, Freecell, and Carpet.
Klondike is played with a standard deck of cards: there are four suits (Spades, Clubs, Hearts, and Diamonds), each made up of thirteen cards ranked 1 through 13: Ace, 2, 3, ..., 10, Jack, Queen, King. During the game, each card resides in one of thirteen stacks (see note 2): the pile, the talon, four suit stacks, and seven build stacks. Each suit stack corresponds to a particular suit, and the build stacks are labeled 1 through 7. At the beginning of the game, cards are dealt so that there is one card in the first build stack, two cards in the second build stack, and so on, up to seven cards in the seventh build stack. The top card on each of the seven build stacks is turned face-up, while the rest of the cards in the build stacks face down. The goal of the game is to move all cards into the suit stacks, Aces first, then twos, and so on, with each suit stack evolving as an ordered, increasing arrangement of cards of the same suit.

Note 2: In some solitaire literature, stacks are referred to as piles.

We will study a version of solitaire in which the identity of each card at each position is revealed to the player at the beginning of the game, but the usual Klondike rules still apply. This version is played by a number of serious solitaire players, who regard it as a much more difficult game than standard Klondike. We call this game thoughtful solitaire and now spell out the rules.
On each turn, the player can move cards from one stack to another in the following manner:

• Face-up cards of a build stack, called a card block, can be moved to the top of another build stack, provided that the build stack to which the block is being moved accepts the block. Note that all face-up cards on the source stack must be moved together. After the move, these cards become the top cards of the stack to which they are moved, and their ordering is preserved. The card originally immediately beneath the card block, now the top card of its stack, is turned face-up. In the event that all cards in the source stack are moved, the player has an empty stack (see note 3).

• The top face-up card of a build stack can be moved to the top of a suit stack, provided that the suit stack accepts the card.

• The top card of a suit stack can be moved to the top of a build stack, provided that the build stack accepts the card.

• If the pile is not empty, a move can deal its top three cards to the talon, which maintains its cards in a first-in-last-out order. If the pile becomes empty, the player can redeal all the cards on the talon back to the pile in one card move.

• A card on the top of the talon can be moved to the top of a build stack or a suit stack, provided that the stack to which the card is being moved accepts the card.
Note 3: It would seem to some that, since the identity of all cards is revealed to the player, whether a card is face-up or face-down is irrelevant. We retain this property of cards as it is still important in describing the rules and formulating our strategy.
• A build stack can only accept an incoming card block if the top card on the build stack is adjacent to and braided with the bottom card of the block. A card is adjacent to another card of rank r if it is of rank r + 1. A card is braided with a card of suit s if its suit is of a color different from that of s. Additionally, if a build stack is empty, it can only accept a card block whose bottom card is a King.

• A suit stack can only accept an incoming card of its corresponding suit. If a suit stack is empty, it can only accept an Ace. If it is not empty, the incoming card must be adjacent to the current top card of the suit stack.

As stated earlier, the objective is to end up with all cards on the suit stacks.
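To make these mechanics concrete, the following is a minimal Java sketch of the acceptance rules and the pile-talon moves. All type and method names here are our own illustrative choices; the paper does not publish its implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

enum Suit {
    SPADES, CLUBS, HEARTS, DIAMONDS;
    boolean isRed() { return this == HEARTS || this == DIAMONDS; }
}

record Card(int rank, Suit suit) { // rank 1 (Ace) through 13 (King)
    // A card is adjacent to a card of rank r if it is of rank r + 1.
    boolean adjacentTo(Card other) { return rank == other.rank + 1; }

    // A card is braided with another card if their suits differ in color.
    boolean braidedWith(Card other) { return suit.isRed() != other.suit.isRed(); }
}

final class Rules {
    // A build stack accepts an incoming block iff its top card is adjacent
    // to and braided with the block's bottom card; an empty build stack
    // accepts only a block whose bottom card is a King (rank 13).
    static boolean buildStackAccepts(Card top, Card blockBottom) { // top == null when empty
        if (top == null) return blockBottom.rank() == 13;
        return top.adjacentTo(blockBottom) && top.braidedWith(blockBottom);
    }

    // A suit stack accepts only its own suit: an Ace when empty, otherwise
    // the card one rank above its current top card.
    static boolean suitStackAccepts(Suit stackSuit, Card top, Card card) { // top == null when empty
        if (card.suit() != stackSuit) return false;
        if (top == null) return card.rank() == 1;
        return card.adjacentTo(top);
    }
}

// The pile deals its top three cards onto the talon, which keeps its cards
// in first-in-last-out order; an empty pile is refilled by redealing the
// talon back in a single move (the double reversal restores pile order).
final class PileAndTalon {
    final Deque<Card> pile = new ArrayDeque<>();
    final Deque<Card> talon = new ArrayDeque<>();

    void dealThree() {
        for (int i = 0; i < 3 && !pile.isEmpty(); i++) talon.push(pile.pop());
    }

    void redeal() {
        while (!talon.isEmpty()) pile.push(talon.pop());
    }
}
```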
3 Expert Play

We were introduced to thoughtful solitaire by a senior American mathematician (a former president of the American Mathematical Society and, indeed, a famous combinatorialist) who had spent a number of years studying the game. He finds this version of solitaire much more thought-provoking and challenging than standard Klondike. For instance, while the latter is usually played quickly, our esteemed expert averages about 20 minutes for each game of thoughtful solitaire. With this background, it is natural to wonder how well an optimal player can perform at thoughtful solitaire.
If all cards are on suit stacks, declare victory and terminate. If the new card configuration repeats a previous one, declare loss and terminate (see note 4). We will first describe a heuristic strategy for selecting a legal move based on a card configuration.

Note 4: One straightforward way to determine whether a card configuration has previously occurred is to store all encountered card configurations. Instead of doing so, however, we notice that there are three kinds of moves that could lead us into an infinite loop: pile-talon moves, moves that could juggle a card block between two build stacks, and moves that could juggle a card block between a build stack and a suit stack. For the first kind, we record whether any card move other than a pile-talon move has occurred since the last redeal.
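A minimal sketch of that first check follows, with names of our own choosing; the checks for the other two kinds of moves fall in a passage that is not reproduced here.

```java
// Sketch of the pile-talon loop check: if the talon is redealt without any
// other kind of move having occurred since the previous redeal, play can
// never progress, so the configuration must repeat and we declare a loss.
final class LoopDetector {
    private boolean progressSinceLastRedeal = false;

    // Call after every move; pile-talon moves do not count as progress.
    void recordMove(boolean isPileTalonMove) {
        if (!isPileTalonMove) progressSinceLastRedeal = true;
    }

    // Call when the talon is redealt back to the pile.
    // Returns true if the game is stuck in a pile-talon loop.
    boolean recordRedeal() {
        boolean stuck = !progressSinceLastRedeal;
        progressSinceLastRedeal = false;
        return stuck;
    }
}
```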
4.1 A Heuristic Strategy

Our heuristic strategy is based on part of the Microsoft Windows Klondike scoring system:

• The player starts the game with an initial score of 0.
• Whenever a card is moved from a build stack to a suit stack, the player gains 5 points.

• Whenever a card is moved from the talon to a build stack, the player gains 5 points.

• Whenever a card is moved from a suit stack to a build stack, the player loses 10 points.
In our heuristic strategy, we assign a score to each card move based on the above scoring system. We assign a score of zero to any move not covered by the above rules. The player thus has an incentive to move cards from the talon to a build stack and from a build stack to a suit stack.
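As a sketch of this assignment (MoveType and Scorer are our own illustrative names; the later refinement for Kings whose matching Queen is face-down is omitted here):

```java
// Score a card move under the (partial) Windows Klondike scoring system
// described above; every move not covered by the rules scores zero.
enum MoveType { BUILD_TO_SUIT, TALON_TO_BUILD, SUIT_TO_BUILD, OTHER }

final class Scorer {
    static int score(MoveType type) {
        switch (type) {
            case BUILD_TO_SUIT:  return 5;   // build stack -> suit stack: +5
            case TALON_TO_BUILD: return 5;   // talon -> build stack: +5
            case SUIT_TO_BUILD:  return -10; // suit stack -> build stack: -10
            default:             return 0;   // anything else: 0
        }
    }
}
```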
One important element that the heuristic fails to capture, however, is which move to make when multiple moves maximize the score.
To address this, we assign each card move a priority:

– If the move empties a stack, we assign this move a priority of 1.

• If the card move is from the talon to a build stack, one of the following three assignments of priority occurs:

– If the card being moved is not a King, we assign the move priority 1.

– If the card being moved is a King and its matching Queen is in the pile, in the talon, in a suit stack, or face-up in a build stack, we assign the move priority 1.

– If the card being moved is a King and its matching Queen is face-down in a build stack, we assign the move priority -1.

• For card moves not covered by the description above, we assign a priority of 0.
In addition to introducing priorities, we modify the Windows Klondike scoring system further with one change: in a card move, if the card being moved is a King and its matching Queen is face-down in a build stack, we assign the move a score of 0. Note that, given our assignment of scores and priorities, we practically disable card moves from a suit stack to a build stack. Because such moves have a negative score, while a card move from the pile to the talon or from the talon to the pile has a score of zero and is almost always available, our strategy will always choose the pile-talon move over a move from a suit stack to a build stack. In the case when multiple moves of equal priority maximize the score, we randomly select a move among them.
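Putting scores and priorities together, move selection can be sketched as follows; ScoredMove and MoveSelector are illustrative names, and the caller is assumed to have attached a score and a priority to each legal move using the rules above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Pick a move: maximize score first, break ties by priority, and choose
// uniformly at random among moves that tie on both.
final class MoveSelector {
    record ScoredMove(Object move, int score, int priority) {}

    static ScoredMove select(List<ScoredMove> legal, Random rng) {
        int bestScore = Integer.MIN_VALUE;
        for (ScoredMove m : legal) bestScore = Math.max(bestScore, m.score());

        List<ScoredMove> best = new ArrayList<>();
        int bestPriority = Integer.MIN_VALUE;
        for (ScoredMove m : legal) {
            if (m.score() != bestScore) continue;
            if (m.priority() > bestPriority) {
                bestPriority = m.priority();
                best.clear();
            }
            if (m.priority() == bestPriority) best.add(m);
        }
        return best.get(rng.nextInt(best.size()));
    }
}
```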
The introduction of priorities improves our original game-playing strategy in two ways. First, when we encounter a situation where we can move either of two blocks, sitting on two separate build stacks, atop the top card of a third build stack, we prefer moving the block whose stack has more face-down cards. Intuitively, such a move strives to balance the number of face-down cards across stacks. Second, our prioritization scheme makes us more deliberate about which King to select to enter an empty build stack. For instance, consider a situation where the King of Hearts and the King of Spades, both in the pile, are vying for an empty build stack, and there is a face-up Queen of Diamonds on a build stack. We should certainly move the King of Spades to the empty build stack so that the Queen of Diamonds can be moved on top of it.
4.2 Rollouts

Consider a strategy h that maps a card configuration x to a legal move h(x). In this section, we discuss the rollout method as a procedure for amplifying the performance of any strategy. Given a strategy h, this procedure generates an improved strategy h', called a rollout strategy. The idea was originally proposed by Tesauro and Galperin [13] and builds on the policy improvement algorithm of dynamic programming [1, 7]. A rollout strategy would make a move h'(x), determined as follows:

1. For each legal move a, simulate the remainder of the game, taking move a and then employing strategy h thereafter.

2. If any of these simulations leads to victory, choose one of them randomly and let h'(x) be the corresponding move a.

We can then iterate this procedure to generate a further improved strategy h'' that is a rollout strategy relative to h'.
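A compact sketch of the iterated rollout recursion follows. GameState, Move, and Strategy are hypothetical interfaces standing in for the paper's unpublished implementation, and the fallback when no simulation wins is our assumption, since the paper's rule for that case falls in a passage not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

interface Move {}

interface GameState {
    List<Move> legalMoves();
    GameState apply(Move a); // returns the configuration after the move
    boolean isWon();
    boolean isLost();
}

interface Strategy { Move choose(GameState x); }

// depth = 0 plays the base heuristic h; depth = k implements k rollout
// iterations, i.e., a rollout strategy relative to the (k-1)-iteration one.
final class Rollout implements Strategy {
    private final Strategy base;
    private final int depth;
    private final Random rng = new Random();

    Rollout(Strategy base, int depth) { this.base = base; this.depth = depth; }

    public Move choose(GameState x) {
        if (depth == 0) return base.choose(x);
        Strategy inner = new Rollout(base, depth - 1); // h, h', h'', ...
        List<Move> winning = new ArrayList<>();
        for (Move a : x.legalMoves()) {
            // Simulate the rest of the game: take move a, follow inner after.
            GameState s = x.apply(a);
            while (!s.isWon() && !s.isLost()) s = s.apply(inner.choose(s));
            if (s.isWon()) winning.add(a);
        }
        if (!winning.isEmpty()) return winning.get(rng.nextInt(winning.size()));
        return base.choose(x); // assumed fallback when no simulated win exists
    }
}
```

The recursion makes each additional iteration multiply the simulation work, which is consistent with the rapid growth of per-game times in the results below.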
5 Results

We implemented the heuristic strategy and the procedure for computing rollout strategies in Java. We randomly generated a large number of games and played them with our algorithms in an effort to approximate the success probability by the percentage of games actually won. For the original heuristic and 1 through 3 rollout iterations, we managed to achieve confidence bounds of [-1. ...]. For 4 and 5 rollout iterations, due to time constraints, we simulated fewer games and obtained weaker confidence bounds. Interestingly, however, after 5 rollout iterations, the resulting strategy wins almost twice as frequently as our esteemed mathematician.
[Table: success rate, time per game, and 99% confidence bounds for the human expert, the heuristic, and 1 through 5 rollout iterations. Most numeric entries were lost in extraction; the recoverable times per game include 13 seconds, 1 minute 36 seconds, 18 minutes 7 seconds, and 1 hour 45 minutes.]
6 Future Challenges

One limitation of our rollout method lies in its recursive nature. One possible direction for further exploration would be to compute a value function, mapping the state of the game to an estimate of whether or not the game can be won. Certainly, this function could not be represented exactly, but we could try approximating it by a linear combination of features of the game state, as is common in the approximate dynamic programming literature [2].
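As a sketch of that proposal (feature extraction and weight fitting are left abstract; nothing here is implemented in the paper):

```java
// Linear value-function approximation: estimate whether a game state can be
// won as a weighted sum of hand-crafted features of the state.
final class LinearValue {
    private final double[] weights; // fit offline, e.g. by regression on simulated games

    LinearValue(double[] weights) { this.weights = weights; }

    double estimate(double[] features) { // features extracted from a card configuration
        double v = 0.0;
        for (int i = 0; i < weights.length; i++) v += weights[i] * features[i];
        return v; // higher values suggest the game is more likely winnable
    }
}
```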
We have also attempted to prove an upper bound on the success rate of thoughtful solitaire by enumerating sets of initial card configurations that force a loss. If the success rate bound is improved and we are able to run additional rollout iterations, we may produce a verifiably near-optimal strategy for thoughtful solitaire.