Algorithms for Interdependent Security Games
Michael Kearns and Luis E. Ortiz
NIPS 2003
A simple example is the choice of whether to install a fire sprinkler system in an individual condominium in a large building. If "enough" other unit owners have not made the investment in sprinklers, it may not be cost-effective for any individual to do so. All these problems share the following important properties:

- There is a "bad event" (condominium fire, airline explosion, corporate bankruptcy, infection, etc.) to be avoided, and the opportunity to reduce the risk of it via some kind of investment.
- The cost-effectiveness of the security investment for the individual is a function of the investment decisions made by the others in the population.
2 Definitions

In an IDS game, each player i must decide whether or not to invest in some abstract security mechanism or procedure that can reduce his risk of experiencing some abstract bad event. The cost of the investment to player i is C_i, while the cost of experiencing the bad event is L_i; the interesting case is when C_i < L_i. Thus, player i has two choices for his action a_i: a_i = 1 means the player makes the investment, while a_i = 0 means he does not.

For each player i, there is a parameter p_i, which is the probability that player i will experience the bad event due to internal contamination if a_i = 0 — for example, this is the probability of the condominium owner's unit burning down due to a fire originating in his own unit. We can also think of p_i as a measure of the direct risk to player i — as we shall see, it is that portion of his risk under his direct control.

To model sources of indirect risk, for each pair of players i ≠ j, let q_ij be the probability that player i experiences the bad event as a result of a transfer from player j — for example, this is the probability that the condominium of player i burns down due to a fire originating in the unit of player j.
The overall expected cost to player i under the joint action a = (a_1, ..., a_n) is

  a_i C_i + (1 - a_i) p_i L_i + [1 - (1 - a_i) p_i] [1 - ∏_{j ≠ i} (1 - (1 - a_j) q_ij)] L_i.   (1)

The first term represents the amount invested in security by player i, and is either 0 (if a_i = 0) or C_i (if a_i = 1). The second term is the expected cost to i due to internal or direct risk of the bad event, and is either p_i L_i (the expected cost of internally generated bad events in the case a_i = 0) or 0 (in the case of investment, a_i = 1). Thus, there is a natural tension between the first two terms: players can either invest in security, which costs money but reduces risk, or gamble by not investing. Note that here we have assumed that security investment perfectly eradicates direct risk (but not indirect risk); generalizations are obviously possible, but have no qualitative effect on the model.

It is the third term of Equation (1) that expresses the interdependent nature of the problem. This term encodes the assumption that there are n sources of risk to player i — his own internal risk, and a specific transfer risk from each of the other n - 1 players — and that all these sources are statistically independent. The prefactor 1 - (1 - a_i) p_i is simply the probability that player i does not experience the bad event due to direct risk. The product ∏_{j ≠ i} (1 - (1 - a_j) q_ij) is the probability that no other player transfers the bad event to i; thus 1 minus this product is the probability of transferred contamination, and of course this probability is also multiplied by the cost L_i of the bad event.
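As a concrete check on the three terms, here is a small sketch of Equation (1) in Python (the variable names C, L, p, q are ours, mirroring the parameters above; q[i][j] is the transfer risk to player i from player j):

```python
import math

def expected_cost(i, a, C, L, p, q):
    """Expected cost to player i under joint pure action profile a, per Equation (1)."""
    n = len(a)
    # Probability that no other player transfers the bad event to player i.
    safe = math.prod(1 - (1 - a[j]) * q[i][j] for j in range(n) if j != i)
    invest = a[i] * C[i]                                    # first term: investment cost
    direct = (1 - a[i]) * p[i] * L[i]                       # second term: direct risk
    indirect = (1 - (1 - a[i]) * p[i]) * (1 - safe) * L[i]  # third term: transfer risk
    return invest + direct + indirect

# Two-player illustration: cheap protection against a costly event.
C, L, p = [0.1, 0.1], [1.0, 1.0], [0.5, 0.5]
q = [[0.0, 0.2], [0.2, 0.0]]
cost_all_invest = expected_cost(0, [1, 1], C, L, p, q)  # only the investment cost remains
cost_none = expected_cost(0, [0, 0], C, L, p, q)        # direct risk plus transfer term
```

With these numbers, investing when the other player invests costs 0.1, while the all-gamble profile costs player 0 roughly 0.6 (direct 0.5 plus transferred ~0.1), illustrating the tension between the first two terms.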
The model parameters and Equation (1) define a compact representation for a multiplayer game in which each player's goal is to minimize his cost. Our interest is in the efficient computation of Nash equilibria (NE) of such games.
3 Algorithms

We begin with the observation that it is in fact computationally straightforward to find a single pure NE of any IDS game. To see this, it is easily verified that if there are any conditions under which player i prefers investing (a_i = 1) to not investing (a_i = 0) according to the expected costs given by Equation (1), then it is certainly the case that i will prefer to invest when all the other n - 1 players are doing so. Similarly, the most favorable conditions for not investing occur when no other players are investing. Thus, to find a pure NE, we can first check whether either all players investing, or no players investing, forms a NE. If neither of these extremes is a NE, then there are some players for whom investing or not investing is a dominant strategy (a best response independent of the behavior of the others). If we then "clamp" such players to their dominant strategies, we obtain a new IDS game with fewer players (only those without dominant strategies in the original game), and can again check whether this modified game has any players with dominant strategies. At each stage of this iterative process we maintain the invariant that clamped players are playing a best response to any possible setting of the unclamped players.

Theorem 1 A pure NE for any n-player IDS game can be computed in time O(n²).
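A minimal sketch of this check-the-extremes-then-clamp procedure (the representation and helper names are ours, not the paper's; ties are broken toward investing):

```python
import math

def prefers_invest(i, a, C, L, p, q):
    """True iff investing is a best response for player i to the others' actions in a.

    Comparing the two costs in Equation (1) reduces to the condition
    C_i <= p_i * L_i * prod_{j != i} (1 - (1 - a_j) * q_ij)."""
    n = len(a)
    safe = math.prod(1 - (1 - a[j]) * q[i][j] for j in range(n) if j != i)
    return C[i] <= p[i] * L[i] * safe

def find_pure_ne(C, L, p, q):
    """Find one pure NE by testing the two extremes, then clamping dominant strategies."""
    n = len(C)
    clamped, free = {}, set(range(n))
    while free:
        hi = [clamped.get(j, 1) for j in range(n)]  # all unclamped players invest
        lo = [clamped.get(j, 0) for j in range(n)]  # no unclamped player invests
        if all(prefers_invest(i, hi, C, L, p, q) for i in free):
            return hi                               # "everybody (remaining) invests" is a NE
        if not any(prefers_invest(i, lo, C, L, p, q) for i in free):
            return lo                               # "nobody (remaining) invests" is a NE
        for i in list(free):
            if prefers_invest(i, lo, C, L, p, q):       # prefers investing even in the
                clamped[i] = 1                          # worst case: investing dominant
                free.discard(i)
            elif not prefers_invest(i, hi, C, L, p, q):  # prefers gambling even in the
                clamped[i] = 0                           # best case: not investing dominant
                free.discard(i)
    return [clamped[j] for j in range(n)]
```

Each round either returns a NE of the reduced game or clamps at least one player, so the loop performs at most n rounds of O(n) best-response checks (each check is itself linear), matching the spirit of Theorem 1.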
In a sense, the argument above demonstrates that in most "interesting" IDS games (those in which each player is a true participant, and can have his behavior swayed by that of the overall population), there are two trivial pure NE (all invest and none invest). However, we are also interested in finding NE in which some players choose to invest and others do not (even though no player has a dominant strategy). A primary motivation for finding such NE is the appearance of such behavior in "real world" IDS settings, where individual parties do truly seem to make differing security investment choices (as with sprinkler systems in large apartment buildings).
3.1 Uniform Transfer IDS Games

A uniform transfer IDS game is one in which the transfer risks emanating from a given player are independent of the transfer destination. Thus, for any player j, we have that q_ij = δ_j for all i ≠ j, for some value δ_j. Note that the risk level δ_j presented to the population by different players j may still vary with j — but each player spreads his risk indiscriminately across the rest of the population. An example would be the assumption that each airline transferred bags with equal probability to all other airlines.
In this section, we describe two different approaches for computing NE in uniform transfer IDS games. The first approach views a uniform transfer IDS game as a special type of summarization game, a class recently investigated by Kearns and Mansour [4]. In an n-player summarization game, the payoff of each player is a function of the actions of all the other players, but only through the value of a global and common real-valued summarization function S. The main result of [4] gives an algorithm for computing approximate NE of summarization games, in which the quality of the approximation depends on the influence of the summarization function S.

It can be shown (details omitted) that any uniform transfer IDS game is in fact a summarization game under the choice

  S(a) = ∏_{j=1}^{n} (1 - (1 - a_j) δ_j)   (2)

and that the influence of this function is bounded by the largest δ_j. We note that in many natural uniform transfer IDS settings, we expect this influence to diminish like 1/n with the number of players n. (This would be the case if the risk transfer comes about through physical objects like airline baggage, where each transfer event can have only a single destination.)
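Assuming S takes the product form shown in Equation (2) (our reconstruction from the surrounding text), the summary is a single scalar from which each player's transfer exposure can be recovered:

```python
import math

def summarization_value(a, delta):
    """S(a) = prod_j (1 - (1 - a_j) * delta_j): each non-investing player j
    contributes a factor (1 - delta_j); each investing player contributes 1."""
    return math.prod(1 - (1 - aj) * dj for aj, dj in zip(a, delta))

# Player i's "no transferred bad event" probability depends on the others only
# through S: prod_{j != i} (1 - (1 - a_j) delta_j) = S(a) / (1 - (1 - a_i) delta_i).
a, delta = [1, 0, 0], [0.3, 0.5, 0.1]
S = summarization_value(a, delta)  # 1 * 0.5 * 0.9, i.e. roughly 0.45
```

Flipping any single player's action changes S by at most a factor involving that player's δ_j, which is the sense in which the largest δ_j bounds the influence.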
Theorem 2 There is an algorithm that takes as input any uniform transfer IDS game and any ε > 0, and computes an O(γε)-NE, where γ is determined by max_i p_i and max_j δ_j.

We note that in typical IDS settings we expect both the p_i and the δ_j to be small (the bad event is relatively rare, regardless of its source), in which case γ may be viewed as a constant. Furthermore, it can be verified that this algorithm will in fact be able to compute approximate NE in which some players choose to invest and others not to, even in the absence of any dominant strategies.
While viewing uniform transfer IDS games as bounded-influence summarization games relates them to a standard class and yields a natural approximation algorithm, an improved approach is possible. We next describe an algorithm that efficiently computes all NE for uniform transfer IDS games.

We may assume without loss of generality that δ_i > 0 and p_i > 0 for all players i. Throughout, let R_i = C_i / L_i. The first lemma is a generalization of Proposition 2 of [2], and essentially establishes that the values R_i / p_i and (1 - δ_i) R_i / p_i determine a two-level ordering of the players' willingness to invest. This double ordering generates the outer and inner loops of algorithm UniformTransferIDSNash. Note that a player i with small R_i / p_i has a combination of relatively low cost of investing compared to the loss from a bad event (recall R_i = C_i / L_i) and relatively high direct risk p_i, and thus intuitively should be more willing to invest than players with large R_i / p_i.
Lemma 3 (Ordering Lemma) Let x be a NE for a uniform transfer IDS game (n, R, p, δ). Then the players' equilibrium behavior is ordered by the quantities R_i / p_i and (1 - δ_i) R_i / p_i: players with smaller values are at least as willing to invest at x as players with larger values. The equations for the mixed strategies of partially investing players are exploited in the subroutine TestNash.
Algorithm UniformTransferIDSNash
Input: An n-player uniform transfer IDS game with direct risk parameters p, transfer risk parameters δ, and cost parameters R, where R_i = C_i / L_i.

1. Initialize a partition of the players into three sets (I, N, P) (the investing, not-investing, and partially investing players, respectively) and test if everybody investing is a NE:
   I ← {1, ..., n}; N ← ∅; P ← ∅; S ← TestNash(I, N, P).
2. Let (i_1, ..., i_n) be an ordering of the n players satisfying R_{i_1}/p_{i_1} ≤ ... ≤ R_{i_n}/p_{i_n}.
3. For each player in this outer ordering:
   (a) Move the next player in the outer ordering from the investing to the partially investing set: I ← I \ {i}; P ← P ∪ {i}.
   (b) Let (k_1, ..., k_ℓ) be an ordering of the players in P satisfying (1 - δ_{k_1}) R_{k_1}/p_{k_1} ≤ ... ≤ (1 - δ_{k_ℓ}) R_{k_ℓ}/p_{k_ℓ}; moving players from P to N in this inner ordering, test each resulting partition: S ← TestNash(I, N, P).
4. Return S.

Subroutine TestNash(I, N, P):
1. Set pure strategies for the not-investing and investing players, respectively: x_i ← 0 for i ∈ N; x_i ← 1 for i ∈ I.
2. If P = ∅, test directly whether x is a NE. Otherwise (Lemma 4 applies):
   (a) Compute mixed strategies x_i for i ∈ P as in Equation (4).
   (b) If x_i ≤ 0 or x_i ≥ 1 for some i ∈ P, return S.
   (c) If x is a NE for the game, then S ← S ∪ {x}; return S.
Lemma 4 (Partial Investment Lemma) Let x ∈ [0, 1]^n be a mixed strategy for a uniform transfer IDS game (n, R, p, δ), and let P be the set of partially investing players in x (those with 0 < x_i < 1). Then

(a) if P = {i} and x is a NE, then player i is indifferent between investing and not, which holds exactly when

  R_i / p_i = ∏_{j ∈ N} (1 - δ_j),   (3)

and x_i is constrained to a subinterval of [0, 1] determined by the best-response conditions of the pure-strategy players;

(b) if |P| ≥ 2 and x is a NE, then every player i ∈ P is indifferent, and his mixed strategy satisfies

  x_i = 1 - (1 - Q p_i / R_i) / δ_i for all i ∈ P, where Q = [ ∏_{j ∈ P} (R_j / p_j) / ∏_{j ∈ N} (1 - δ_j) ]^{1/(|P| - 1)}.   (4)

The next theorem summarizes our second algorithmic result on uniform transfer IDS games.
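A sketch of the Equation (4) computation used inside TestNash, following the indifference condition R_i / p_i = ∏_{j ≠ i} (1 - (1 - x_j) δ_j) derived from Equation (1) (the function and variable names here are ours):

```python
def partial_mixed_strategies(P, N, R, p, delta):
    """Candidate mixed strategies x_i for the partially investing set P (|P| >= 2),
    given the not-investing set N; investing players contribute factors of 1."""
    num = 1.0
    for j in P:
        num *= R[j] / p[j]
    den = 1.0
    for j in N:
        den *= 1 - delta[j]
    Q = (num / den) ** (1.0 / (len(P) - 1))
    # Indifference of i in P: R_i / p_i = Q / (1 - (1 - x_i) * delta_i),
    # hence x_i = 1 - (1 - Q * p_i / R_i) / delta_i.
    return {i: 1 - (1 - Q * p[i] / R[i]) / delta[i] for i in P}

# Symmetric sanity check: three partially investing players with R_i / p_i = 0.81.
R, p, delta = [0.729] * 3, [0.9] * 3, [0.5] * 3
x = partial_mixed_strategies([0, 1, 2], [], R, p, delta)  # each x_i is roughly 0.8
```

In the symmetric example, one can verify the indifference condition directly: with x_i ≈ 0.8, each player sees ∏_{j ≠ i} (1 - (1 - x_j) δ_j) = 0.9² = 0.81 = R_i / p_i. Values of x_i outside (0, 1) signal that no NE with this partition exists, which is exactly the rejection test in step 2(b) of TestNash.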
Theorem 5 Algorithm UniformTransferIDSNash computes all exact (connected sets of) NE for uniform transfer IDS games in time polynomial in the size of the model.

We note that it follows immediately from the description and correctness of the algorithm that any n-player uniform transfer IDS game has at most O(n²) connected sets of NE. In addition, each connected set of NE in a uniform transfer IDS game is either a singleton or a simple interval in which n - 1 of the players play pure strategies and the remaining player has a simple interval of probability values from which to choose his strategy. At most n of the connected sets of NE in a uniform transfer IDS game are simple intervals.
We now show that even a slight generalization of uniform transfer IDS games, in which we allow the q_ij to assume two fixed values instead of one, leads to the intractability of computing at least some of the NE. A graphical uniform transfer IDS game, so named because it can be viewed as a marriage between uniform transfer IDS games and the graphical games introduced in [3], is an IDS game with the restriction that for all players i ≠ j, q_ij ∈ {0, δ_j} for some value δ_j. Let N(i) = {j : q_ji > 0} be the set of players that can be directly affected by player i's behavior.

The pure Nash extension problem for an n-player game with binary actions takes as input a description of the game and a partial assignment a ∈ {0, 1, *}^n, and asks whether there exists a pure NE for the game that is consistent with the assigned values. Clearly the problem of computing all the NE is at least as difficult as the pure Nash extension problem.

Theorem 6 The pure Nash extension problem for graphical uniform transfer IDS games is NP-complete, even if q_ij ∈ {0, δ} for all i and j, where δ is some fixed value.
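For intuition, the pure Nash extension problem can always be decided by brute force over the unassigned positions, reusing the cost of Equation (1); the exponential blow-up in the number of '*' entries is what Theorem 6 says cannot be avoided in general (unless P = NP). All names below are ours:

```python
import math
from itertools import product

def cost(i, a, C, L, p, q):
    """Expected cost to player i under pure profile a, per Equation (1)."""
    n = len(a)
    safe = math.prod(1 - (1 - a[j]) * q[i][j] for j in range(n) if j != i)
    return a[i] * C[i] + (1 - a[i]) * p[i] * L[i] \
        + (1 - (1 - a[i]) * p[i]) * (1 - safe) * L[i]

def is_pure_ne(a, C, L, p, q):
    """No player can strictly lower his cost by flipping his action."""
    for i in range(len(a)):
        b = list(a)
        b[i] = 1 - b[i]
        if cost(i, b, C, L, p, q) < cost(i, a, C, L, p, q) - 1e-12:
            return False
    return True

def nash_extension_exists(partial, C, L, p, q):
    """partial is a list over {0, 1, '*'}; search all completions of the '*' entries."""
    stars = [i for i, v in enumerate(partial) if v == '*']
    for bits in product([0, 1], repeat=len(stars)):
        a = list(partial)
        for i, b in zip(stars, bits):
            a[i] = b
        if is_pure_ne(a, C, L, p, q):
            return True
    return False
```

For instance, in the symmetric two-player example used earlier, the fully unassigned instance ['*', '*'] is a yes-instance (all-invest is a NE), while clamping player 0 to not invest, [0, '*'], is a no-instance.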
4 Experimental Study: Airline Baggage Security

As an empirical demonstration of IDS games, we constructed and conducted experiments on an IDS game for airline security that is based on real industry data. We have access to a data set consisting of 35,362 records of actual civilian commercial flight reservations, both domestic and international, made on August 26, 2002. Since these records contain complete flight itineraries, they include passenger transfers between the 122 represented commercial air carriers. As described below, we used this data set to construct an IDS game in which the players are the 122 carriers, the "bad event" corresponds to a bomb exploding in a bag being transported in a carrier's airplane, and the transfer event is the physical transfer of a bag from one carrier to another.
For each carrier pair (i, j), the transfer parameter q_ij was set to be proportional to the count of transfers from carrier j to carrier i in the data set. We are thus using the rate of passenger transfers as a proxy for the rate of baggage transfers. The resulting parameters (details omitted) are, as expected, quite asymmetric, as there are highly structured patterns of transfers resulting from differing geographic coverages, alliances between carriers, etc. The model is thus far from being a uniform transfer IDS game, and thus algorithm UniformTransferIDSNash cannot be applied; we instead used a simple gradient learning approach. The cost parameters were set so that an explosion is roughly 110 times more costly to a carrier than full investment in security.
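The gradient procedure itself is not spelled out in this excerpt; a minimal stand-in that nudges each carrier's investment level x_i toward its current best response, and can optionally clamp subsidized carriers at full investment as in the tipping experiment described below, might look like the following (all names and the update rule are ours):

```python
import math
import random

def simulate(C, L, p, q, steps=500, lr=0.1, clamp=(), seed=0):
    """Illustrative gradient-style dynamics on investment levels x_i in [0, 1].

    Each step, every unclamped player moves x_i a fraction lr toward 1 if investing
    is currently the cheaper response (C_i <= p_i * L_i * safe_i), and toward 0
    otherwise. Players listed in `clamp` are held at x_i = 1 (a subsidy)."""
    rng = random.Random(seed)
    n = len(C)
    x = [rng.random() for _ in range(n)]  # random initialization in [0, 1]
    for i in clamp:
        x[i] = 1.0
    for _ in range(steps):
        new_x = x[:]
        for i in range(n):
            if i in clamp:
                continue
            safe = math.prod(1 - (1 - x[j]) * q[i][j] for j in range(n) if j != i)
            target = 1.0 if C[i] <= p[i] * L[i] * safe else 0.0
            new_x[i] += lr * (target - new_x[i])
        x = new_x
    return x
```

Under such dynamics, players whose best response is insensitive to the others converge quickly, while players whose incentives depend on the population's aggregate investment can drift before settling, qualitatively matching the behavior described below.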
Figure 2: Each plot shows the investment level x_i (initialized randomly in [0, 1]) for carrier i over 500 simulation steps; above each plot is an index indicating the rank of the carrier in terms of overall volume in the data set. (b) Simulation of the evolution of security investment strategies for the 49 busiest carriers, but with the three largest carriers (indices 1, 2 and 3) in the data set clamped (subsidized) at full investment.
Figure 2(a) shows the evolution, over 500 steps of simulation time, of the investment level x_i for the 49 busiest carriers (according to the total volume of flights per carrier in the data set). We have ordered the 49 plots with the least busy carrier (index 49) plotted in the upper left corner, and the busiest (index 1) in the lower right corner. (The cost ratio above is, hopefully, an unrealistically large value for the real world; however, it is the relationship between the parameters, and not their absolute magnitudes, that is important in the model.)
The most striking feature of the figure is the change in the evolution of the investment strategy as we move from less busy to busier carriers. Broadly speaking, there is a large population of lower-volume carriers (indices 49 down to 34) that quickly converge to full investment (x_i = 1) regardless of initial conditions. There is then a set of medium-volume carriers whose limiting strategy is approached more slowly, and may eventually converge to either full or no investment (roughly indices 33 down to 14). Finally, the largest carriers (indices 13 and lower) again converge quickly, but to no investment (x_i = 0), because they have a high probability of having bags transferred from other carriers (even if they protect themselves against dangerous bags being loaded directly onto their planes). The simulation eventually converges (within 2000 steps) to a (Nash) equilibrium in which some carriers are at full investment, and the rest at no investment.
This property is extremely robust across initial conditions and model parameters. The simulation model also enables one to examine how subsidizing several airlines to invest in security can encourage others to do the same. This type of "tipping" behavior [6] can be the basis for developing strategies for inducing the adoption of security measures short of formal regulations or requirements. Figure 2(b) shows the result of an identical simulation to the one discussed above, except that the three largest carriers (indices 1, 2 and 3) are now "clamped," or forced to be at full investment during the entire simulation. Independent of initial conditions, the remaining population now invariably converges to full investment. Thus the model suggests that these three carriers form one of perhaps many different tipping sets — carriers whose decision to invest (due to subsidization or other exogenous forces) creates the economic incentive for a large population of otherwise skeptical carriers to follow. The dynamics also reveal a cascading effect — for example, carrier 5 moves toward full investment (after having settled comfortably at no investment) only after a number of larger and smaller carriers have done so.
[4] M. Kearns and Y. Mansour. Efficient Nash computation in summarization games with bounded influence.
wordName wordTfidf (topN-words)
[('player', 0.45), ('ids', 0.428), ('players', 0.325), ('investment', 0.285), ('investing', 0.23), ('game', 0.221), ('carriers', 0.202), ('transfer', 0.191), ('security', 0.18), ('carrier', 0.153), ('risk', 0.148), ('bad', 0.103), ('games', 0.1), ('invest', 0.1), ('uniformtransferidsnash', 0.088), ('nash', 0.08), ('airline', 0.076), ('interdependent', 0.076), ('uniform', 0.075), ('pure', 0.074), ('summarization', 0.07), ('heal', 0.063), ('kunreuther', 0.063), ('event', 0.062), ('strategies', 0.057), ('baggage', 0.05), ('busiest', 0.05), ('condominium', 0.05), ('testnash', 0.05), ('transfers', 0.05), ('ne', 0.046), ('ordering', 0.045), ('transferred', 0.044), ('lemma', 0.038), ('indices', 0.038), ('airlines', 0.038), ('busy', 0.038), ('vaccination', 0.038), ('bags', 0.037), ('originating', 0.037), ('dominant', 0.037), ('mixed', 0.035), ('risks', 0.033), ('screening', 0.033), ('simulation', 0.031), ('strategy', 0.031), ('clamped', 0.03), ('population', 0.028), ('howard', 0.026), ('direct', 0.026), ('omitted', 0.025), ('experiencing', 0.025), ('indifferent', 0.025), ('owner', 0.025), ('passenger', 0.025), ('sprinkler', 0.025), ('sprinklers', 0.025), ('uence', 0.025), ('costs', 0.025), ('kearns', 0.024), ('geoffrey', 0.024), ('tipping', 0.023), ('connected', 0.023), ('chances', 0.022), ('explosion', 0.022), ('geographic', 0.022), ('ight', 0.022), ('incentive', 0.022), ('subroutine', 0.022), ('evolution', 0.022), ('outer', 0.021), ('full', 0.02), ('partially', 0.02), ('settings', 0.02), ('contamination', 0.02), ('equilibria', 0.02), ('lemmas', 0.02), ('participant', 0.02), ('safety', 0.02), ('business', 0.019), ('cost', 0.018), ('others', 0.018), ('return', 0.018), ('bag', 0.018), ('commercial', 0.018), ('differing', 0.018), ('eventually', 0.018), ('decisions', 0.018), ('axes', 0.017), ('behavior', 0.017), ('records', 0.017), ('encourage', 0.017), ('events', 0.016), ('management', 0.016), ('experiences', 0.016), ('checking', 0.016), ('establishes', 0.016), ('expresses', 
0.015), ('dynamics', 0.015), ('reduce', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 19 nips-2003-Algorithms for Interdependent Security Games
Author: Michael Kearns, Luis E. Ortiz
Abstract: unkown-abstract
2 0.22008073 84 nips-2003-How to Combine Expert (and Novice) Advice when Actions Impact the Environment?
Author: Daniela Pucci de Farias, Nimrod Megiddo
Abstract: The so-called “experts algorithms” constitute a methodology for choosing actions repeatedly, when the rewards depend both on the choice of action and on the unknown current state of the environment. An experts algorithm has access to a set of strategies (“experts”), each of which may recommend which action to choose. The algorithm learns how to combine the recommendations of individual experts so that, in the long run, for any fixed sequence of states of the environment, it does as well as the best expert would have done relative to the same sequence. This methodology may not be suitable for situations where the evolution of states of the environment depends on past chosen actions, as is usually the case, for example, in a repeated non-zero-sum game. A new experts algorithm is presented and analyzed in the context of repeated games. It is shown that asymptotically, under certain conditions, it performs as well as the best available expert. This algorithm is quite different from previously proposed experts algorithms. It represents a shift from the paradigms of regret minimization and myopic optimization to consideration of the long-term effect of a player’s actions on the opponent’s actions or the environment. The importance of this shift is demonstrated by the fact that this algorithm is capable of inducing cooperation in the repeated Prisoner’s Dilemma game, whereas previous experts algorithms converge to the suboptimal non-cooperative play. 1
3 0.20747018 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
Author: Gerald Tesauro
Abstract: Recent multi-agent extensions of Q-Learning require knowledge of other agents’ payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed “Hyper-Q” Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents’ strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented. 1
4 0.10165124 105 nips-2003-Learning Near-Pareto-Optimal Conventions in Polynomial Time
Author: Xiaofeng Wang, Tuomas Sandholm
Abstract: We study how to learn to play a Pareto-optimal strict Nash equilibrium when there exist multiple equilibria and agents may have different preferences among the equilibria. We focus on repeated coordination games of non-identical interest where agents do not know the game structure up front and receive noisy payoffs. We design efficient near-optimal algorithms for both the perfect monitoring and the imperfect monitoring setting(where the agents only observe their own payoffs and the joint actions). 1
5 0.062795781 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates
Author: Peter L. Bartlett, Michael I. Jordan, Jon D. Mcauliffe
Abstract: Many classification algorithms, including the support vector machine, boosting and logistic regression, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. We characterize the statistical consequences of using such a surrogate by providing a general quantitative relationship between the risk as assessed using the 0-1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial bounds under the weakest possible condition on the loss function—that it satisfy a pointwise form of Fisher consistency for classification. The relationship is based on a variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise. Finally, we present applications of our results to the estimation of convergence rates in the general setting of function classes that are scaled hulls of a finite-dimensional base class.
6 0.053778522 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices
7 0.046057045 44 nips-2003-Can We Learn to Beat the Best Stock
8 0.039794084 135 nips-2003-Necessary Intransitive Likelihood-Ratio Classifiers
9 0.035454411 23 nips-2003-An Infinity-sample Theory for Multi-category Large Margin Classification
10 0.030017152 170 nips-2003-Self-calibrating Probability Forecasting
11 0.026066292 51 nips-2003-Design of Experiments via Information Theory
12 0.024921766 24 nips-2003-An Iterative Improvement Procedure for Hierarchical Clustering
13 0.024424253 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions
14 0.024413165 134 nips-2003-Near-Minimax Optimal Classification with Dyadic Classification Trees
15 0.023179805 146 nips-2003-Online Learning of Non-stationary Sequences
16 0.020674499 55 nips-2003-Distributed Optimization in Adaptive Networks
17 0.020587334 18 nips-2003-A Summating, Exponentially-Decaying CMOS Synapse for Spiking Neural Systems
18 0.019785058 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks
19 0.018657673 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles
20 0.018319814 176 nips-2003-Sequential Bayesian Kernel Regression
topicId topicWeight
[(0, -0.087), (1, 0.091), (2, -0.046), (3, -0.036), (4, 0.037), (5, 0.032), (6, -0.044), (7, 0.03), (8, -0.243), (9, -0.046), (10, 0.038), (11, 0.131), (12, -0.127), (13, 0.043), (14, -0.032), (15, -0.162), (16, -0.008), (17, 0.101), (18, 0.181), (19, -0.044), (20, -0.016), (21, -0.178), (22, -0.088), (23, -0.131), (24, 0.009), (25, -0.127), (26, -0.118), (27, -0.09), (28, 0.046), (29, -0.087), (30, -0.137), (31, -0.124), (32, 0.022), (33, -0.131), (34, 0.044), (35, 0.034), (36, 0.061), (37, 0.039), (38, -0.045), (39, -0.021), (40, 0.022), (41, -0.038), (42, -0.029), (43, 0.051), (44, 0.055), (45, 0.051), (46, 0.067), (47, -0.057), (48, 0.079), (49, -0.168)]
simIndex simValue paperId paperTitle
same-paper 1 0.97486651 19 nips-2003-Algorithms for Interdependent Security Games
Author: Michael Kearns, Luis E. Ortiz
Abstract: unkown-abstract
2 0.78994024 84 nips-2003-How to Combine Expert (and Novice) Advice when Actions Impact the Environment?
Author: Daniela Pucci de Farias, Nimrod Megiddo
Abstract: The so-called “experts algorithms” constitute a methodology for choosing actions repeatedly, when the rewards depend both on the choice of action and on the unknown current state of the environment. An experts algorithm has access to a set of strategies (“experts”), each of which may recommend which action to choose. The algorithm learns how to combine the recommendations of individual experts so that, in the long run, for any fixed sequence of states of the environment, it does as well as the best expert would have done relative to the same sequence. This methodology may not be suitable for situations where the evolution of states of the environment depends on past chosen actions, as is usually the case, for example, in a repeated non-zero-sum game. A new experts algorithm is presented and analyzed in the context of repeated games. It is shown that asymptotically, under certain conditions, it performs as well as the best available expert. This algorithm is quite different from previously proposed experts algorithms. It represents a shift from the paradigms of regret minimization and myopic optimization to consideration of the long-term effect of a player’s actions on the opponent’s actions or the environment. The importance of this shift is demonstrated by the fact that this algorithm is capable of inducing cooperation in the repeated Prisoner’s Dilemma game, whereas previous experts algorithms converge to the suboptimal non-cooperative play. 1
3 0.65482432 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
Author: Gerald Tesauro
Abstract: Recent multi-agent extensions of Q-Learning require knowledge of other agents’ payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed “Hyper-Q” Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents’ strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented. 1
4 0.39774722 44 nips-2003-Can We Learn to Beat the Best Stock
Author: Allan Borodin, Ran El-Yaniv, Vincent Gogan
Abstract: A novel algorithm for actively trading stocks is presented. While traditional universal algorithms (and technical trading heuristics) attempt to predict winners or trends, our approach relies on predictable statistical relations between all pairs of stocks in the market. Our empirical results on historical markets provide strong evidence that this type of technical trading can “beat the market” and moreover, can beat the best stock in the market. In doing so we utilize a new idea for smoothing critical parameters in the context of expert learning. 1 Introduction: The Portfolio Selection Problem The portfolio selection (PS) problem is a challenging problem for machine learning, online algorithms and, of course, computational finance. As is well known (e.g. see Lugosi [1]) sequence prediction under the log loss measure can be viewed as a special case of portfolio selection, and perhaps more surprisingly, from a certain worst case minimax criterion, portfolio selection is not essentially any harder (than prediction) as shown in [2] (see also [1], Thm. 20 & 21). But there seems to be a qualitative difference between the practical utility of “universal” sequence prediction and universal portfolio selection. Simply stated, universal sequence prediction algorithms under various probabilistic and worst-case models work very well in practice whereas the known universal portfolio selection algorithms do not seem to provide any substantial benefit over a naive investment strategy (see Sec. 4). A major pragmatic question is whether or not a computer program can consistently outperform the market. A closer inspection of the interesting ideas developed in information theory and online learning suggests that a promising approach is to exploit the natural volatility in the market and in particular to benefit from simple and rather persistent statistical relations between stocks rather than to try to predict stock prices or “winners”. 
We present a non-universal portfolio selection algorithm¹, which does not try to predict winners. The motivation behind our algorithm is the rationale behind constant rebalancing algorithms and the worst-case study of universal trading introduced by Cover [3]. Not only does our proposed algorithm substantially "beat the market" on historical markets, it also beats the best stock. So why are we presenting this algorithm and not just simply making money? There are, of course, some caveats and obstacles to utilizing the algorithm. But for large investors the possibility of a goose laying silver (if not golden) eggs is not impossible.

(Footnote 1: Any PS algorithm can be modified to be universal by investing any fixed fraction of the initial wealth in a universal algorithm.)

Assume a market with m stocks. Let v_t = (v_t(1), ..., v_t(m)) be the closing prices of the m stocks for the t-th day, where v_t(j) is the price of the j-th stock. It is convenient to work with relative prices x_t(j) = v_t(j)/v_{t-1}(j), so that an investment of $d in the j-th stock just before the t-th period yields d·x_t(j) dollars. We let x_t = (x_t(1), ..., x_t(m)) denote the market vector of relative prices corresponding to the t-th day. A portfolio b is an allocation of wealth in the stocks, specified by the proportions b = (b(1), ..., b(m)) of current dollar wealth invested in each of the stocks, where b(j) ≥ 0 and Σ_j b(j) = 1. The daily return of a portfolio b w.r.t. a market vector x is b · x = Σ_j b(j)x(j), and the (compound) total return ret_X(b_1, ..., b_n) of a sequence of portfolios b_1, ..., b_n w.r.t. a market sequence X = x_1, ..., x_n is ∏_{t=1}^n b_t · x_t. A portfolio selection algorithm is any deterministic or randomized rule for specifying a sequence of portfolios. The simplest strategy is to "buy-and-hold" stocks using some portfolio b. We denote this strategy by BAH_b and let U-BAH denote the uniform buy-and-hold when b = (1/m, ..., 1/m).
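These definitions translate directly into code. A minimal sketch in Python with NumPy (the function names are ours, not the paper's):

```python
import numpy as np

def relative_prices(prices):
    """x_t(j) = v_t(j) / v_{t-1}(j) for a (T, m) matrix of closing prices."""
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1]

def total_return(portfolios, X):
    """Compound return prod_t (b_t . x_t) for daily portfolios and market vectors."""
    daily = np.einsum('tj,tj->t', np.asarray(portfolios, float), np.asarray(X, float))
    return float(np.prod(daily))

def u_bah_return(X):
    """Uniform buy-and-hold: put 1/m of the initial wealth in each stock once;
    each holding then compounds on its own, so final wealth is the average
    of the per-stock compound returns."""
    X = np.asarray(X, float)
    return float(np.mean(np.prod(X, axis=0)))
```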
We say that a portfolio selection algorithm "beats the market" when it outperforms U-BAH on a given market sequence, although in practice "the market" can be represented by some non-uniform BAH (e.g. DJIA). Buy-and-hold strategies rely on the tendency of successful markets to grow. Much of modern portfolio theory focuses on how to choose a good b for the buy-and-hold strategy. The seminal ideas of Markowitz in [4] yield an algorithmic procedure for choosing the weights of the portfolio b so as to minimize the variance for any feasible expected return. This variance minimization is possible by placing appropriately larger weights on subsets of anti-correlated stocks, an idea which we shall also utilize. We denote the optimal-in-hindsight buy-and-hold strategy (i.e. invest only in the best stock) by BAH*. An alternative approach to the static buy-and-hold is to dynamically change the portfolio during the trading period. This approach is often called "active trading". One example of active trading is constant rebalancing; namely, fix a portfolio b and (re)invest your dollars each day according to b. We denote this constant rebalancing strategy by CBAL_b and let CBAL* denote the optimal (in hindsight) CBAL. A constant rebalancing strategy can often take advantage of market fluctuations to achieve a return significantly greater than that of BAH*. CBAL* is always at least as good as the best stock BAH*, and in some real market sequences a constant rebalancing strategy will take advantage of market fluctuations and significantly outperform the best stock (see Table 1). For now, consider Cover and Gluss' [5] classic (but contrived) example of a market consisting of cash and one stock, with the market sequence of price relatives (1, 1/2), (1, 2), (1, 1/2), (1, 2), .... Now consider CBAL_b with b = (1/2, 1/2). On each odd day the daily return of CBAL_b is (1/2)·1 + (1/2)·(1/2) = 3/4, and on each even day it is (1/2)·1 + (1/2)·2 = 3/2.
The total return over n days is therefore (9/8)^{n/2}, illustrating how a constant rebalancing strategy can yield exponential returns in a "no-growth market". Under the assumption that the daily market vectors are observations of identically and independently distributed (i.i.d.) random variables, it is shown in [6] that CBAL* performs at least as well (in the sense of expected total return) as the best online portfolio selection algorithm. However, many studies (see e.g. [7]) argue that stock price sequences do have long-term memory and are not i.i.d. A non-traditional objective (in computational finance) is to develop online trading strategies that are in some sense always guaranteed to perform well. Within a line of research pioneered by Cover [5, 3, 2], one attempts to design portfolio selection algorithms that can provably do well (in terms of their total return) with respect to some online or offline benchmark algorithms. Two natural online benchmark algorithms are the uniform buy-and-hold U-BAH and the uniform constant rebalancing strategy U-CBAL, which is CBAL_b with b = (1/m, ..., 1/m). A natural offline benchmark is BAH*, and a more challenging offline benchmark is CBAL*. Cover and Ordentlich's Universal Portfolios algorithm [3, 2], denoted here by UNIVERSAL, was proven to be universal against CBAL*, in the sense that for every market sequence X of m stocks over n days, it guarantees a sub-exponential (indeed polynomial) ratio in n:

ret_X(CBAL*) / ret_X(UNIVERSAL) ≤ O(n^{(m-1)/2})    (1)

From a theoretical perspective this is surprising, as the ratio is polynomial in n (for fixed m) whereas CBAL* is capable of exponential returns. From a practical perspective, while the ratio n^{(m-1)/2} is not very useful, the motivation that underlies the potential of CBAL algorithms is! We follow this motivation and develop a new algorithm which we call ANTICOR.
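The Cover-Gluss cash-and-stock example is easy to verify numerically. A small sketch (our own code, not the paper's), with market vectors alternating between (1, 1/2) and (1, 2):

```python
import numpy as np

def cbal_return(b, X):
    """Constant rebalancing: reinvest all wealth according to b before every day."""
    return float(np.prod(np.asarray(X, float) @ np.asarray(b, float)))

# Cash (relative price always 1) and a stock that alternately halves and doubles.
n = 10  # an even number of days
X = np.array([[1.0, 0.5] if t % 2 == 0 else [1.0, 2.0] for t in range(n)])

r = cbal_return([0.5, 0.5], X)
assert abs(r - (9 / 8) ** (n / 2)) < 1e-12  # exponential growth in a no-growth market
assert r > np.prod(X, axis=0).max()         # beats both assets (each returns exactly 1)
```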
By attempting to systematically follow the constant rebalancing philosophy, ANTICOR is capable of some extraordinary performance in the absence of transaction costs, or even with very small transaction costs.

2 Trying to Learn the Winners

The most direct approach to expert learning and portfolio selection is a "(reward-based) weighted average prediction" algorithm, which adaptively computes a weighted average of experts by gradually increasing (by some multiplicative or additive update rule) the relative weights of the more successful experts. For example, in the context of the PS problem consider the "exponentiated gradient" EG(η) algorithm proposed by Helmbold et al. [8]. The EG(η) algorithm computes the next portfolio to be

b_{t+1}(j) = b_t(j) exp{η x_t(j)/(b_t · x_t)} / Σ_{j'=1}^m b_t(j') exp{η x_t(j')/(b_t · x_t)},

where η is a "learning rate" parameter. EG was designed to greedily choose the best portfolio for yesterday's market x_t while at the same time paying a penalty for moving far from yesterday's portfolio. For a universal bound on EG, Helmbold et al. set η = 2x_min √(2(log m)/n), where x_min is a lower bound on any price relative.² It is easy to see that as n increases, η decreases to 0, so we can think of η as being very small in order to achieve universality. When η = 0, the algorithm EG(η) degenerates to the uniform CBAL, which is not a universal algorithm. It is also the case that if each day the price relatives for all stocks were identical, then EG (as well as other PS algorithms) would converge to the uniform CBAL. Combining a small learning rate with a "reasonably balanced" market, we expect the performance of EG to be similar to that of the uniform CBAL, and this is confirmed by our experiments (see Table 1).³ Cover's universal algorithms adaptively learn each day's portfolio by increasing the weights of successful CBALs.
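The EG(η) update above can be sketched in a few lines (our own code; the default η = 0.01 follows the experimental setting quoted from Helmbold et al.):

```python
import numpy as np

def eg_update(b, x, eta=0.01):
    """One EG(eta) step: multiplicatively reward stocks that did well on the
    latest market vector x, then renormalize to stay on the simplex."""
    b, x = np.asarray(b, float), np.asarray(x, float)
    w = b * np.exp(eta * x / (b @ x))
    return w / w.sum()
```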
The update rule for these universal algorithms is

b_{t+1} = ∫ b · ret_t(CBAL_b) dμ(b) / ∫ ret_t(CBAL_b) dμ(b),

where μ(·) is some prior distribution over portfolios. Thus, the weight of a possible portfolio is proportional to its total return ret_t(b) thus far times its prior. The particular universal algorithm we consider in our experiments uses the Dirichlet prior (with parameters (1/2, ..., 1/2)) [2]. Within a constant factor, this algorithm attains the optimal ratio (1) with respect to CBAL*.⁴ The algorithm is equivalent to a particular static distribution over the class of all CBALs. This equivalence helps to demystify the universality result and also shows that the algorithm can never outperform CBAL*. A different type of "winner learning" algorithm can be obtained from any sequence prediction strategy. For each stock, a (soft) sequence prediction algorithm provides a probability p(j) that the next symbol will be j ∈ {1, ..., m}. We view this as a prediction that stock j will have the best price relative for the next day and set b_{t+1}(j) = p(j). We consider predictions made using the prediction component of the well-known Lempel-Ziv (LZ) lossless compression algorithm [9]. This prediction component is nicely described in Langdon [10] and in Feder [11]. As a prediction algorithm, LZ is provably powerful in various senses. First, it can be shown that it is asymptotically optimal with respect to any stationary and ergodic finite-order Markov source (Rissanen [12]).

(Footnote 2: Helmbold et al. show how to eliminate the need to know x_min and n. While EG can be made universal, its performance ratio is only sub-exponential (and not polynomial) in n. Footnote 3: Following Helmbold et al. we fix η = 0.01 in our experiments. Footnote 4: Experimentally (on our datasets) there is a negligible difference between the uniform universal algorithm in [3] and the above Dirichlet universal algorithm.)
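The integral update can be approximated by Monte Carlo sampling from the Dirichlet(1/2, ..., 1/2) prior. This is our own sketch of the idea, not the implementation used in the paper's experiments:

```python
import numpy as np

def universal_portfolio_mc(X_history, n_samples=10_000, seed=0):
    """Approximate b_{t+1} = (int b * ret_t(CBAL_b) dmu(b)) / (int ret_t(CBAL_b) dmu(b))
    by sampling candidate CBAL portfolios from the Dirichlet(1/2,...,1/2) prior mu."""
    X_history = np.asarray(X_history, float)  # (t, m) market vectors so far
    m = X_history.shape[1]
    rng = np.random.default_rng(seed)
    B = rng.dirichlet(0.5 * np.ones(m), size=n_samples)  # candidate CBALs
    rets = np.prod(X_history @ B.T, axis=0)              # ret_t(CBAL_b) per sample
    return (B * rets[:, None]).sum(axis=0) / rets.sum()  # return-weighted average
```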
Moreover, Feder shows that LZ is also universal in a worst-case sense with respect to the (offline) benchmark class of all finite-state prediction machines. To summarize, the common approach to devising PS algorithms has been to attempt to learn winners using winner-learning schemes.

3 The Anticor Algorithm

We propose a different approach, motivated by the CBAL "philosophy". How can we interpret the success of the uniform CBAL on the Cover and Gluss example of Sec. 1? Clearly, the uniform CBAL here is taking advantage of price fluctuation by constantly transferring wealth from the high-performing stock to the anti-correlated low-performing stock. Even in a less contrived market, we should be able to take advantage when a stock is currently outperforming other stocks, especially if this strong performance is anti-correlated with the performance of these other stocks. Our ANTICOR_w algorithm considers a short market history (consisting of two consecutive "windows", each of w trading days) so as to model statistical relations between each pair of stocks. Let LX1 = (log(x_{t-2w+1}), ..., log(x_{t-w}))^T and LX2 = (log(x_{t-w+1}), ..., log(x_t))^T, where log(x_k) denotes (log(x_k(1)), ..., log(x_k(m))). Thus, LX1 and LX2 are the two vector sequences (equivalently, two w × m matrices) constructed by taking the logarithm over the market subsequences corresponding to the time windows [t - 2w + 1, t - w] and [t - w + 1, t], respectively. We denote the j-th column of LX_k by LX_k(j). Let μ_k = (μ_k(1), ..., μ_k(m)) be the vector of averages of the columns of LX_k (that is, μ_k(j) = E{LX_k(j)}). Similarly, let σ_k be the vector of standard deviations of the columns of LX_k. The cross-correlation matrix (and its normalization) between column vectors in LX1 and LX2 are defined as:

M_cov(i, j) = (1/(w-1)) (LX1(i) - μ1(i))^T (LX2(j) - μ2(j));

M_cor(i, j) = M_cov(i, j) / (σ1(i)σ2(j)) if σ1(i), σ2(j) ≠ 0; 0 otherwise.
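The window statistics can be sketched in code (our own; we use the sample normalization 1/(w-1) for the covariance and matching sample standard deviations so that M_cor lands in [-1, 1] — the exact normalization constants are our assumption):

```python
import numpy as np

def anticor_stats(X, t, w):
    """Cross-window statistics for ANTICOR_w over the market matrix X (days x stocks).
    The windows are [t-2w+1, t-w] and [t-w+1, t] in the text's 1-based indexing;
    the slices below are the 0-based equivalents."""
    LX1 = np.log(X[t - 2 * w : t - w])  # first window, (w, m)
    LX2 = np.log(X[t - w : t])          # second window, (w, m)
    mu1, mu2 = LX1.mean(axis=0), LX2.mean(axis=0)
    s1 = LX1.std(axis=0, ddof=1)
    s2 = LX2.std(axis=0, ddof=1)
    Mcov = (LX1 - mu1).T @ (LX2 - mu2) / (w - 1)
    denom = np.outer(s1, s2)
    # Normalize where both deviations are nonzero; 0 otherwise, as in the text.
    Mcor = np.divide(Mcov, denom, out=np.zeros_like(Mcov), where=denom != 0)
    return Mcov, Mcor, mu2
```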
M_cor(i, j) ∈ [-1, 1] measures the correlation between log-relative prices of stock i over the first window and stock j over the second window. For each pair of stocks i and j we compute claim_{i→j}, the extent to which we want to shift our investment from stock i to stock j. Namely, there is such a claim iff μ2(i) > μ2(j) and M_cor(i, j) > 0, in which case claim_{i→j} = M_cor(i, j) + A(i) + A(j), where A(h) = |M_cor(h, h)| if M_cor(h, h) < 0, else 0. Following our interpretation for the success of a CBAL, M_cor(i, j) > 0 is used to predict that stocks i and j will be correlated in consecutive windows (i.e. the current window and the next window, based on the evidence for the last two windows), and M_cor(h, h) < 0 predicts that stock h will be anti-correlated with itself over consecutive windows. Finally, b_{t+1}(i) = b̃_t(i) + Σ_{j≠i} [transfer_{j→i} - transfer_{i→j}], where transfer_{i→j} = b̃_t(i) · claim_{i→j} / Σ_j claim_{i→j}, and b̃_t is the resulting portfolio just after market closing (on day t).

Figure 1: ANTICOR_w's total return (per $1 investment) vs. window size 2 ≤ w ≤ 30 for NYSE (left) and SP500 (right), compared with BAH(ANTICOR_w), the best stock, and the market.

Our ANTICOR_w algorithm has one critical parameter, the window size w. In Figure 1 we depict the total return of ANTICOR_w on two historical datasets as a function of the window size w = 2, ..., 30. As we might expect, the performance of ANTICOR_w depends significantly on the window size. However, for all w, ANTICOR_w beats the uniform market and, moreover, it beats the best stock using most window sizes. Of course, in online trading we cannot choose w in hindsight.
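The claim/transfer step can then be sketched as follows (our own code; it assumes the statistics μ2 and M_cor computed as above, and a portfolio b̃_t already adjusted for the day's price moves):

```python
import numpy as np

def anticor_transfer(b, Mcor, mu2):
    """One ANTICOR rebalancing step from the claims claim_{i->j} defined above."""
    m = len(b)
    diag = np.diag(Mcor)
    A = np.where(diag < 0, -diag, 0.0)  # A(h) = |Mcor(h,h)| when self-anti-correlated
    claim = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if i != j and mu2[i] > mu2[j] and Mcor[i, j] > 0:
                claim[i, j] = Mcor[i, j] + A[i] + A[j]
    b_new = np.array(b, dtype=float)
    for i in range(m):
        total = claim[i].sum()
        if total > 0:
            transfer = b[i] * claim[i] / total  # transfer_{i->j}
            b_new[i] -= transfer.sum()          # wealth leaving stock i
            b_new += transfer                   # wealth arriving at each j
    return b_new
```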
Viewing the ANTICOR_w algorithms as experts, we can try to learn the best expert. But the windows, like individual stocks, induce a rather volatile set of experts, and standard expert-combination algorithms [13] tend to fail. Alternatively, we can adaptively learn and invest in some weighted average of all ANTICOR_w algorithms with w less than some maximum W. The simplest case is a uniform investment on all the windows; that is, a uniform buy-and-hold investment on the algorithms ANTICOR_w, w ∈ [2, W], denoted by BAH_W(ANTICOR). Figure 2 (left) graphs the total return of BAH_W(ANTICOR) as a function of W for all values of 2 ≤ W ≤ 50 with respect to the NYSE dataset (see details below). Similar graphs for the other datasets we consider appear qualitatively the same, and the choice W = 30 is clearly not optimal. However, for all W ≥ 3, BAH_W(ANTICOR) beats the best stock in all our experiments.

Figure 2: Left: BAH_W(ANTICOR)'s total return (per $1 investment) as a function of the maximal window W. Right: Cumulative returns for the last month of the DJIA dataset: stocks (left panel); ANTICOR_w algorithms trading the stocks (denoted ANTICOR1, middle panel); ANTICOR_w algorithms trading the ANTICOR algorithms (right panel).

Since we now consider the various algorithms as stocks (whose prices are determined by the cumulative returns of the algorithms), we are back to our original portfolio selection problem, and if the ANTICOR algorithm performs well on stocks it may also perform well on algorithms.
We thus consider active investment in the various ANTICOR_w algorithms using ANTICOR. We again consider all windows w ≤ W. Of course, we can continue to compound the algorithm any number of times. Here we compound twice and then use a buy-and-hold investment. The resulting algorithm is denoted BAH_W(ANTICOR(ANTICOR)). One impact of this compounding, depicted in Figure 2 (right), is to smooth out the anti-correlations exhibited in the stocks. It is evident that after compounding twice the returns become almost completely correlated, thus diminishing the possibility that additional compounding will substantially help.⁵ This idea for eliminating critical parameters may be applicable in other learning applications. The challenge is to understand the conditions and applications in which the process of compounding algorithms will have this smoothing effect!

4 Experimental Results

We present an experimental study of the ANTICOR algorithm and the three online learning algorithms described in Sec. 2. We focus on BAH_30(ANTICOR), abbreviated ANTI1, and BAH_30(ANTICOR(ANTICOR)), abbreviated ANTI2. Four historical datasets are used. The first, the NYSE dataset, is the one used in [3, 2, 8, 14]. This dataset contains 5651 daily prices for 36 stocks on the New York Stock Exchange (NYSE) for the twenty-two-year period July 3rd, 1962 to Dec 31st, 1984. The second, the TSE dataset, consists of 88 stocks from the Toronto Stock Exchange (TSE), for the five-year period Jan 4th, 1994 to Dec 31st, 1998. The third dataset consists of the 25 stocks from the SP500 which (as of Apr. 2003) had the largest market capitalization. This set spans 1276 trading days for the period Jan 2nd, 1998 to Jan 31st, 2003.
The fourth dataset consists of the thirty stocks composing the Dow Jones Industrial Average (DJIA) for the two-year period (507 days) from Jan 14th, 2001 to Jan 14th, 2003.⁶ These four datasets are quite different in nature (the market returns for these datasets appear in the first row of Table 1). While every stock in the NYSE increased in value, 32 of the 88 stocks in the TSE lost money, 7 of the 25 stocks in the SP500 lost money, and 25 of the 30 stocks in the "negative market" DJIA lost money. All these sets include only highly liquid stocks with huge market capitalizations. In order to maximize the utility of these datasets and yet present rather different markets, we also ran each market in reverse. This is simply done by reversing the order and inverting the relative prices. The reverse datasets are denoted by a '-1' superscript. Some of the reverse markets are particularly challenging. For example, all of the NYSE^-1 stocks are going down. Note that the forward and reverse markets (i.e. U-BAH) for the TSE are both increasing, but the TSE^-1 is also a challenging market since so many stocks (56 of 88) are declining. Table 1 reports the total returns of the various algorithms for all eight datasets. We see that prediction algorithms such as LZ can do quite well, but the more aggressive ANTI1 and ANTI2 have excellent and sometimes fantastic returns. Note that these active strategies beat the best stock and even CBAL* in all markets with the exception of the TSE^-1, in which they still significantly outperform the market. The reader may well be distrustful of what appears to be such unbelievable returns for ANTI1 and ANTI2, especially when applied to the NYSE dataset.
However, recall that the NYSE dataset consists of n = 5651 trading days, and the y such that y^n equals the total NYSE return is approximately 1.0029511 for ANTI1 (respectively, 1.0074539 for ANTI2); that is, the average daily increase is less than 0.3% (respectively, 0.75%). Thus a transaction cost of 1% can present a significant challenge to such active trading strategies (see also Sec. 5). We observe that UNIVERSAL and EG have no substantial advantage over U-CBAL. Some previous expositions of these algorithms highlighted particular combinations of stocks where the returns significantly outperformed UNIVERSAL and the best stock. But the same can be said for U-CBAL.

(Footnote 5: This smoothing effect also allows for the use of simple prediction algorithms such as "expert advice" algorithms [13], which can now better predict a good window size. We have not explored this direction. Footnote 6: The four datasets, including their sources and individual stock compositions, can be downloaded from http://www.cs.technion.ac.il/~rani/portfolios.)

Table 1: Monetary returns in dollars (per $1 investment) of various algorithms for four different datasets and their reversed versions. The winner and runner-up for each market appear in boldface. All figures are truncated to two decimals.

Algorithm      | NYSE           | TSE   | SP500 | DJIA | NYSE^-1 | TSE^-1 | SP500^-1 | DJIA^-1
Market (U-BAH) | 14.49          | 1.61  | 1.34  | 0.76 | 0.11    | 1.67   | 0.87     | 1.43
Best Stock     | 54.14          | 6.27  | 3.77  | 1.18 | 0.32    | 37.64  | 1.65     | 2.77
CBAL*          | 250.59         | 6.77  | 4.06  | 1.23 | 2.86    | 58.61  | 1.91     | 2.97
U-CBAL         | 27.07          | 1.59  | 1.64  | 0.81 | 0.22    | 1.18   | 1.09     | 1.53
ANTI1          | 17,059,811.56  | 26.77 | 5.56  | 1.59 | 246.22  | 7.12   | 6.61     | 3.67
ANTI2          | 238,820,058.10 | 39.07 | 5.88  | 2.28 | 1383.78 | 7.27   | 9.69     | 4.60
LZ             | 79.78          | 1.32  | 1.67  | 0.89 | 5.41    | 4.80   | 1.20     | 1.83
EG             | 27.08          | 1.59  | 1.64  | 0.81 | 0.22    | 1.19   | 1.09     | 1.53
UNIVERSAL      | 26.99          | 1.59  | 1.62  | 0.80 | 0.22    | 1.19   | 1.07     | 1.53

5 Concluding Remarks

When handling a portfolio of m stocks our algorithm may perform up to m transactions per day. A major concern is therefore the commissions it will incur.
Within the proportional commission model (see e.g. [14] and [15], Sec. 14.5.4), there exists a fraction γ ∈ (0, 1) such that an investor pays at a rate of γ/2 for each buy and for each sell. Therefore, the return of a sequence b_1, ..., b_n of portfolios with respect to a market sequence x_1, ..., x_n is

∏_t [ b_t · x_t ( 1 - (γ/2) Σ_j |b_t(j) - b̃_{t-1}(j)| ) ],

where b̃_t = (1/(b_t · x_t)) (b_t(1)x_t(1), ..., b_t(m)x_t(m)). Our investment algorithm in its simplest form can tolerate very small proportional commission rates and still beat the best stock.⁷ We note that Blum and Kalai [14] showed that the performance guarantee of UNIVERSAL still holds (and gracefully degrades) in the case of proportional commissions. Many current online brokers only charge a small per-share commission rate. A related problem that one must face when actually trading is the difference between bid and ask prices. These bid-ask spreads (and the availability of stocks for both buying and selling) are typically functions of stock liquidity and are typically smaller for stocks with large market capitalization. We consider here only very large market-cap stocks. As a final caveat, we note that we assume that any one portfolio selection algorithm has no impact on the market! But just like any goose laying golden eggs, widespread use will soon lead to the end of the goose; that is, the market will quickly react. Any report of abnormal returns using historical markets should be suspected of "data snooping". In particular, when a dataset is excessively mined by testing many strategies, there is a substantial chance that one of the strategies will be successful by simple overfitting. Another data-snooping hazard is stock selection. For example, the 36 stocks selected for the NYSE dataset were all known to have survived for 22 years. Our ANTICOR algorithms were fully developed using only the NYSE and TSE datasets. The DJIA and SP500 sets were obtained (from public-domain sources) after the algorithms were fixed.
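The commission-adjusted return above can be computed as follows (our own sketch; as a convention we charge nothing on the initial purchase, which the text does not specify):

```python
import numpy as np

def return_with_commissions(portfolios, X, gamma):
    """Compound return when each buy and each sell costs gamma/2 of the traded amount.
    The rebalancing cost on day t compares the new target b_t with the drifted
    weights b~_{t-1} left over after the previous day's price moves."""
    wealth, b_prev_tilde = 1.0, None
    for b, x in zip(np.asarray(portfolios, float), np.asarray(X, float)):
        if b_prev_tilde is not None:
            wealth *= 1.0 - (gamma / 2.0) * np.abs(b - b_prev_tilde).sum()
        gross = float(b @ x)
        wealth *= gross
        b_prev_tilde = b * x / gross  # weights after the day's price moves
    return wealth
```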
Finally, our algorithm has one parameter (the maximal window size W). Our experiments indicate that the algorithm's performance is robust with respect to W (see Figure 2).

(Footnote 7: For example, with γ = 0.1% we can typically beat the best stock. These results will be presented in the full paper.)

A number of well-respected works report on statistically robust "abnormal" returns for simple "technical analysis" heuristics, which slightly beat the market. For example, the landmark study of Brock et al. [16] applies 26 simple trading heuristics to the DJIA index from 1897 to 1986 and provides strong support for technical analysis heuristics. While consistently beating the market is considered a great (if not impossible) challenge, our approach to portfolio selection indicates that beating the best stock is an achievable goal. What is missing at this point in time is an analytical model which better explains why our active trading strategies are so successful. In this regard, we are investigating various "statistical adversary" models along the lines suggested by [17, 18]. Namely, we would like to show that an algorithm performs well (relative to some benchmark) for any market sequence that satisfies certain constraints on its empirical statistics.

References

[1] G. Lugosi. Lectures on prediction of individual sequences. URL: http://www.econ.upf.es/~lugosi/ihp.ps, 2001.
[2] T.M. Cover and E. Ordentlich. Universal portfolios with side information. IEEE Transactions on Information Theory, 42(2):348-363, 1996.
[3] T.M. Cover. Universal portfolios. Mathematical Finance, 1:1-29, 1991.
[4] H. Markowitz. Portfolio Selection: Efficient Diversification of Investments. John Wiley and Sons, 1959.
[5] T.M. Cover and D.H. Gluss. Empirical Bayes stock market portfolios. Advances in Applied Mathematics, 7:170-181, 1986.
[6] T.M. Cover and J.A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc., 1991.
[7] A. Lo and C. MacKinlay. A Non-Random Walk Down Wall Street. Princeton University Press, 1999.
[8] D.P. Helmbold, R.E. Schapire, Y. Singer, and M.K. Warmuth. Portfolio selection using multiplicative updates. Mathematical Finance, 8(4):325-347, 1998.
[9] J. Ziv and A. Lempel. Compression of individual sequences via variable rate coding. IEEE Transactions on Information Theory, 24:530-536, 1978.
[10] G.G. Langdon. A note on the Lempel-Ziv model for compressing individual sequences. IEEE Transactions on Information Theory, 29:284-287, 1983.
[11] M. Feder. Gambling using a finite state machine. IEEE Transactions on Information Theory, 37:1459-1465, 1991.
[12] J. Rissanen. A universal data compression system. IEEE Transactions on Information Theory, 29:656-664, 1983.
[13] N. Cesa-Bianchi, Y. Freund, D. Haussler, D.P. Helmbold, R.E. Schapire, and M.K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427-485, May 1997.
[14] A. Blum and A. Kalai. Universal portfolios with and without transaction costs. Machine Learning, 30(1):23-30, 1998.
[15] A. Borodin and R. El-Yaniv. Online Computation and Competitive Analysis. Cambridge University Press, 1998.
[16] L. Brock, J. Lakonishok, and B. LeBaron. Simple technical trading rules and the stochastic properties of stock returns. Journal of Finance, 47:1731-1764, 1992.
[17] P. Raghavan. A statistical adversary for on-line algorithms. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 7:79-83, 1992.
[18] A. Chou, J.R. Cooperstock, R. El-Yaniv, M. Klugerman, and T. Leighton. The statistical adversary allows optimal money-making trading strategies. In Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, 1995.
same-paper 1 0.82897925 19 nips-2003-Algorithms for Interdependent Security Games
Author: Michael Kearns, Luis E. Ortiz
Abstract: unkown-abstract