nips nips2012 nips2012-45 knowledge-graph by maker-knowledge-mining

45 nips-2012-Approximating Equilibria in Sequential Auctions with Incomplete Information and Multi-Unit Demand


Source: pdf

Author: Amy Greenwald, Jiacui Li, Eric Sodomka

Abstract: In many large economic markets, goods are sold through sequential auctions. Examples include eBay, online ad auctions, wireless spectrum auctions, and the Dutch flower auctions. In this paper, we combine methods from game theory and decision theory to search for approximate equilibria in sequential auction domains, in which bidders do not know their opponents' values for goods, bidders only partially observe the actions of their opponents, and bidders demand multiple goods. We restrict attention to two-phased strategies: first predict (i.e., learn); second, optimize. We use best-reply dynamics [4] for prediction (i.e., to predict other bidders' strategies), and then, assuming fixed other-bidder strategies, we estimate and solve the ensuing Markov decision processes (MDPs) [18] for optimization. We exploit auction properties to represent the MDP in a more compact state space, and we use Monte Carlo simulation to make estimating the MDP tractable. We show how equilibria found using our search procedure compare to known equilibria for simpler auction domains, and we approximate an equilibrium for a more complex auction domain where analytical solutions are unknown.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In many large economic markets, goods are sold through sequential auctions. [sent-5, score-0.282]

2 We exploit auction properties to represent the MDP in a more compact state space, and we use Monte Carlo simulation to make estimating the MDP tractable. [sent-14, score-0.414]

3 We show how equilibria found using our search procedure compare to known equilibria for simpler auction domains, and we approximate an equilibrium for a more complex auction domain where analytical solutions are unknown. [sent-15, score-1.403]

4 The FCC has conducted auctions for wireless spectrum since 1994, reaching sales of over $60 billion. [sent-19, score-0.311]

5 Perishable commodities such as flowers are often sold via auction; the Dutch flower auctions had about $5. [sent-20, score-0.338]

6 A game-theoretic equilibrium, in which each bidder best responds to the strategies of its opponents, can be used as a means of prescribing and predicting auction outcomes. [sent-22, score-0.988]

7 Finding equilibria in auctions is potentially valuable to bidders, as they can use the resulting strategies as prescriptions that guide their decisions, and to auction designers, as they can use the resulting strategies as predictions for bidder behavior. [sent-23, score-1.565]

8 While a rich literature exists on computing equilibria for relatively simple auction games [11], auction theory offers few analytical solutions for real-world auctions. [sent-24, score-1.112]

9 Even existing computational methods for approximating equilibria quickly become intractable as the number of bidders and goods, and the complexity of preferences and decisions, increase. [sent-25, score-0.529]

10 In this paper, we combine methods from game theory and decision theory to approximate equilibria in sequential auction domains, in which bidders do not know their opponents' values for goods, bidders partially observe the actions of their opponents, and bidders demand multiple goods. [sent-34, score-1.759]

11 Our method of searching for equilibria is motivated by the desire to reach strategies that real-world bidders might actually use. [sent-35, score-0.587]

12 We exploit auction properties to represent the MDPs in a more compact state space, and we use Monte Carlo simulation to make estimating the MDPs tractable. [sent-42, score-0.414]

13 The number of bidders n and the order in which goods are sold are assumed to be common knowledge. [sent-44, score-0.444]

14 During auction round $k$, each bidder $i$ submits a private bid $b_i^k \in B_i$ to the auctioneer. [sent-45, score-1.355]

15 Let $b^k = b_1^k, \ldots, b_n^k$ denote the vector of bids submitted by all bidders at round $k$. [sent-49, score-0.609]

16 The bidder who submits the highest bid wins and is assigned a cost based on a commonly known payment rule. [sent-50, score-0.777]

17 Bidders only observe opponents’ bids if those bids are announced by the auctioneer. [sent-52, score-0.314]

18 Regardless, we assume that bidder $i$ is told at least which set of goods she won in the $k$th round, $w_i^k \in \{\emptyset, \{k\}\}$, and how much she paid, $c_i^k \in \mathbb{R}$. [sent-53, score-0.667]

19 We let $\psi(o^k \mid b^k) \in [0, 1]$ denote the probability that the auctioneer sends the bidders signals $o^k = o_1^k, \ldots, o_n^k$ given $b^k$. [sent-54, score-0.525]

20 We let $\psi_i(o_i^k \mid b^k)$ express the probability that player $i$ receives signal $o_i^k$, given $b^k$.

21 i i An auction history at round k consists of past bids plus all information communicated by the auctioneer though round k − 1. [sent-58, score-0.836]

22 Let $h_i^k = (b_i^1, o_i^1), \ldots, (b_i^{k-1}, o_i^{k-1})$ be a possible auction history at round $k$ as observed by bidder $i$. [sent-62, score-1.053]

23 Let $H_i$ be the set of all possible auction histories for bidder $i$. [sent-63, score-0.93]

24 Each bidder $i$ is endowed with a privately known type $\theta_i \in \Theta_i$, drawn from a commonly known distribution $F$, that determines bidder $i$'s valuations for various bundles of goods. [sent-64, score-1.112]

25 A (behavioral) strategy $\sigma_i : \Theta_i \times H_i \to \Delta(B_i)$ for bidder $i$ specifies a distribution over bids for each possible type and auction history. [sent-65, score-1.136]

26 At the end of the K auction rounds, bidder i’s utility is based on the bundle of goods she won and the amount she paid for those goods. [sent-67, score-1.136]

27 A bidder's utility for type $\theta_i$ and history $h^K$ after $K$ auction rounds is simply that bidder's value for the bundle of goods it won minus its cost: $u_i(\theta_i, h^K) = v(\cup_{k=1}^{K} w_i^k; \theta_i) - \sum_{k=1}^{K} c_i^k$. [sent-73, score-0.73]

28 Given a sequential auction $\Gamma$ (defined by all of the above), bidder $i$'s objective is to choose a strategy that maximizes its expected utility. [sent-74, score-1.07]
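The auction mechanics above (per-round sealed bids, a single winner per round, auctioneer signals, and terminal utility equal to bundle value minus total payments) can be summarized in a short simulator. The following Python sketch is illustrative only: the first-price payment rule, the winning-price signal, and the helper names (`strategies`, `value_fn`) are assumptions, not the authors' code.

```python
import random

def run_auction(strategies, types, K, value_fn, rng=random.Random(0)):
    """Simulate one K-round sequential auction; return each bidder's utility."""
    n = len(strategies)
    histories = [[] for _ in range(n)]   # h_i^k: bidder i's observed (bid, signal) pairs
    won = [set() for _ in range(n)]      # goods won so far (good k is sold in round k)
    paid = [0.0] * n                     # cumulative payments, sum_k c_i^k
    for k in range(K):
        bids = [strategies[i](types[i], histories[i], rng) for i in range(n)]
        winner = max(range(n), key=lambda i: bids[i])
        price = bids[winner]             # first-price payment rule (an assumption)
        won[winner].add(k)
        paid[winner] += price
        for i in range(n):
            signal = (i == winner, price)   # w_i^k plus the announced winning price
            histories[i].append((bids[i], signal))
    # u_i(theta_i, h^K) = v(goods won; theta_i) - total payments
    return [value_fn(won[i], types[i]) - paid[i] for i in range(n)]
```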

29 (Throughout the paper, subscript $i$ refers to bidder $i$ while $-i$ refers to all bidders except $i$.) [sent-77, score-0.836]

30 Let $U_i(\sigma) = \mathbb{E}_{\theta_i, h^K \mid \sigma}[u_i(\theta_i, h^K)]$ denote bidder $i$'s expected utility given strategy profile $\sigma$. [sent-78, score-0.599]

31 Given a sequential auction $\Gamma$, a strategy profile $\sigma \in \Sigma$ is an $\epsilon$-Bayes-Nash equilibrium if $U_i(\sigma) + \epsilon \ge U_i(\sigma_i', \sigma_{-i})$ for all alternative strategies $\sigma_i'$ and all $i \in \{1, \ldots, n\}$. [sent-80, score-0.539]

32 In an $\epsilon$-Bayes-Nash equilibrium, each bidder has to come within an additive factor ($\epsilon$) of best-responding to its opponents' strategies. [sent-84, score-0.588]

33 A Bayes-Nash equilibrium is an $\epsilon$-Bayes-Nash equilibrium where $\epsilon = 0$. [sent-85, score-0.226]

34 Given a sequential auction $\Gamma$, the $\epsilon$-factor of strategy profile $\sigma$ for bidder $i$ is $\epsilon_i(\sigma) = \max_{\sigma_i'} U_i(\sigma_i', \sigma_{-i}) - U_i(\sigma_i, \sigma_{-i})$. [sent-88, score-1.07]

35 In words, the $\epsilon$-factor measures bidder $i$'s loss in expected utility for not playing his part of $\sigma$ when other bidders are playing their parts. [sent-89, score-0.943]

36 As the number of rounds, bidders, possible types, or possible actions in a sequential auction increases, it quickly becomes intractable to find equilibria using existing computational methods. [sent-90, score-0.778]

37 Such real-world intractability is one reason bidders often do not attempt to solve for equilibria, but rather optimize with respect to predictions about opponent behavior. [sent-91, score-0.362]

38 Building on past work [2, 8], our first contribution is to fully represent the decision problem for a single bidder i in a sequential auction Γ as a Markov decision process (MDP). [sent-92, score-1.085]

39 A full-history MDP $M_i(\Gamma, \theta_i, T)$ represents the sequential auction $\Gamma$ from bidder $i$'s perspective, assuming $i$'s type is $\theta_i$, with states $S = H_i$, actions $A = B_i$, rewards $R(s) = u_i(\theta_i, h^K)$ if $s = h^K$ is a history of length $K$ (and $0$ otherwise), and transition function $T$. [sent-94, score-1.17]

40 If bidder types are correlated, bidder $i$'s type informs its beliefs about opponents' types and thus opponents' predicted behavior. [sent-95, score-1.166]
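As a concrete, heavily hedged illustration of this construction, the sketch below solves a finite-horizon, full-history MDP by backward induction: states are observable histories, actions are bids, and reward is collected only at horizon $K$. The interface of `transition` (returning (next_state, probability) pairs) is an assumption about one possible representation, not the paper's interface.

```python
from functools import lru_cache

def solve_full_history_mdp(bid_space, transition, reward, K):
    """Backward induction on the full-history MDP M_i(Gamma, theta_i, T)."""
    @lru_cache(maxsize=None)
    def value(state, k):
        if k == K:                        # terminal state: a history of length K
            return reward(state), None
        best_value, best_bid = float("-inf"), None
        for b in bid_space:               # actions A = B_i
            q = sum(p * value(s2, k + 1)[0] for s2, p in transition(state, b))
            if q > best_value:
                best_value, best_bid = q, b
        return best_value, best_bid
    return value  # value(initial_state, 0) gives the optimal utility and bid
```

States must be hashable (e.g., tuples) for the memoization to work; the compact state representations discussed later are what keep this recursion tractable.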

41 For notational and computational simplicity, we assume that bidder types are drawn independently, in which case there is one transition function T regardless of bidder i’s type. [sent-96, score-1.143]

42 We also assume that bidders are symmetric, meaning their types are all drawn from the same distribution. [sent-97, score-0.323]

43 When bidders are symmetric, we can restrict our attention to symmetric equilibria, where a single set of full-history MDPs, one per type, is solved on behalf of all bidders. [sent-98, score-0.331]

44 An MDP assessment $(\pi, T)$ for a sequential auction $\Gamma$ is a set of policies $\{\pi^{\theta_i} \mid \theta_i \in \Theta_i\}$, one for each full-history MDP $M_i(\Gamma, \theta_i, T)$. [sent-100, score-0.649]

45 Consider a first-price sequential auction with two rounds, two bidders, and two possible types ("H" and "L") drawn independently from a uniform prior (i.e., each type occurs with probability 1/2). [sent-105, score-0.532]

46 Suppose Bidder 2 is playing the following simple strategy: if type H: bid “high” with probability . [sent-110, score-0.282]

47 Bidder 1's beliefs about Bidder 2's type are based solely on the type prior, resulting in beliefs that Bidder 2 will bid "high" and "low" each with equal probability. [sent-116, score-0.362]

48 Suppose Bidder 1 bids "low" and loses to Bidder 2, whom the auctioneer reports as having bid "high". [sent-117, score-0.432]

49 Given this bid distribution for Bidder 2, Bidder 1 can compute her probability of transitioning to various future states for each possible bid. [sent-132, score-0.226]

50 More formally, denoting $s_i^k$ and $a_i^k$ as agent $i$'s state and action at auction round $k$, respectively, define $\Pr(s_i^{k+1} \mid s_i^k, a_i^k)$ to be the probability of reaching state $s_i^{k+1}$ given that action $a_i^k$ was taken in state $s_i^k$. [sent-133, score-1.263]

51 Each agent's action at round $k$ is conditionally independent given that agent's state at round $k$: $\Pr(a_{-i}^k \mid s_{-i}^k, \theta_{-i}) = \prod_{j \neq i} \Pr(a_j^k \mid s_j^k, \theta_j) = \prod_{j \neq i} \pi^{\theta_j}(a_j^k \mid s_j^k)$. [sent-136, score-0.555]

52 An MDP assessment $(\pi, T)$ for a sequential auction $\Gamma$ is called $\delta$-stable if $d(T, \mathrm{Induced}(\pi)) < \delta$, for some symmetric distance function $d$. [sent-141, score-0.643]

53 When $\delta = 0$, the induced transition probabilities exactly equal the transition probabilities from the MDP assessment $(\pi, T)$, meaning that if all agents follow $(\pi, T)$, the transition function $T$ is correct. [sent-142, score-0.324]
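A Monte Carlo estimate of $\mathrm{Induced}(\pi)$ can be obtained by sampling opponent types from $F$, rolling out their policies, and recording empirical transition frequencies from bidder $i$'s point of view. The sketch below assumes a helper `sample_rollout` that returns bidder $i$'s (state, action, next-state) triples from one simulated auction; all names are illustrative assumptions.

```python
from collections import Counter, defaultdict

def estimate_induced(pi, sample_rollout, num_samples, rng):
    """Empirical transition function induced by policy profile pi."""
    counts = defaultdict(Counter)
    for _ in range(num_samples):
        for s, a, s2 in sample_rollout(pi, rng):
            counts[(s, a)][s2] += 1
    # Normalize counts into next-state probability distributions.
    return {sa: {s2: c / sum(ctr.values()) for s2, c in ctr.items()}
            for sa, ctr in counts.items()}
```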

54 An MDP assessment $(\pi, T)$ for a sequential auction $\Gamma$ is called $\alpha$-optimal if for all policies $\pi'$, $U(\pi, T) + \alpha \ge U(\pi', T)$. [sent-146, score-0.649]

55 At an optimal, stable MDP assessment for the sequential auction $\Gamma$, each agent is best responding to its beliefs, and each agent's beliefs are correct. [sent-151, score-0.718]

56 It follows that any optimal stable MDP assessment for the sequential auction Γ corresponds to a symmetric Bayes-Nash equilibrium for Γ. [sent-152, score-0.756]

57 Given such a black box, if $(\pi, T)$ is an $\alpha$-optimal, $\delta$-stable MDP assessment for the sequential auction $\Gamma$, then $\pi$ is a symmetric $\epsilon$-Bayes-Nash equilibrium for $\Gamma$, where $\epsilon = 2D(\delta) + \alpha$. [sent-161, score-0.772]

58 If $(\pi, T)$ is an $\alpha$-optimal, $\delta$-stable MDP assessment for the sequential auction $\Gamma$, then $\pi$ is a symmetric $\epsilon$-Bayes-Nash equilibrium for $\Gamma$, where $\epsilon = 2\delta K + \alpha$. [sent-167, score-0.756]

59 In particular, when the distance between other-agent bid predictions and the actual other-agent bids induced by the actual other-agent policies is less than $\delta$, optimizing agents play a $2\delta K$-BNE. [sent-168, score-0.498]

60 In particular, if MDP assessment $(\pi, T)$ is $\delta$-stable, then $|U(\pi, T) - U(\pi, \mathrm{Induced}(\pi))| \le \delta K$, where $d(T, T') = \sum_{s_i^{k+1}} |T(s_i^k, a_i^k, s_i^{k+1}) - T'(s_i^k, a_i^k, s_i^{k+1})|$ and $K$ is the MDP's horizon. [sent-171, score-0.277]
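One plausible reading of this distance, sketched below under the assumption that transition functions are represented as dictionaries mapping (state, action) pairs to next-state distributions, takes the maximum over state-action pairs of the L1 gap between the two next-state distributions.

```python
def transition_distance(T1, T2):
    """Max over (s, a) of sum over s' of |T1(s, a, s') - T2(s, a, s')|."""
    d = 0.0
    for sa in set(T1) | set(T2):
        p1, p2 = T1.get(sa, {}), T2.get(sa, {})
        gap = sum(abs(p1.get(s2, 0.0) - p2.get(s2, 0.0)) for s2 in set(p1) | set(p2))
        d = max(d, gap)
    return d
```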

61 [24] show that, for simultaneous one-shot auctions, optimizing with respect to predictions about other-agent bids yields an $\epsilon$-Bayes-Nash equilibrium, where $\epsilon$ depends on the distance between other-agent bid predictions and the actual other-agent bids induced by the actual other-agent strategies. [sent-173, score-0.606]

62 We now know that an optimal, stable MDP assessment is a BNE, and moreover, a near-optimal, near-stable MDP assessment is nearly a BNE. [sent-175, score-0.206]

63 Note that this result also generalizes to non-symmetric equilibria: we would calculate a vector of induced transition probabilities and a vector of MDP assessments (one of each per bidder), instead of assuming that each bidder abides by the same assessment. [sent-177, score-0.625]

64 We present our theoretical results in terms of symmetric equilibria for notational simplicity, and because we search for symmetric equilibria in Section 5. [sent-179, score-0.5]

65 This learning process, which returns the MDP assessment $(\pi^\tau, T^\tau)$ upon termination, is not guaranteed to converge, so it could return an optimal, $\delta$-stable MDP assessment for some very large $\delta$. [sent-192, score-0.206]

66 However, it has been shown to be successful experimentally in simultaneous auction games [24] and other large games of imperfect information [7]. [sent-193, score-0.564]
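The overall predict-then-optimize loop alternates between best-responding to the current beliefs and re-inducing beliefs from the resulting policies. The sketch below wires together the hypothetical helpers from the earlier sketches (`solve_full_history_mdp`-style MDP solving, `estimate_induced`, `transition_distance`); all names and the stopping rule are assumptions about one possible implementation, not the authors' code.

```python
def best_reply_dynamics(T0, solve_mdps, induce, distance, max_iters=100, tol=1e-3):
    """Iterate: optimize against beliefs T, then predict T from the new policies."""
    T, pi = T0, None
    for _ in range(max_iters):
        pi = solve_mdps(T)           # optimize: best respond to beliefs T
        T_next = induce(pi)          # predict: transitions induced by pi
        if distance(T, T_next) < tol:
            return pi, T_next        # approximately delta-stable MDP assessment
        T = T_next
    return pi, T                     # no convergence guarantee, as noted above
```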

67 Second, we exploit some special structure of sequential auctions: if nothing but the winning price at each round is revealed, then conditional on reaching state $s_i^k$, the posterior distribution over the highest opponent bid is sufficient for computing the probability of that round's outcome (Equation 6). [sent-198, score-0.601]

68 Under this assumption, when the auctioneer announces that a Bidder j has won an auction for the first time, this provides the same information as if a different Bidder k had won an auction for the first time. [sent-203, score-0.927]

69 This can greatly decrease the MDP state space, particularly if the number of players n is larger than the number of auctions K, as is often the case in competitive markets. [sent-205, score-0.31]

70 Second, we exploit the property of losing-bid symmetry: if a bidder $i$ loses with a bid of $b$ or a bid of $b'$, its beliefs about its opponents' bids are unchanged, and thus it receives the same reward for placing the same bid at either resulting state. [sent-207, score-1.78]

71 A distribution over the next round's highest opponent bid is only sufficient without the possibility of ties. [sent-208, score-0.283]

72 If ties can occur, a distribution over the number of opponents placing that highest bid is also needed. [sent-209, score-0.37]
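The two symmetries can be read as a canonicalization of the history state. The sketch below shows one hypothetical encoding: opponents are relabeled by the order in which they first win (winner symmetry), and the bidder's own losing bids are omitted from the state (losing-bid symmetry). The raw per-round record format is an assumption.

```python
def canonical_state(rounds):
    """rounds: list of (winner_id, winning_price, i_won) for the rounds so far."""
    relabel, state = {}, []
    for winner_id, price, i_won in rounds:
        if winner_id not in relabel:           # winner symmetry: label opponents by
            relabel[winner_id] = len(relabel)  # order of first win, not identity
        state.append(("win" if i_won else "lose", price, relabel[winner_id]))
    return tuple(state)                        # own losing bids are deliberately dropped
```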

73 Define $U_i(\pi) = \mathbb{E}_{\theta_i, h_i^K \mid \pi}[u_i(\theta_i, h_i^K)]$ to be bidder $i$'s expected utility when each agent plays its part in the vector of MDP assessment policies $\pi$. [sent-216, score-0.766]

74 Following Definition 2, the $\epsilon$-factor measures bidder $i$'s loss in expected utility for not playing his part of $\pi$ when other bidders are playing their parts: $\epsilon_i(\pi) = \max_{\pi_i'} U_i(\pi_i', \pi_{-i}) - U_i(\pi_i, \pi_{-i})$. [sent-217, score-0.943]

75 We estimate these expected utilities by simulating the relevant policy profiles (e.g., the estimated best-response policy $\hat{\pi}^*$ against $\hat{\pi}^{\theta}$) $L$ times, and then averaging bidder $i$'s resulting utilities. [sent-228, score-0.531]
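A minimal sketch of this Monte Carlo $\epsilon$-factor estimate, assuming a hypothetical helper `simulate_utility(policy_i, policy_others, rng)` that returns bidder $i$'s realized utility for one sampled type and auction history:

```python
def estimate_epsilon_factor(pi_i, br_i, pi_others, simulate_utility, L, rng):
    """Estimate eps_i = U_i(best response, pi_-i) - U_i(pi_i, pi_-i) by simulation."""
    u_pi = sum(simulate_utility(pi_i, pi_others, rng) for _ in range(L)) / L
    u_br = sum(simulate_utility(br_i, pi_others, rng) for _ in range(L)) / L
    return max(0.0, u_br - u_pi)  # clip at 0: sampling noise can make the gap negative
```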

76 This section presents the results of running our iterative learning method on three auction models studied in the economics literature: Katzman [10], Weber [23], and Menezes and Monteiro [14]. [sent-235, score-0.65]

77 Although these particular sequential auctions are all second price, our method applies to sequential auctions with other rules as well. [sent-239, score-0.837]

78 It is a dominant strategy to bid truthfully in a one-shot second-price auction [22]; hence, when comparing policies in two-round second-price auctions it suffices to compare first-round policies only. [sent-241, score-1.009]

79 Static Experiments We first run one iteration of our learning procedure to check whether the derived equilibria are strict. [sent-242, score-0.241]

80 Our results indicate that the equilibria derived by Weber and Katzman are indeed strict, while that by Menezes and Monteiro (MM) is not, since there exists a set of best-responses to the equilibrium strategy, not a unique best-response. [sent-245, score-0.337]

81 We confirm analytically that the set of bids output by our learning procedure consists of best responses to the theoretical equilibrium, with the upper bound being the known theoretical equilibrium strategy and the lower bound being the black dotted line. [sent-246, score-0.328]

82 We chose initial strategies $\pi^0$, parametrized by $p \in \mathbb{R}^+$, that bid $x^p$ when the marginal value of winning an additional good is $x$. [sent-250, score-0.304]
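For instance, under the assumption that marginal values lie in $[0, 1]$, such an initial strategy is a one-liner ($p = 1$ corresponds to truthful marginal-value bidding):

```python
def make_initial_strategy(p):
    """pi^0 parametrized by p > 0: bid x**p when the marginal value is x."""
    return lambda marginal_value: marginal_value ** p
```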

83 Figure 1: Comparison of first-round bidding functions of theoretical equilibrium strategies (green) and that of the best response from one step of the iterative learning procedure initialized with those equilibrium strategies (blue). [sent-269, score-0.453]

84 All existing theoretical work on Bayesian sequential auctions with multi-unit demand is confined to two-round cases due to the increased complexity of additional rounds, but our method removes this constraint. [sent-309, score-0.456]

85 We extend the two-round MM model into a three-round auction model, and apply our learning procedure. [sent-310, score-0.399]

86 On the theoretical side, Weber [23] derived equilibrium strategies for a basic model in which $n$ bidders compete in $k$ first- or second-price auctions, but bidders are assumed to have unit demand. [sent-315, score-0.802]

87 Février [6] and Yao [25] studied a model where $n$ bidders have multi-unit demand, but there are only two auctions and a bidder's per-good valuation is the same across the two goods. [sent-316, score-0.678]

88 [17] studied models of $n$ bidders with multi-unit demand where bidders have [...]. (Footnote 9: the three-round model is described in the supplemental material.)

89 [19] generalized fictitious play to finite-action incomplete information games and applied their technique to simultaneous second-price auctions with utilities expressible as linear functions over a one-dimensional type space. [sent-351, score-0.436]

90 [5] find equilibrium bidding strategies in sequential auctions with incomplete information under various rules of information revelation after each round. [sent-354, score-0.712]

91 From a decision-theoretic perspective, the bidding problem for sequential auctions was previously formulated as an MDP in related domains. [sent-356, score-0.488]

92 In [2], an MDP is created where distinct goods are sold consecutively, complementarities exist across goods, and the bidder is budget-constrained. [sent-358, score-0.67]

93 We presented a two-step procedure (predict and optimize) for finding approximate equilibria in a class of complex sequential auctions in which bidders have incomplete information about opponents' types, imperfect information about opponents' bids, and demand multiple goods. [sent-364, score-1.059]

94 We evaluated our method on models with analytically derived equilibria and on an auction domain in which analytical solutions were heretofore unknown. [sent-366, score-0.65]

95 For a more complex auction with no known analytical solutions, our method converged to an approximate equilibrium with an $\epsilon$-factor less than $10^{-4}$, and did so robustly with respect to initialization of the learning procedure. [sent-368, score-0.65]

96 The fact that our procedure converged to nearly identical approximate equilibria even from different initializations is promising, and further exploring convergence properties in this domain is a direction for future work. [sent-370, score-0.257]

97 Sequential auctions for the allocation of resources with complementarities. [sent-385, score-0.295]

98 Monte Carlo approximation in incomplete information, sequential auction games. [sent-392, score-0.55]

99 Equilibrium of a sequence of auctions when bidders demand multiple items. [sent-451, score-0.646]

100 Computing pure Bayesian Nash equilibria in games with finite actions and continuous types. [sent-484, score-0.327]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('bidder', 0.531), ('auction', 0.399), ('bidders', 0.305), ('auctions', 0.295), ('bid', 0.226), ('equilibria', 0.224), ('mdp', 0.18), ('bids', 0.157), ('opponents', 0.144), ('sequential', 0.115), ('equilibrium', 0.113), ('ak', 0.11), ('round', 0.108), ('sk', 0.108), ('assessment', 0.103), ('goods', 0.096), ('bidding', 0.078), ('valuation', 0.078), ('ok', 0.066), ('games', 0.063), ('strategies', 0.058), ('pr', 0.058), ('agent', 0.057), ('opponent', 0.057), ('ui', 0.055), ('auctioneer', 0.049), ('inducedn', 0.049), ('induced', 0.048), ('transition', 0.046), ('demand', 0.046), ('beliefs', 0.044), ('sold', 0.043), ('weber', 0.043), ('utility', 0.043), ('actions', 0.04), ('won', 0.04), ('menezes', 0.039), ('si', 0.039), ('bk', 0.039), ('incomplete', 0.036), ('hk', 0.036), ('agents', 0.035), ('policies', 0.032), ('mm', 0.032), ('playing', 0.032), ('private', 0.032), ('rounds', 0.031), ('pro', 0.03), ('greenwald', 0.03), ('katzman', 0.03), ('syrgkanis', 0.03), ('economic', 0.028), ('bundle', 0.027), ('analytical', 0.027), ('valuations', 0.026), ('wellman', 0.026), ('symmetric', 0.026), ('strategy', 0.025), ('ai', 0.025), ('type', 0.024), ('ul', 0.023), ('price', 0.021), ('imperfect', 0.021), ('decision', 0.02), ('winning', 0.02), ('bne', 0.02), ('dutch', 0.02), ('fatima', 0.02), ('ganzfried', 0.02), ('jiacui', 0.02), ('leme', 0.02), ('monteiro', 0.02), ('mostafa', 0.02), ('nielsen', 0.02), ('ower', 0.02), ('paes', 0.02), ('sodomka', 0.02), ('submits', 0.02), ('mdps', 0.019), ('policy', 0.019), ('types', 0.018), ('winner', 0.018), ('simultaneous', 0.018), ('imprecision', 0.017), ('procedure', 0.017), ('carlo', 0.017), ('regardless', 0.017), ('rules', 0.017), ('box', 0.016), ('monte', 0.016), ('le', 0.016), ('independencies', 0.016), ('sales', 0.016), ('initializations', 0.016), ('iterative', 0.016), ('black', 0.016), ('history', 0.015), ('symmetry', 0.015), ('state', 0.015), ('epsilon', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 45 nips-2012-Approximating Equilibria in Sequential Auctions with Incomplete Information and Multi-Unit Demand

Author: Amy Greenwald, Jiacui Li, Eric Sodomka

Abstract: In many large economic markets, goods are sold through sequential auctions. Examples include eBay, online ad auctions, wireless spectrum auctions, and the Dutch flower auctions. In this paper, we combine methods from game theory and decision theory to search for approximate equilibria in sequential auction domains, in which bidders do not know their opponents' values for goods, bidders only partially observe the actions of their opponents, and bidders demand multiple goods. We restrict attention to two-phased strategies: first predict (i.e., learn); second, optimize. We use best-reply dynamics [4] for prediction (i.e., to predict other bidders' strategies), and then, assuming fixed other-bidder strategies, we estimate and solve the ensuing Markov decision processes (MDPs) [18] for optimization. We exploit auction properties to represent the MDP in a more compact state space, and we use Monte Carlo simulation to make estimating the MDP tractable. We show how equilibria found using our search procedure compare to known equilibria for simpler auction domains, and we approximate an equilibrium for a more complex auction domain where analytical solutions are unknown.

2 0.1296289 297 nips-2012-Robustness and risk-sensitivity in Markov decision processes

Author: Takayuki Osogami

Abstract: We uncover relations between robust MDPs and risk-sensitive MDPs. The objective of a robust MDP is to minimize a function, such as the expectation of cumulative cost, for the worst case when the parameters have uncertainties. The objective of a risk-sensitive MDP is to minimize a risk measure of the cumulative cost when the parameters are known. We show that a risk-sensitive MDP of minimizing the expected exponential utility is equivalent to a robust MDP of minimizing the worst-case expectation with a penalty for the deviation of the uncertain parameters from their nominal values, which is measured with the Kullback-Leibler divergence. We also show that a risk-sensitive MDP of minimizing an iterated risk measure that is composed of certain coherent risk measures is equivalent to a robust MDP of minimizing the worst-case expectation when the possible deviations of uncertain parameters from their nominal values are characterized with a concave function.

3 0.11633229 348 nips-2012-Tractable Objectives for Robust Policy Optimization

Author: Katherine Chen, Michael Bowling

Abstract: Robust policy optimization acknowledges that risk-aversion plays a vital role in real-world decision-making. When faced with uncertainty about the effects of actions, the policy that maximizes expected utility over the unknown parameters of the system may also carry with it a risk of intolerably poor performance. One might prefer to accept lower utility in expectation in order to avoid, or reduce the likelihood of, unacceptable levels of utility under harmful parameter realizations. In this paper, we take a Bayesian approach to parameter uncertainty, but unlike other methods avoid making any distributional assumptions about the form of this uncertainty. Instead we focus on identifying optimization objectives for which solutions can be efficiently approximated. We introduce percentile measures: a very general class of objectives for robust policy optimization, which encompasses most existing approaches, including ones known to be intractable. We then introduce a broad subclass of this family for which robust policies can be approximated efficiently. Finally, we frame these objectives in the context of a two-player, zero-sum, extensive-form game and employ a no-regret algorithm to approximate an optimal policy, with computation only polynomial in the number of states and actions of the MDP.

4 0.088897631 109 nips-2012-Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

Author: Neil Burch, Marc Lanctot, Duane Szafron, Richard G. Gibson

Abstract: Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing strategies in extensive-form games. The Monte Carlo CFR (MCCFR) variants reduce the per iteration time cost of CFR by traversing a smaller, sampled portion of the tree. The previous most effective instances of MCCFR can still be very slow in games with many player actions since they sample every action for a given player. In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player's actions according to the player's average strategy. Our new algorithm is inspired by a new, tighter bound on the number of iterations required by CFR to converge to a given solution quality. In addition, we prove a similar, tighter bound for AS and other popular MCCFR variants. Finally, we validate our work by demonstrating that AS converges faster than previous MCCFR algorithms in both no-limit poker and Bluff.

5 0.076767392 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors, $(x_k)_{k=0}^K$, for which the increments $x_k - x_{k-1}$ are $S_k$-sparse (with $S_k$ typically smaller than $S_1$), based on linear measurements $(y_k = A_k x_k + e_k)_{k=1}^K$, where $A_k$ and $e_k$ denote the measurement matrix and noise, respectively. Assuming each $A_k$ obeys the restricted isometry property (RIP) of a certain order, depending only on $S_k$, we show that in the absence of noise a convex program, which minimizes the weighted sum of the $\ell_1$-norm of successive differences subject to the linear measurement constraints, recovers the sequence $(x_k)_{k=1}^K$ exactly. This is an interesting result because this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

6 0.061245408 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

7 0.054310147 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

8 0.054199889 38 nips-2012-Algorithms for Learning Markov Field Policies

9 0.052283168 184 nips-2012-Learning Probability Measures with respect to Optimal Transport Metrics

10 0.052024402 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

11 0.051367 296 nips-2012-Risk Aversion in Markov Decision Processes via Near Optimal Chernoff Bounds

12 0.050953686 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

13 0.046436131 161 nips-2012-Interpreting prediction markets: a stochastic approach

14 0.044740561 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

15 0.044677485 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

16 0.044676948 27 nips-2012-A quasi-Newton proximal splitting method

17 0.042419352 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning

18 0.040610209 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

19 0.039645113 173 nips-2012-Learned Prioritization for Trading Off Accuracy and Speed

20 0.03941372 22 nips-2012-A latent factor model for highly multi-relational data


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.092), (1, -0.119), (2, 0.011), (3, -0.018), (4, -0.015), (5, 0.021), (6, 0.001), (7, -0.005), (8, 0.033), (9, 0.021), (10, -0.02), (11, 0.01), (12, -0.016), (13, -0.029), (14, -0.046), (15, -0.021), (16, 0.014), (17, 0.001), (18, 0.011), (19, -0.001), (20, -0.049), (21, 0.036), (22, 0.009), (23, 0.016), (24, 0.032), (25, 0.014), (26, 0.023), (27, 0.089), (28, 0.011), (29, 0.113), (30, 0.01), (31, 0.008), (32, -0.118), (33, 0.098), (34, -0.019), (35, 0.083), (36, -0.086), (37, 0.014), (38, 0.076), (39, 0.068), (40, 0.022), (41, 0.011), (42, 0.108), (43, -0.02), (44, -0.095), (45, 0.032), (46, 0.006), (47, -0.095), (48, -0.055), (49, 0.08)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94047147 45 nips-2012-Approximating Equilibria in Sequential Auctions with Incomplete Information and Multi-Unit Demand

Author: Amy Greenwald, Jiacui Li, Eric Sodomka

Abstract: In many large economic markets, goods are sold through sequential auctions. Examples include eBay, online ad auctions, wireless spectrum auctions, and the Dutch flower auctions. In this paper, we combine methods from game theory and decision theory to search for approximate equilibria in sequential auction domains, in which bidders do not know their opponents' values for goods, bidders only partially observe the actions of their opponents, and bidders demand multiple goods. We restrict attention to two-phased strategies: first predict (i.e., learn); second, optimize. We use best-reply dynamics [4] for prediction (i.e., to predict other bidders' strategies), and then, assuming fixed other-bidder strategies, we estimate and solve the ensuing Markov decision processes (MDPs) [18] for optimization. We exploit auction properties to represent the MDP in a more compact state space, and we use Monte Carlo simulation to make estimating the MDP tractable. We show how equilibria found using our search procedure compare to known equilibria for simpler auction domains, and we approximate an equilibrium for a more complex auction domain where analytical solutions are unknown.

2 0.70485169 297 nips-2012-Robustness and risk-sensitivity in Markov decision processes

Author: Takayuki Osogami

Abstract: We uncover relations between robust MDPs and risk-sensitive MDPs. The objective of a robust MDP is to minimize a function, such as the expectation of cumulative cost, for the worst case when the parameters have uncertainties. The objective of a risk-sensitive MDP is to minimize a risk measure of the cumulative cost when the parameters are known. We show that a risk-sensitive MDP of minimizing the expected exponential utility is equivalent to a robust MDP of minimizing the worst-case expectation with a penalty for the deviation of the uncertain parameters from their nominal values, which is measured with the Kullback-Leibler divergence. We also show that a risk-sensitive MDP of minimizing an iterated risk measure that is composed of certain coherent risk measures is equivalent to a robust MDP of minimizing the worst-case expectation when the possible deviations of uncertain parameters from their nominal values are characterized with a concave function.

3 0.62739092 348 nips-2012-Tractable Objectives for Robust Policy Optimization

Author: Katherine Chen, Michael Bowling

Abstract: Robust policy optimization acknowledges that risk-aversion plays a vital role in real-world decision-making. When faced with uncertainty about the effects of actions, the policy that maximizes expected utility over the unknown parameters of the system may also carry with it a risk of intolerably poor performance. One might prefer to accept lower utility in expectation in order to avoid, or reduce the likelihood of, unacceptable levels of utility under harmful parameter realizations. In this paper, we take a Bayesian approach to parameter uncertainty, but unlike other methods avoid making any distributional assumptions about the form of this uncertainty. Instead we focus on identifying optimization objectives for which solutions can be efficiently approximated. We introduce percentile measures: a very general class of objectives for robust policy optimization, which encompasses most existing approaches, including ones known to be intractable. We then introduce a broad subclass of this family for which robust policies can be approximated efficiently. Finally, we frame these objectives in the context of a two-player, zero-sum, extensive-form game and employ a no-regret algorithm to approximate an optimal policy, with computation only polynomial in the number of states and actions of the MDP.

4 0.61166257 109 nips-2012-Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

Author: Neil Burch, Marc Lanctot, Duane Szafron, Richard G. Gibson

Abstract: Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing strategies in extensive-form games. The Monte Carlo CFR (MCCFR) variants reduce the per iteration time cost of CFR by traversing a smaller, sampled portion of the tree. The previous most effective instances of MCCFR can still be very slow in games with many player actions since they sample every action for a given player. In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player's actions according to the player's average strategy. Our new algorithm is inspired by a new, tighter bound on the number of iterations required by CFR to converge to a given solution quality. In addition, we prove a similar, tighter bound for AS and other popular MCCFR variants. Finally, we validate our work by demonstrating that AS converges faster than previous MCCFR algorithms in both no-limit poker and Bluff.

5 0.60764766 250 nips-2012-On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization

Author: Doina Precup, Joelle Pineau, Andre S. Barreto

Abstract: Kernel-based stochastic factorization (KBSF) is an algorithm for solving reinforcement learning tasks with continuous state spaces which builds a Markov decision process (MDP) based on a set of sample transitions. What sets KBSF apart from other kernel-based approaches is the fact that the size of its MDP is independent of the number of transitions, which makes it possible to control the trade-off between the quality of the resulting approximation and the associated computational cost. However, KBSF's memory usage grows linearly with the number of transitions, precluding its application in scenarios where a large amount of data must be processed. In this paper we show that it is possible to construct KBSF's MDP in a fully incremental way, thus freeing the space complexity of this algorithm from its dependence on the number of sample transitions. The incremental version of KBSF is able to process an arbitrary amount of data, which results in a model-based reinforcement learning algorithm that can be used to solve continuous MDPs in both off-line and on-line regimes. We present theoretical results showing that KBSF can approximate the value function that would be computed by conventional kernel-based learning with arbitrary precision. We empirically demonstrate the effectiveness of the proposed algorithm in the challenging three-pole balancing task, in which the ability to process a large number of transitions is crucial for success.

6 0.58888739 296 nips-2012-Risk Aversion in Markov Decision Processes via Near Optimal Chernoff Bounds

7 0.54148549 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning

8 0.4911105 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

9 0.4668563 350 nips-2012-Trajectory-Based Short-Sighted Probabilistic Planning

10 0.46269944 184 nips-2012-Learning Probability Measures with respect to Optimal Transport Metrics

11 0.43950438 179 nips-2012-Learning Manifolds with K-Means and K-Flats

12 0.41678178 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

13 0.41356122 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

14 0.41308975 51 nips-2012-Bayesian Hierarchical Reinforcement Learning

15 0.40133372 288 nips-2012-Rational inference of relative preferences

16 0.39566523 128 nips-2012-Fast Resampling Weighted v-Statistics

17 0.38982874 243 nips-2012-Non-parametric Approximate Dynamic Programming via the Kernel Method

18 0.38189718 161 nips-2012-Interpreting prediction markets: a stochastic approach

19 0.37776273 31 nips-2012-Action-Model Based Multi-agent Plan Recognition

20 0.36686474 110 nips-2012-Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.021), (21, 0.02), (27, 0.018), (38, 0.102), (42, 0.019), (45, 0.365), (54, 0.046), (55, 0.017), (74, 0.031), (76, 0.106), (80, 0.102), (92, 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.74889696 45 nips-2012-Approximating Equilibria in Sequential Auctions with Incomplete Information and Multi-Unit Demand

Author: Amy Greenwald, Jiacui Li, Eric Sodomka

Abstract: In many large economic markets, goods are sold through sequential auctions. Examples include eBay, online ad auctions, wireless spectrum auctions, and the Dutch flower auctions. In this paper, we combine methods from game theory and decision theory to search for approximate equilibria in sequential auction domains, in which bidders do not know their opponents' values for goods, bidders only partially observe the actions of their opponents, and bidders demand multiple goods. We restrict attention to two-phased strategies: first predict (i.e., learn); second, optimize. We use best-reply dynamics [4] for prediction (i.e., to predict other bidders' strategies), and then, assuming fixed other-bidder strategies, we estimate and solve the ensuing Markov decision processes (MDPs) [18] for optimization. We exploit auction properties to represent the MDP in a more compact state space, and we use Monte Carlo simulation to make estimating the MDP tractable. We show how equilibria found using our search procedure compare to known equilibria for simpler auction domains, and we approximate an equilibrium for a more complex auction domain where analytical solutions are unknown.

2 0.59830993 220 nips-2012-Monte Carlo Methods for Maximum Margin Supervised Topic Models

Author: Qixia Jiang, Jun Zhu, Maosong Sun, Eric P. Xing

Abstract: An effective strategy to exploit the supervising side information for discovering predictive topic representations is to impose discriminative constraints induced by such information on the posterior distributions under a topic model. This strategy has been adopted by a number of supervised topic models, such as MedLDA, which employs max-margin posterior constraints. However, unlike the likelihood-based supervised topic models, of which posterior inference can be carried out using the Bayes' rule, the max-margin posterior constraints have made Monte Carlo methods infeasible or at least not directly applicable, thereby limiting the choice of inference algorithms to be based on variational approximation with strict mean field assumptions. In this paper, we develop two efficient Monte Carlo methods under much weaker assumptions for max-margin supervised topic models based on an importance sampler and a collapsed Gibbs sampler, respectively, in a convex dual formulation. We report thorough experimental results that compare our approach favorably against existing alternatives in both accuracy and efficiency.

3 0.58048391 271 nips-2012-Pointwise Tracking the Optimal Regression Function

Author: Yair Wiener, Ran El-Yaniv

Abstract: This paper examines the possibility of a ‘reject option’ in the context of least squares regression. It is shown that using rejection it is theoretically possible to learn ‘selective’ regressors that can ǫ-pointwise track the best regressor in hindsight from the same hypothesis class, while rejecting only a bounded portion of the domain. Moreover, the rejected volume vanishes with the training set size, under certain conditions. We then develop efficient and exact implementation of these selective regressors for the case of linear regression. Empirical evaluation over a suite of real-world datasets corroborates the theoretical analysis and indicates that our selective regressors can provide substantial advantage by reducing estimation error.

4 0.55970091 292 nips-2012-Regularized Off-Policy TD-Learning

Author: Bo Liu, Sridhar Mahadevan, Ji Liu

Abstract: We present a novel $\ell_1$-regularized off-policy convergent TD-learning method (termed RO-TD), which is able to learn sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point formulation of non-smooth convex optimization, which enables first-order solvers and feature selection using online convex regularization. A detailed theoretical and experimental analysis of RO-TD is presented. A variety of experiments are presented to illustrate the off-policy convergence, sparse feature selection capability and low computational cost of the RO-TD algorithm.

5 0.54988962 330 nips-2012-Supervised Learning with Similarity Functions

Author: Purushottam Kar, Prateek Jain

Abstract: We address the problem of general supervised learning when data can only be accessed through an (indefinite) similarity function between data points. Existing work on learning with indefinite kernels has concentrated solely on binary/multiclass classification problems. We propose a model that is generic enough to handle any supervised learning task and also subsumes the model previously proposed for classification. We give a "goodness" criterion for similarity functions w.r.t. a given supervised learning task and then adapt a well-known landmarking technique to provide efficient algorithms for supervised learning using "good" similarity functions. We demonstrate the effectiveness of our model on three important supervised learning problems: a) real-valued regression, b) ordinal regression and c) ranking where we show that our method guarantees bounded generalization error. Furthermore, for the case of real-valued regression, we give a natural goodness definition that, when used in conjunction with a recent result in sparse vector recovery, guarantees a sparse predictor with bounded generalization error. Finally, we report results of our learning algorithms on regression and ordinal regression tasks using non-PSD similarity functions and demonstrate the effectiveness of our algorithms, especially that of the sparse landmark selection algorithm that achieves significantly higher accuracies than the baseline methods while offering reduced computational costs.

6 0.45537961 173 nips-2012-Learned Prioritization for Trading Off Accuracy and Speed

7 0.45434901 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

8 0.4540906 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

9 0.45406294 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

10 0.45402128 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

11 0.45367491 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

12 0.45326433 279 nips-2012-Projection Retrieval for Classification

13 0.45282349 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

14 0.45241061 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

15 0.45216522 200 nips-2012-Local Supervised Learning through Space Partitioning

16 0.45216122 160 nips-2012-Imitation Learning by Coaching

17 0.45141336 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

18 0.45129365 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

19 0.45080477 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

20 0.45066696 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models