nips nips2013 nips2013-159 nips2013-159-reference knowledge-graph by maker-knowledge-mining

159 nips-2013-Learning Prices for Repeated Auctions with Strategic Buyers

Source: pdf

Author: Kareem Amin, Afshin Rostamizadeh, Umar Syed

Abstract: Inspired by real-time ad exchanges for online display advertising, we consider the problem of inferring a buyer’s value distribution for a good when the buyer is repeatedly interacting with a seller through a posted-price mechanism. We model the buyer as a strategic agent, whose goal is to maximize her long-term surplus, and we are interested in mechanisms that maximize the seller’s long-term revenue. We deﬁne the natural notion of strategic regret — the lost revenue as measured against a truthful (non-strategic) buyer. We present seller algorithms that are no(strategic)-regret when the buyer discounts her future surplus — i.e. the buyer prefers showing advertisements to users sooner rather than later. We also give a lower bound on strategic regret that increases as the buyer’s discounting weakens and shows, in particular, that any seller algorithm will suffer linear strategic regret if there is no discounting. 1

reference text

[1] Alessandro Acquisti and Hal R. Varian. Conditioning prices on purchase history. Marketing Science, 24(3):367–381, 2005.

[2] Raman Arora, Ofer Dekel, and Ambuj Tewari. Online bandit learning against an adaptive adversary: from regret to policy regret. In ICML, 2012.

[3] Peter Auer, Nicol` Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed o bandit problem. Machine learning, 47(2-3):235–256, 2002.

[4] Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E Schapire. The nonstochastic multiarmed bandit problem. Journal on Computing, 32(1):48–77, 2002.

[5] Moshe Babaioff, Robert D Kleinberg, and Aleksandrs Slivkins. Truthful mechanisms with implicit payment computation. In Proceedings of the Conference on Electronic Commerce, pages 43–52. ACM, 2010.

[6] Moshe Babaioff, Yogeshwer Sharma, and Aleksandrs Slivkins. Characterizing truthful multiarmed bandit mechanisms. In Proceedings of Conference on Electronic Commerce, pages 79–88. ACM, 2009.

[7] Ziv Bar-Yossef, Kirsten Hildrum, and Felix Wu. Incentive-compatible online auctions for digital goods. In Proceedings of Symposium on Discrete Algorithms, pages 964–970. SIAM, 2002.

[8] Avrim Blum, Vijay Kumar, Atri Rudra, and Felix Wu. Online learning in online auctions. In Proceedings Symposium on Discrete algorithms, pages 202–204. SIAM, 2003.

[9] Nicolo Cesa-Bianchi, Claudio Gentile, and Yishay Mansour. Regret minimization for reserve prices in second-price auctions. In Proceedings of the Symposium on Discrete Algorithms. SIAM, 2013.

[10] Ofer Dekel, Felix Fischer, and Ariel D Procaccia. Incentive compatible regression learning. Journal of Computer and System Sciences, 76(8):759–777, 2010.

[11] Nikhil R Devanur and Sham M Kakade. The price of truthfulness for pay-per-click auctions. In Proceedings of the Conference on Electronic commerce, pages 99–106. ACM, 2009.

[12] Benjamin Edelman and Michael Ostrovsky. Strategic bidder behavior in sponsored search auctions. Decision support systems, 43(1):192–198, 2007.

[13] Drew Fudenberg and J. Miguel Villas-Boas. Behavior-Based Price Discrimination and Customer Recognition. Elsevier Science, Oxford, 2007.

[14] Jason Hartline. Dynamic posted price mechanisms, 2001.

[15] Robert Kleinberg and Tom Leighton. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In Symposium on Foundations of Computer Science, pages 594–605. IEEE, 2003.

[16] Volodymyr Kuleshov and Doina Precup. Algorithms for the multi-armed bandit problem. Journal of Machine Learning, 2010.

[17] Reshef Meir, Ariel D Procaccia, and Jeffrey S Rosenschein. Strategyproof classiﬁcation with shared inputs. Proc. of 21st IJCAI, pages 220–225, 2009.

[18] Herbert Robbins. Some aspects of the sequential design of experiments. In Herbert Robbins Selected Papers, pages 169–177. Springer, 1985. 9