nips nips2013 nips2013-112 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Min Xu, Tao Qin, Tie-Yan Liu
Abstract: In search advertising, the search engine needs to select the most profitable advertisements to display, which can be formulated as an instance of online learning with partial feedback, also known as the stochastic multi-armed bandit (MAB) problem. In this paper, we show that the naive application of MAB algorithms to search advertising for advertisement selection will produce sample selection bias that harms the search engine by decreasing expected revenue and “estimation of the largest mean” (ELM) bias that harms the advertisers by increasing game-theoretic player-regret. We then propose simple bias-correction methods with benefits to both the search engine and the advertisers. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In search advertising, the search engine needs to select the most profitable advertisements to display, which can be formulated as an instance of online learning with partial feedback, also known as the stochastic multi-armed bandit (MAB) problem. [sent-6, score-0.349]
2 We then propose simple bias-correction methods with benefits to both the search engine and the advertisers. [sent-8, score-0.206]
3 1 Introduction Search advertising, also known as sponsored search, has been formulated as a multi-armed bandit (MAB) problem [11], in which the search engine needs to choose one ad from a pool of candidates to maximize some objective (e.g. [sent-9, score-0.469]
4 To select the best ad from the pool, one needs to know the quality of each ad, which is usually measured by the probability that a random user will click on the ad. [sent-12, score-0.421]
5 Stochastic MAB algorithms provide an attractive way to select the high quality ads, and the regret guarantee on MAB algorithms ensures that we do not display the low quality ads too many times. [sent-13, score-0.259]
6 If the probabilities are estimated poorly, the search engine may charge too low a payment to the advertisers and lose revenue, or it may charge too high a payment which would encourage the advertisers to engage in strategic behavior. [sent-15, score-0.973]
7 However, most existing MAB algorithms focus only on the identification of the best arm; if naively applied to search advertising, there is no guarantee of an accurate estimate of the click probabilities of the top two ads. [sent-16, score-0.272]
8 We show in particular that naive ways of combining click probability estimation and MAB algorithms lead to sample selection bias that harms the search engine’s revenue. [sent-19, score-0.468]
9 We present a simple modification to MAB algorithms that eliminates such a bias and provably achieves almost the same revenue as if an oracle gave us the actual click probabilities. [sent-20, score-0.664]
10 We also analyze the game-theoretic notion of incentive compatibility (IC) and show that low-regret MAB algorithms may have worse IC properties than high-regret uniform exploration algorithms and that a trade-off may be required. [sent-21, score-0.45]
11 2 Setting Each time a user visits a webpage, which we call an impression, the search engine runs a generalized second price (SP) auction [6] to determine which ads to show to the user and how much to charge advertisers if their ads are clicked. [sent-22, score-0.918]
12 We will in this paper suppose that we have only one ad slot in which we can display one ad. [sent-23, score-0.332]
13 In the single slot case, generalized SP auction becomes simply the well known second price auction, which we describe below. [sent-25, score-0.284]
14 Let bk denote the bid of advertiser k (or the ad k), which is the maximum amount of money advertiser k is willing to pay for a click, and ρk denote the click-through-rate (CTR) of ad k, which is the probability a random user will click on it. [sent-27, score-1.543]
15 SP auction ranks ads according to the products of the ad CTRs and bids. [sent-28, score-0.452]
16 Assume that advertisers are numbered by the decreasing order of bi ρi : b1 ρ1 > b2 ρ2 > · · · > bn ρn . [sent-29, score-0.225]
17 Then advertiser 1 wins the ad slot, and he/she needs to pay b2 ρ2 /ρ1 for each click on his/her ad. [sent-30, score-0.676]
18 This payment formula is chosen to satisfy the game theoretic notion of incentive compatibility (see Chapter 9 of [10] for a good introduction). [sent-31, score-0.387]
19 Therefore, the per-impression expected revenue of the SP auction is b2 ρ2 . [sent-32, score-0.587]
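To make the single-slot SP auction concrete, here is a minimal Python sketch (ours, not the paper's code; the bids and CTRs in the usage line are made-up numbers) that ranks ads by bk ρk, charges the winner b2 ρ2 /ρ1 per click, and recovers the per-impression expected revenue b2 ρ2.

```python
def sp_auction(bids, ctrs):
    """Single-slot second price auction with known CTRs.

    Ads are ranked by the score b_k * rho_k; the winner pays
    b_(2) * rho_(2) / rho_(1) per click, so its expected payment
    per impression is rho_(1) * payment = b_(2) * rho_(2).
    """
    scores = [b * r for b, r in zip(bids, ctrs)]
    order = sorted(range(len(bids)), key=lambda k: scores[k], reverse=True)
    winner, runner_up = order[0], order[1]
    payment_per_click = scores[runner_up] / ctrs[winner]
    expected_revenue = ctrs[winner] * payment_per_click  # equals b_2 * rho_2
    return winner, payment_per_click, expected_revenue

# Toy example: advertiser 0 wins and pays b_2 * rho_2 / rho_1 per click.
print(sp_auction(bids=[2.0, 1.5, 1.0], ctrs=[0.10, 0.08, 0.05]))
```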
20 2.1 A Two-Stage Framework Since the CTRs are unknown to both the advertisers and the search engine, the search engine needs to estimate them through some learning process. [sent-34, score-0.479]
21 We adopt the same two-stage framework as in [12, 2], which is composed of a CTR learning stage lasting for the first T impressions, followed by an SP auction stage lasting for the remaining Tend − T impressions. [sent-35, score-0.404]
22 Estimate ρi based on the click records from previous stage. [sent-52, score-0.181]
23 For t = T + 1, . . . , Tend , we run the SP auction using the estimators ρ̂i : display the ad that maximizes bk ρ̂k and charge b(2) ρ̂(2) / ρ̂(1) per click. [sent-57, score-0.736]
24 Here we use (s) to indicate the ad with the s-th largest estimated score bi ρ̂i . [sent-58, score-0.204]
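The following Python sketch outlines the two-stage framework under simplifying assumptions of our own (Bernoulli clicks and a round-robin learning stage, which is one instance of a uniform algorithm); the names and structure of the code are illustrative rather than taken from [12, 2].

```python
import random

def two_stage(bids, true_ctrs, T, rng=random.Random(0)):
    """Stage 1: learn CTR estimates over T impressions. Stage 2: run the SP
    auction with the plug-in estimates for the remaining impressions."""
    n = len(bids)
    pulls, clicks = [0] * n, [0] * n
    for t in range(T):                          # Stage 1: CTR learning
        k = t % n                               # round-robin (uniform) allocation
        pulls[k] += 1
        clicks[k] += rng.random() < true_ctrs[k]
    est = [clicks[k] / max(pulls[k], 1) for k in range(n)]   # rho_hat_k = C_k / T_k
    scores = [bids[k] * est[k] for k in range(n)]            # Stage 2: SP auction with estimates
    order = sorted(range(n), key=lambda k: scores[k], reverse=True)
    winner, runner_up = order[0], order[1]
    payment = scores[runner_up] / est[winner] if est[winner] > 0 else 0.0
    # Per-impression empirical revenue: the winner's true CTR times the charged price.
    rev = true_ctrs[winner] * payment
    return est, winner, payment, rev
```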
25 However, it is hard to compute a fair payment when we display ads based on a MAB algorithm rather than the SP auction, and a randomized payment is proposed in [2]. [sent-61, score-0.379]
26 Their scheme, though theoretically interesting, is impractical because it is difficult for advertisers to accept a randomized payment rule. [sent-62, score-0.317]
27 One criterion is to maximize the per-impression expected revenue (defined below) in rounds T + 1, . . . , Tend. [sent-65, score-0.464]
28 The estimators introduce two problems: (1) the estimated ranking may be incorrect, that is, arg maxk bk ρ̂k need not equal arg maxk bk ρk , and (2) the estimators may be biased. [sent-71, score-0.697]
29 We do not analyze the revenue and incentive compatibility properties of the first CTR learning stage because of its complexity and brief duration; we assume that Tend >> T . [sent-73, score-0.701]
30 Let (1) := arg maxk bk ρ̂k , (2) := arg maxk≠(1) bk ρ̂k . [sent-76, score-0.645]
31 We define the per-impression empirical revenue as rev := ρ(1) · b(2) ρ̂(2) / ρ̂(1) and the per-impression expected revenue as E[rev], where the expectation is taken over the CTR estimators. [sent-77, score-1.008]
32 We then define the per-impression expected revenue loss as b2 ρ2 − E[rev], where b2 ρ2 is the oracle revenue we would obtain if we knew the true click probabilities. [sent-78, score-1.036]
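A small Monte Carlo check of this quantity, reusing the two_stage sketch above (so it assumes that function is in scope); everything here is our own illustration, not the paper's experimental code.

```python
import random
import statistics

def expected_revenue_loss(bids, true_ctrs, T, n_trials=2000):
    """Estimate b_2 * rho_2 - E[rev] by averaging the empirical revenue of
    the two_stage sketch over the randomness of the CTR estimators."""
    oracle_scores = sorted((b * r for b, r in zip(bids, true_ctrs)), reverse=True)
    oracle_revenue = oracle_scores[1]                 # b_2 * rho_2
    revs = []
    for trial in range(n_trials):
        _, _, _, rev = two_stage(bids, true_ctrs, T, rng=random.Random(trial))
        revs.append(rev)
    return oracle_revenue - statistics.mean(revs)     # per-impression expected revenue loss
```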
33 2.2 Choice of Estimator We will analyze the most straightforward estimator ρ̂k = Ck / Tk , where Tk is the number of impressions allocated to ad k in the CTR learning stage and Ck is the number of clicks received by ad k in the CTR learning stage. [sent-79, score-0.612]
34 Because there are many specific algorithms for each class, we give our formal definitions by characterizing Tk , the number of impressions assigned to each advertiser k at the end of the CTR learning stage. [sent-83, score-0.402]
35 We next describe adaptive algorithms, which have low regret because they stop allocating impressions to ad k once it is certain that bk ρk < maxk′ bk′ ρk′ . [sent-87, score-0.981]
36 We say that a MAB algorithm is adaptive with respect to b, n if, with probability at least 1 − O(n/T), we have that: T1 ≥ c Tmax and c′ b²k ln T / ∆²k ≥ Tk ≥ min( c Tmax , 4 b²k ln T / ∆²k ) for all k ≠ 1, where ∆k = b1 ρ1 − bk ρk , c < 1 and c′ are positive constants, and Tmax = maxk Tk . [sent-91, score-0.431]
37 Both the uniform algorithms and the adaptive algorithms have been used in the search advertising auctions [5, 7, 12, 2, 8]. [sent-94, score-0.369]
38 The UCB algorithm, at round t, allocates the impression to the ad with the largest score, which is defined as sk,t ≡ bk ρ̂k,t + γ bk √( (1/Tk(t)) log T ). [sent-99, score-0.614]
39 Here Tk(t) is the number of impressions ad k has received before round t and ρ̂k,t is the number of clicks in the history log before round t divided by Tk(t). [sent-100, score-0.391]
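A minimal sketch of such a UCB-style learning stage under our own simplifying assumptions (Bernoulli clicks, γ supplied as a parameter); it follows the score definition above but is illustrative, not the implementation used in the paper.

```python
import math
import random

def ucb_learning_stage(bids, true_ctrs, T, gamma=1.0, rng=random.Random(0)):
    """Allocate T impressions using the score
    s_{k,t} = b_k * rho_hat_{k,t} + gamma * b_k * sqrt(log(T) / T_k(t)),
    then return the plug-in estimates rho_hat_k = C_k / T_k and the pull counts."""
    n = len(bids)
    pulls, clicks = [0] * n, [0] * n
    for t in range(T):
        scores = []
        for k in range(n):
            if pulls[k] == 0:
                scores.append(float("inf"))            # show every ad at least once
            else:
                est = clicks[k] / pulls[k]
                bonus = gamma * bids[k] * math.sqrt(math.log(T) / pulls[k])
                scores.append(bids[k] * est + bonus)
        k = max(range(n), key=lambda j: scores[j])      # ad with the largest score
        pulls[k] += 1
        clicks[k] += rng.random() < true_ctrs[k]
    return [clicks[k] / max(pulls[k], 1) for k in range(n)], pulls
```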
40 With high probability, the estimates satisfy ρk − √( (1/Tk(t)) log T ) ≤ ρ̂k,t ≤ ρk + √( (1/Tk(t)) log T ). Related Work As mentioned before, how to design incentive compatible payment rules when using MAB algorithms to select the best ads has been studied in [2] and [5]. [sent-138, score-0.512]
41 The idea of using MAB algorithms to simultaneously select ads and estimate click probabilities has been proposed in [11], [8] and [13]. [sent-140, score-0.308]
42 [9] studies the effect of CTR learning on incentive compatibility from the perspective of an advertiser with imperfect information. [sent-143, score-0.526]
43 This work is only the first step towards understanding the effect of estimation bias in MAB algorithms for search advertising auctions, and we focus on a relatively simple setting with only a single ad slot and without budget constraints, which is already difficult to analyze. [sent-144, score-0.547]
44 We leave the extensions to multiple ad slots and to budget constraints as future work. [sent-145, score-0.181]
45 We show that the direct plug-in of the estimators from a MAB algorithm (either uniform or adaptive) will cause sample selection bias and damage the search engine's revenue; we then propose a simple debiasing method which can ensure the revenue guarantee. [sent-147, score-0.708]
46 We define the notations (1), (2) as (1) := arg maxk ρ̂k bk and (2) := arg maxk≠(1) ρ̂k bk . [sent-149, score-0.645]
47 If ρ̂k > ρk , then the UCB algorithm will select k more often and thus acquire more click data to gradually correct the overestimation. [sent-155, score-0.199]
48 3.1 Revenue Analysis The following theorem is the main result of this section, which shows that the bias of the CTR estimators can critically affect the search engine's revenue. [sent-159, score-0.204]
49 Let T0 := (4 n b²max / (c ∆²2)) log T, and let Tmin^adpt and Tmin^unif denote the corresponding learning-length thresholds for adaptive and uniform algorithms; both are of order (max(b²1, b²k) / ∆²k) log T over the suboptimal ads k ≠ 1. [sent-162, score-0.275]
50 If we use adaptive algorithms and T ≥ Tmin^adpt, or if we use uniform algorithms and T ≥ Tmin^unif, then b2 ρ2 − E[rev] ≤ ( b2 ρ2 − b2 E[ρ̂2] ρ1 / E[ρ̂1] ) − O(n/T). We leave the full proof to Section 5. [sent-167, score-0.286]
51 In the first case, where T is smaller than the threshold Tmin^adpt or Tmin^unif, the probability of incorrect ranking, that is, of incorrectly identifying the best ad, is high and we can only use concentration of measure bounds to control the revenue loss. [sent-169, score-0.608]
52 In the second case, we show that we can almost always identify the best ad and therefore the √( (n/T) log T ) error term disappears. [sent-170, score-0.217]
53 This bound suggests that the revenue loss does not converge to 0 as T increases. [sent-179, score-0.43]
54 Simulations in Section 5 show that our bound is in fact tight: the expected revenue loss for adaptive learning, in presence of sample selection bias, can be large and persistent. [sent-180, score-0.606]
55 For many common uniform learning algorithms (uniformly random selection for instance) sample selection bias does not exist and so the expected revenue loss is smaller. [sent-181, score-0.714]
56 This seems to suggest that, because of sample selection bias, adaptive algorithms are, from a revenue optimization perspective, inferior. [sent-182, score-0.557]
57 The picture is reversed, however, if we use a debiasing technique such as the one we propose in Section 3. [sent-183, score-0.228]
58 When the sample selection bias is 0, adaptive algorithms yield better revenue because they are able to correctly identify the best advertisement with fewer rounds. [sent-185, score-0.677]
59 This condition does not hold in general, but we provide a simple debiasing procedure in Section 3. [sent-190, score-0.228]
60 If we use an adaptive algorithm and T ≥ Tmin^adpt, or if we use a uniform algorithm and T ≥ Tmin^unif, then b2 ρ2 − E[rev] ≤ O(n/T). The revenue loss guarantee is much stronger with unbiasedness, which we confirm in our simulations in Section 5. [sent-200, score-0.735]
61 Corollary 3.1 also shows that the revenue loss drops sharply from √( (n/T) log T ) to n/T once T is larger than some threshold. [sent-202, score-0.466]
62 Because the adaptive learning threshold Tmin^adpt is always smaller, and often much smaller, than the uniform learning threshold Tmin^unif, Corollary 3.1 [sent-204, score-0.286]
63 shows that adaptive learning can guarantee much lower revenue loss when T is between Tmin^adpt and Tmin^unif . [sent-205, score-0.692]
64 It is in fact the same adaptiveness that leads to low regret that also leads to the strong revenue loss guarantees for adaptive learning algorithms. [sent-206, score-0.539]
65 When the MAB algorithm requires estimators ρk ’s or click data to make an allocation, we will allow it access only to the first history log. [sent-212, score-0.284]
66 The estimator learned from the first history log is biased by the selection procedure but the heldout history log, since it does not influence the ad selection, can be used to output an unbiased estimator of each advertisement’s click probability at the end of the exploration stage. [sent-213, score-0.708]
67 Although this scheme doubles the learning length, sample selection debiasing can significantly improve the guarantee on expected revenue as shown in both theory and simulations. [sent-214, score-0.75]
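A sketch of this held-out (sample selection) debiasing, again under our own assumptions: the adaptive allocation below is the UCB sketch from above, its scores only ever see the first history log, and each selected ad is shown one extra time to build the held-out log; the names are illustrative.

```python
import math
import random

def ucb_with_heldout(bids, true_ctrs, T, gamma=1.0, rng=random.Random(0)):
    """UCB allocation driven only by the 'online' log; a parallel held-out log,
    which never influences the allocation, yields unbiased CTR estimates."""
    n = len(bids)
    pulls = [0] * n
    online_clicks = [0] * n      # feeds the UCB scores (subject to selection bias)
    heldout_clicks = [0] * n     # unbiased given the pull counts
    for t in range(T // 2):      # each decision spends two impressions, doubling the length
        scores = []
        for k in range(n):
            if pulls[k] == 0:
                scores.append(float("inf"))
            else:
                est = online_clicks[k] / pulls[k]
                scores.append(bids[k] * est + gamma * bids[k] * math.sqrt(math.log(T) / pulls[k]))
        k = max(range(n), key=lambda j: scores[j])
        pulls[k] += 1
        online_clicks[k] += rng.random() < true_ctrs[k]    # first history log
        heldout_clicks[k] += rng.random() < true_ctrs[k]   # held-out history log
    return [heldout_clicks[k] / max(pulls[k], 1) for k in range(n)]
```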
68 We suppose that there exists a true value vi , which exactly measures how much a click is worth to advertiser i. [sent-219, score-0.541]
69 The utility per impression of advertiser i in the auction is then ρi (vi − pi ) if ad i is displayed, where pi is the per-click payment charged by the search engine. [sent-220, score-1.176]
70 An auction mechanism is called incentive compatible if the advertisers maximize their own utility by truthfully bidding: bi = vi . [sent-221, score-0.731]
71 For auctions that are close but not fully incentive compatible, we also define player-regret as the utility lost by advertiser i in truthfully bidding vi rather than a bid that optimizes utility. [sent-222, score-0.842]
72 For a fixed vector of competing bids b−k , we define the player utility as uk(bk) ≡ I{ bk ρ̂k(bk) ≥ bk′ ρ̂k′(bk) ∀k′ } · ρk ( vk − maxk′≠k bk′ ρ̂k′(bk) / ρ̂k(bk) ), where I{ bk ρ̂k(bk) ≥ bk′ ρ̂k′(bk) ∀k′ } is a 0/1 function indicating whether the impression is allocated to ad k. [sent-231, score-0.513]
73 We define the player-regret, with respect to a bid vector b, as the player’s optimal gain in utility through false bidding supb E[uk (bk )] − E[uk (vk )]. [sent-232, score-0.278]
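A small sketch (our own notation) that evaluates uk(bk) for given CTR estimates; in the paper the estimates ρ̂k′(bk) themselves depend on the submitted bid through the learning stage, so here they are simply passed in as an argument.

```python
def player_utility(k, bid_k, value_k, other_bids, est_ctrs, true_ctr_k):
    """u_k(b_k): advertiser k's per-impression utility when bidding bid_k.

    other_bids maps each competitor index k' to its bid b_{k'}; est_ctrs maps
    every index (including k) to the CTR estimate rho_hat under this bid vector."""
    my_score = bid_k * est_ctrs[k]
    best_rival = max(b * est_ctrs[j] for j, b in other_bids.items())
    if my_score < best_rival:
        return 0.0                                   # impression not allocated to ad k
    payment = best_rival / est_ctrs[k]               # second-price payment per click
    return true_ctr_k * (value_k - payment)          # rho_k * (v_k - p_k)

# Hypothetical usage: advertiser 0 with value 2.0 considers underbidding at 1.2.
print(player_utility(k=0, bid_k=1.2, value_k=2.0,
                     other_bids={1: 1.5, 2: 1.0},
                     est_ctrs={0: 0.10, 1: 0.08, 2: 0.05},
                     true_ctr_k=0.10))
```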
74 Both expectations E[b(2)(v1) ρ̂(2)(v1)] and E[b(2)(b1) ρ̂(2)(b1)] can be larger than b2 ρ2 because E[maxk≠1 bk ρ̂k(v1)] ≥ maxk≠1 bk E[ρ̂k(v1)]. [sent-251, score-0.496]
75 An omniscient advertiser 1, with the belief that v1 ρ1 >> b2 ρ2 , can thus increase his/her utility by underbidding to manipulate the learning algorithm to allocate more rounds to advertisers 2, . . . , n. [sent-261, score-0.72]
76 Such a strategy will give advertiser 1 negative utility in the CTR learning stage, but it will yield positive utility in the longer SP auction stage and thus give an overall increase to the player utility. [sent-264, score-0.78]
77 Suppose we have n advertisers and b2 ρ2 = b3 ρ3 = · · · = bn ρn . [sent-276, score-0.201]
78 Suppose that v1 ρ1 >> b2 ρ2 and we will show that advertiser 1 has the incentive to underbid. [sent-280, score-0.49]
79 Let ∆k (b1 ) ≡ b1 ρ1 − bk ρk , then ∆k (b1 )’s are the same for all k and ∆k (v1 ) >> 0 by previous supposition. [sent-281, score-0.248]
80 Suppose advertiser 1 bids b1 < v1 but where ∆k (b1 ) >> 0 still. [sent-282, score-0.364]
81 4 in the appendix, we know that E[b(2)(b1) ρ̂(2)(b1)] − b2 ρ2 ≤ √( log(n − 1) / Tk ) ≤ (b1 ρ1 − bk ρk) √( log(n − 1) / log T ). [sent-288, score-0.284]
82 We would like at this point to briefly compare our results with those of [9], which shows, under an imperfect-information definition of utility, that advertisers have an incentive to overbid so that their CTRs can be better learned by the search engine. [sent-298, score-0.449]
83 Our results are not contradictory, since we show that only the leading advertiser has an incentive to underbid. [sent-299, score-0.49]
84 Figures 1a and 1b show the effect of sample selection debiasing (see Sections 3, 3.2) [sent-318, score-0.306]
85 on the expected revenue where one uses adaptive learning. [sent-319, score-0.501]
86 1 in our experiment) One can see that selection bias harms the revenue, but the debiasing method described in Section 3.2, [sent-321, score-0.829]
87 even though it holds out half of the click data, significantly lowers the expected revenue loss, as theoretically shown in Corollary 3. [sent-322, score-0.606]
88 Figure 1c shows that when there are a large number of poor quality ads, low regret adaptive algorithms indeed achieve better revenue in much fewer rounds of learning. [sent-325, score-0.572]
89 Figure 1d shows the effect of estimation-of-the-largest-mean (ELM) bias on the utility gain of the advertiser. [sent-326, score-0.208]
90 1 and we see that without ELM debiasing, the advertiser can noticeably increase utility by underbidding. [sent-328, score-0.418]
91 We implement the ELM debiasing technique described in Section 4.2; [sent-329, score-0.228]
92 it does not completely address the problem since it does not completely reduce the bias (such a task has been proven impossible), but it does ameliorate the problem: the increase in utility from underbidding has decreased. [sent-330, score-0.213]
93 [Figure 1 plot residue: axis labels "Expected Revenue Loss" and "Rounds of Exploration"; legend entries "no/with selection debiasing", "no/with ELM debiasing", "uniform", "adaptive with debiasing".] [sent-333, score-0.578]
96 Figure 1: Simulation studies demonstrating the effect of sample selection debiasing and ELM debiasing. [sent-369, score-0.306]
97 The revenue loss in Figures 1a to 1c is relative and is measured by 1 − revenue/(b2 ρ2); negative loss indicates revenue improvement over the oracle SP. [sent-370, score-1.263]
98 Figure 1d shows advertiser 1's utility gain as a function of possible bids. [sent-371, score-0.442]
99 The utility gain is relative and defined as utility(b)/utility(v) − 1; higher utility gain implies that advertiser 1 can benefit more from strategic bidding. [sent-373, score-0.489]
100 A truthful learning mechanism for contextual multi-slot sponsored search auctions with externalities. [sent-411, score-0.215]
wordName wordTfidf (topN-words)
[('revenue', 0.403), ('advertiser', 0.314), ('mab', 0.269), ('bk', 0.248), ('debiasing', 0.228), ('advertisers', 0.201), ('tmin', 0.2), ('ad', 0.181), ('click', 0.181), ('incentive', 0.176), ('ctr', 0.171), ('elm', 0.171), ('auction', 0.162), ('engine', 0.134), ('tk', 0.128), ('advertising', 0.128), ('payment', 0.116), ('ucb', 0.11), ('ads', 0.109), ('maxk', 0.107), ('bid', 0.104), ('utility', 0.104), ('adpt', 0.1), ('impression', 0.093), ('impressions', 0.088), ('rev', 0.087), ('slot', 0.086), ('bias', 0.08), ('adaptive', 0.076), ('sp', 0.072), ('search', 0.072), ('unif', 0.067), ('selection', 0.061), ('harms', 0.057), ('charge', 0.055), ('stage', 0.054), ('sponsored', 0.052), ('estimators', 0.052), ('history', 0.051), ('auctions', 0.05), ('bids', 0.05), ('unbiasedness', 0.048), ('bidding', 0.046), ('ctrs', 0.043), ('uniform', 0.043), ('player', 0.042), ('advertisement', 0.04), ('rounds', 0.039), ('display', 0.038), ('exploration', 0.038), ('log', 0.036), ('compatibility', 0.036), ('estimator', 0.036), ('price', 0.036), ('clicks', 0.035), ('regret', 0.033), ('allocate', 0.033), ('analyze', 0.032), ('game', 0.032), ('bandit', 0.03), ('ctmax', 0.029), ('ibk', 0.029), ('truthfully', 0.029), ('underbidding', 0.029), ('loss', 0.027), ('suppose', 0.027), ('theoretic', 0.027), ('internet', 0.025), ('nition', 0.024), ('gain', 0.024), ('bn', 0.024), ('lasting', 0.023), ('advertisements', 0.023), ('damage', 0.023), ('strategic', 0.023), ('largest', 0.023), ('expected', 0.022), ('competing', 0.022), ('corollary', 0.022), ('commerce', 0.022), ('truthful', 0.022), ('compatible', 0.021), ('arg', 0.021), ('quality', 0.021), ('uk', 0.021), ('tend', 0.021), ('utilities', 0.021), ('incorrect', 0.02), ('asia', 0.02), ('user', 0.02), ('heldout', 0.019), ('vi', 0.019), ('guarantee', 0.019), ('mechanism', 0.019), ('concentration', 0.018), ('ic', 0.018), ('ranking', 0.018), ('select', 0.018), ('unbiased', 0.018), ('sample', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 112 nips-2013-Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising
Author: Min Xu, Tao Qin, Tie-Yan Liu
Abstract: In search advertising, the search engine needs to select the most profitable advertisements to display, which can be formulated as an instance of online learning with partial feedback, also known as the stochastic multi-armed bandit (MAB) problem. In this paper, we show that the naive application of MAB algorithms to search advertising for advertisement selection will produce sample selection bias that harms the search engine by decreasing expected revenue and “estimation of the largest mean” (ELM) bias that harms the advertisers by increasing game-theoretic player-regret. We then propose simple bias-correction methods with benefits to both the search engine and the advertisers. 1
2 0.12383409 159 nips-2013-Learning Prices for Repeated Auctions with Strategic Buyers
Author: Kareem Amin, Afshin Rostamizadeh, Umar Syed
Abstract: Inspired by real-time ad exchanges for online display advertising, we consider the problem of inferring a buyer’s value distribution for a good when the buyer is repeatedly interacting with a seller through a posted-price mechanism. We model the buyer as a strategic agent, whose goal is to maximize her long-term surplus, and we are interested in mechanisms that maximize the seller’s long-term revenue. We define the natural notion of strategic regret — the lost revenue as measured against a truthful (non-strategic) buyer. We present seller algorithms that are no(strategic)-regret when the buyer discounts her future surplus — i.e. the buyer prefers showing advertisements to users sooner rather than later. We also give a lower bound on strategic regret that increases as the buyer’s discounting weakens and shows, in particular, that any seller algorithm will suffer linear strategic regret if there is no discounting. 1
3 0.089335442 95 nips-2013-Distributed Exploration in Multi-Armed Bandits
Author: Eshcar Hillel, Zohar S. Karnin, Tomer Koren, Ronny Lempel, Oren Somekh
Abstract: We study exploration in Multi-Armed Bandits in a setting where k players collaborate in order to identify an ε-optimal arm. Our motivation comes from recent employment of bandit algorithms in computationally intensive, large-scale applications. Our results demonstrate a non-trivial tradeoff between the number of arm pulls required by each of the players, and the amount of communication between them. In particular, our main result shows that by allowing the k players to communicate only once, they are able to learn √k times faster than a single player. That is, distributing learning to k players gives rise to a factor √k parallel speedup. We complement this result with a lower bound showing this is in general the best possible. On the other extreme, we present an algorithm that achieves the ideal factor k speed-up in learning performance, with communication only logarithmic in 1/ε. 1
4 0.08641883 106 nips-2013-Eluder Dimension and the Sample Complexity of Optimistic Exploration
Author: Dan Russo, Benjamin Van Roy
Abstract: This paper considers the sample complexity of the multi-armed bandit with dependencies among the arms. Some of the most successful algorithms for this problem use the principle of optimism in the face of uncertainty to guide exploration. The clearest example of this is the class of upper confidence bound (UCB) algorithms, but recent work has shown that a simple posterior sampling algorithm, sometimes called Thompson sampling, can be analyzed in the same manner as optimistic approaches. In this paper, we develop a regret bound that holds for both classes of algorithms. This bound applies broadly and can be specialized to many model classes. It depends on a new notion we refer to as the eluder dimension, which measures the degree of dependence among action rewards. Compared to UCB algorithm regret bounds for specific model classes, our general bound matches the best available for linear models and is stronger than the best available for generalized linear models. 1
5 0.079385132 318 nips-2013-Structured Learning via Logistic Regression
Author: Justin Domke
Abstract: A successful approach to structured learning is to write the learning objective as a joint function of linear parameters and inference messages, and iterate between updates to each. This paper observes that if the inference problem is “smoothed” through the addition of entropy terms, for fixed messages, the learning objective reduces to a traditional (non-structured) logistic regression problem with respect to parameters. In these logistic regression problems, each training example has a bias term determined by the current set of messages. Based on this insight, the structured energy function can be extended from linear factors to any function class where an “oracle” exists to minimize a logistic loss.
6 0.077326603 137 nips-2013-High-Dimensional Gaussian Process Bandits
7 0.068082243 41 nips-2013-Approximate inference in latent Gaussian-Markov models from continuous time observations
8 0.064756766 292 nips-2013-Sequential Transfer in Multi-armed Bandit with Finite Set of Models
9 0.060571086 231 nips-2013-Online Learning with Switching Costs and Other Adaptive Adversaries
10 0.054396015 301 nips-2013-Sparse Additive Text Models with Low Rank Background
11 0.054181714 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture
12 0.052562963 54 nips-2013-Bayesian optimization explains human active search
13 0.050408494 230 nips-2013-Online Learning with Costly Features and Labels
14 0.043662146 191 nips-2013-Minimax Optimal Algorithms for Unconstrained Linear Optimization
15 0.038438216 50 nips-2013-Bayesian Mixture Modelling and Inference based Thompson Sampling in Monte-Carlo Tree Search
16 0.038125649 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
17 0.0371674 26 nips-2013-Adaptive Market Making via Online Learning
18 0.037011921 124 nips-2013-Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting
19 0.03528611 338 nips-2013-Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards
20 0.035069413 240 nips-2013-Optimization, Learning, and Games with Predictable Sequences
topicId topicWeight
[(0, 0.098), (1, -0.032), (2, 0.059), (3, -0.051), (4, 0.007), (5, -0.014), (6, 0.049), (7, -0.05), (8, -0.029), (9, 0.002), (10, 0.007), (11, 0.02), (12, -0.034), (13, -0.016), (14, -0.01), (15, -0.014), (16, -0.031), (17, 0.022), (18, -0.012), (19, -0.012), (20, 0.008), (21, -0.007), (22, -0.025), (23, -0.0), (24, -0.009), (25, -0.007), (26, -0.047), (27, -0.021), (28, -0.004), (29, -0.014), (30, -0.002), (31, 0.024), (32, 0.08), (33, -0.083), (34, 0.032), (35, 0.023), (36, -0.041), (37, -0.027), (38, -0.039), (39, -0.001), (40, -0.093), (41, -0.032), (42, -0.06), (43, 0.034), (44, -0.001), (45, 0.095), (46, 0.103), (47, -0.003), (48, 0.096), (49, 0.059)]
simIndex simValue paperId paperTitle
same-paper 1 0.86418343 112 nips-2013-Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising
Author: Min Xu, Tao Qin, Tie-Yan Liu
Abstract: In search advertising, the search engine needs to select the most profitable advertisements to display, which can be formulated as an instance of online learning with partial feedback, also known as the stochastic multi-armed bandit (MAB) problem. In this paper, we show that the naive application of MAB algorithms to search advertising for advertisement selection will produce sample selection bias that harms the search engine by decreasing expected revenue and “estimation of the largest mean” (ELM) bias that harms the advertisers by increasing game-theoretic player-regret. We then propose simple bias-correction methods with benefits to both the search engine and the advertisers. 1
2 0.69748384 159 nips-2013-Learning Prices for Repeated Auctions with Strategic Buyers
Author: Kareem Amin, Afshin Rostamizadeh, Umar Syed
Abstract: Inspired by real-time ad exchanges for online display advertising, we consider the problem of inferring a buyer’s value distribution for a good when the buyer is repeatedly interacting with a seller through a posted-price mechanism. We model the buyer as a strategic agent, whose goal is to maximize her long-term surplus, and we are interested in mechanisms that maximize the seller’s long-term revenue. We define the natural notion of strategic regret — the lost revenue as measured against a truthful (non-strategic) buyer. We present seller algorithms that are no(strategic)-regret when the buyer discounts her future surplus — i.e. the buyer prefers showing advertisements to users sooner rather than later. We also give a lower bound on strategic regret that increases as the buyer’s discounting weakens and shows, in particular, that any seller algorithm will suffer linear strategic regret if there is no discounting. 1
3 0.63119614 139 nips-2013-How to Hedge an Option Against an Adversary: Black-Scholes Pricing is Minimax Optimal
Author: Jacob Abernethy, Peter Bartlett, Rafael Frongillo, Andre Wibisono
Abstract: We consider a popular problem in finance, option pricing, through the lens of an online learning game between Nature and an Investor. In the Black-Scholes option pricing model from 1973, the Investor can continuously hedge the risk of an option by trading the underlying asset, assuming that the asset’s price fluctuates according to Geometric Brownian Motion (GBM). We consider a worst-case model, in which Nature chooses a sequence of price fluctuations under a cumulative quadratic volatility constraint, and the Investor can make a sequence of hedging decisions. Our main result is to show that the value of our proposed game, which is the “regret” of hedging strategy, converges to the Black-Scholes option price. We use significantly weaker assumptions than previous work—for instance, we allow large jumps in the asset price—and show that the Black-Scholes hedging strategy is near-optimal for the Investor even in this non-stochastic framework. 1
4 0.52084339 26 nips-2013-Adaptive Market Making via Online Learning
Author: Jacob Abernethy, Satyen Kale
Abstract: We consider the design of strategies for market making in an exchange. A market maker generally seeks to profit from the difference between the buy and sell price of an asset, yet the market maker also takes exposure risk in the event of large price movements. Profit guarantees for market making strategies have typically required certain stochastic assumptions on the price fluctuations of the asset in question; for example, assuming a model in which the price process is mean reverting. We propose a class of “spread-based” market making strategies whose performance can be controlled even under worst-case (adversarial) settings. We prove structural properties of these strategies which allows us to design a master algorithm which obtains low regret relative to the best such strategy in hindsight. We run a set of experiments showing favorable performance on recent real-world stock price data. 1
Author: Yuchen Zhang, John Duchi, Michael Jordan, Martin J. Wainwright
Abstract: We establish lower bounds on minimax risks for distributed statistical estimation under a communication budget. Such lower bounds reveal the minimum amount of communication required by any procedure to achieve the centralized minimax-optimal rates for statistical estimation. We study two classes of protocols: one in which machines send messages independently, and a second allowing for interactive communication. We establish lower bounds for several problems, including various types of location models, as well as for parameter estimation in regression models. 1
6 0.48734429 292 nips-2013-Sequential Transfer in Multi-armed Bandit with Finite Set of Models
7 0.45256665 280 nips-2013-Robust Data-Driven Dynamic Programming
8 0.43693513 95 nips-2013-Distributed Exploration in Multi-Armed Bandits
9 0.42506811 338 nips-2013-Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards
10 0.41975173 253 nips-2013-Prior-free and prior-dependent regret bounds for Thompson Sampling
11 0.40963197 137 nips-2013-High-Dimensional Gaussian Process Bandits
12 0.40250009 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization
13 0.39713761 124 nips-2013-Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting
14 0.38710076 325 nips-2013-The Pareto Regret Frontier
15 0.3861368 94 nips-2013-Distributed $k$-means and $k$-median Clustering on General Topologies
16 0.3763487 191 nips-2013-Minimax Optimal Algorithms for Unconstrained Linear Optimization
17 0.37411216 197 nips-2013-Moment-based Uniform Deviation Bounds for $k$-means and Friends
18 0.36659417 330 nips-2013-Thompson Sampling for 1-Dimensional Exponential Family Bandits
19 0.3555499 318 nips-2013-Structured Learning via Logistic Regression
20 0.35199732 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
topicId topicWeight
[(1, 0.443), (2, 0.018), (16, 0.03), (33, 0.099), (34, 0.066), (41, 0.014), (49, 0.015), (56, 0.125), (70, 0.017), (85, 0.032), (89, 0.032), (93, 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.70390898 112 nips-2013-Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising
Author: Min Xu, Tao Qin, Tie-Yan Liu
Abstract: In search advertising, the search engine needs to select the most profitable advertisements to display, which can be formulated as an instance of online learning with partial feedback, also known as the stochastic multi-armed bandit (MAB) problem. In this paper, we show that the naive application of MAB algorithms to search advertising for advertisement selection will produce sample selection bias that harms the search engine by decreasing expected revenue and “estimation of the largest mean” (ELM) bias that harms the advertisers by increasing game-theoretic player-regret. We then propose simple bias-correction methods with benefits to both the search engine and the advertisers. 1
2 0.57157773 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
Author: Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram
Abstract: Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs. 1
3 0.56761634 324 nips-2013-The Fast Convergence of Incremental PCA
Author: Akshay Balsubramani, Sanjoy Dasgupta, Yoav Freund
Abstract: We consider a situation in which we see samples Xn ∈ Rd drawn i.i.d. from some distribution with mean zero and unknown covariance A. We wish to compute the top eigenvector of A in an incremental fashion - with an algorithm that maintains an estimate of the top eigenvector in O(d) space, and incrementally adjusts the estimate with each new data point that arrives. Two classical such schemes are due to Krasulina (1969) and Oja (1983). We give finite-sample convergence rates for both. 1
4 0.55111903 235 nips-2013-Online learning in episodic Markovian decision processes by relative entropy policy search
Author: Alexander Zimin, Gergely Neu
Abstract: We study the problem of online learning in finite episodic Markov decision processes (MDPs) where the loss function is allowed to change between episodes. The natural performance measure in this learning problem is the regret defined as the difference between the total loss of the best stationary policy and the total loss suffered by the learner. We assume that the learner is given access to a finite action space A and the state space X has a layered structure with L layers, so that state transitions are only possible between consecutive layers. We describe a variant of the recently proposed Relative Entropy Policy Search algorithm and show that its regret after T episodes is 2√(L|X ||A|T log(|X ||A|/L)) in the bandit setting and 2L√(T log(|X ||A|/L)) in the full information setting, given that the learner has perfect knowledge of the transition probabilities of the underlying MDP. These guarantees largely improve previously known results under much milder assumptions and cannot be significantly improved under general assumptions. 1
5 0.45725769 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
Author: Tianbao Yang
Abstract: We present and study a distributed optimization algorithm by employing a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often have better performances than stochastic gradient descent methods in optimizing regularized loss minimization problems. It still lacks of efforts in studying them in a distributed framework. We make a progress along the line by presenting a distributed stochastic dual coordinate ascent algorithm in a star network, with an analysis of the tradeoff between computation and communication. We verify our analysis by experiments on real data sets. Moreover, we compare the proposed algorithm with distributed stochastic gradient descent methods and distributed alternating direction methods of multipliers for optimizing SVMs in the same distributed framework, and observe competitive performances. 1
6 0.44009179 159 nips-2013-Learning Prices for Repeated Auctions with Strategic Buyers
8 0.37925649 66 nips-2013-Computing the Stationary Distribution Locally
9 0.37852225 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
10 0.37723264 249 nips-2013-Polar Operators for Structured Sparse Estimation
11 0.37693372 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence
12 0.37663934 273 nips-2013-Reinforcement Learning in Robust Markov Decision Processes
13 0.37658983 230 nips-2013-Online Learning with Costly Features and Labels
14 0.3764126 355 nips-2013-Which Space Partitioning Tree to Use for Search?
15 0.37626788 224 nips-2013-On the Sample Complexity of Subspace Learning
16 0.37610838 252 nips-2013-Predictive PAC Learning and Process Decompositions
17 0.37471899 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints
18 0.37440088 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
19 0.3743633 240 nips-2013-Optimization, Learning, and Games with Predictable Sequences
20 0.37371564 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA