nips nips2010 nips2010-222 knowledge-graph by maker-knowledge-mining

222 nips-2010-Random Walk Approach to Regret Minimization


Source: pdf

Author: Hariharan Narayanan, Alexander Rakhlin

Abstract: We propose a computationally efficient random walk on a convex body which rapidly mixes to a time-varying Gibbs distribution. In the setting of online convex optimization and repeated games, the algorithm yields low regret and presents a novel efficient method for implementing mixture forecasting strategies. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We propose a computationally efficient random walk on a convex body which rapidly mixes to a time-varying Gibbs distribution. [sent-4, score-0.545]

2 In the setting of online convex optimization and repeated games, the algorithm yields low regret and presents a novel efficient method for implementing mixture forecasting strategies. [sent-5, score-0.597]

3 1 Introduction. This paper brings together two topics: online convex optimization and sampling from logconcave distributions over convex bodies. [sent-6, score-0.656]

4 Online convex optimization has been a recent focus of research [30, 25], for it presents an abstraction that unifies and generalizes a number of existing results in online learning. [sent-7, score-0.224]

5 Techniques from the theory of optimization (in particular, Fenchel and minimax duality) have proven to be key for understanding the rates of growth of regret [25, 1]. [sent-8, score-0.423]

6 Deterministic regularization methods [3, 25] have emerged as natural black-box algorithms for regret minimization, and the choice of the regularization function turned out to play a pivotal role in limited-feedback problems [3]. [sent-9, score-0.339]

7 The latter gives a handle on the local geometry of the convex set, crucial for linear optimization with limited feedback. [sent-11, score-0.188]

8 Random walks in a convex body gained much attention following the breakthrough paper of Dyer, Frieze and Kannan [9], who exhibited a polynomial time randomized algorithm for estimating the volume of a convex body. [sent-12, score-0.534]

9 Over the two decades following [9], the polynomial dependence of volume computation on the dimension n has been drastically decreased from $O^*(n^{23})$ to $O^*(n^4)$ [17]. [sent-14, score-0.08]

10 The driving force behind such results is the family of isoperimetric inequalities, which can be extended from uniform to general logconcave distributions. [sent-16, score-0.325]

11 In particular, computing the volume of a convex body can be seen as a special case of integration of a logconcave function, and there have been a number of major results on mixing time for sampling from logconcave distributions [17, 18]. [sent-17, score-0.797]

12 By exploiting the local geometry of the set, this random walk is shown to mix rapidly, and offers a number of advantages over the other random walks. [sent-20, score-0.325]

13 While the aim of online convex optimization is different from that of sampling from logconcave distributions, the fact that the two communities recognized the importance of the Dikin ellipsoid is remarkable. [sent-21, score-0.516]

14 We show that the problem of online convex optimization can be solved by sampling from logconcave distributions, and that the Dikin Walk can be adapted to mix rapidly to a certain time-varying distribution. [sent-23, score-0.511]

15 In fact, it mixes fast enough that for linear cost functions only one step of the guided Dikin Walk is necessary per round of the repeated game. [sent-24, score-0.195]

16 From the Bayesian point of view, our algorithm is an efficient procedure for sampling from posterior distributions, and can be used for settings outside of regret minimization. [sent-29, score-0.399]

17 In [2], it was shown that the Weighted Average Forecaster [15, 27] on a prohibitively large class of experts is optimal in terms of regret for a certain multitask problem, yet computationally inefficient. [sent-33, score-0.339]

18 A Markov chain has been proposed with the required stationary distribution, but no mixing time bounds have been derived. [sent-34, score-0.255]

19 In [8], the authors faced a similar problem whereby a near-optimal regret can be achieved by the Weighted Average Forecaster on a prohibitively large discretization of the set of decisions. [sent-35, score-0.376]

20 Beyond [11], we are not aware of any results to date where a provably rapidly mixing walk is used to solve regret minimization problems. [sent-37, score-0.762]

21 It is worth emphasizing that without the Dikin Walk [19], the one-step mixing results of this paper seem out of reach. [sent-38, score-0.101]

22 In particular, when sampling from exponential distributions, the known bounds for the conductance of the Ball Walk and Hit-and-Run are not scale-independent. [sent-39, score-0.347]

23 As a consequence of the deterioration of the bounds on the conductance as the scale tends to zero, the number of steps necessary per round would tend to infinity as T tends to infinity. [sent-41, score-0.325]

24 2 Main Results. Let $K \subset \mathbb{R}^n$ be a convex compact set and let F be a set of convex functions from K to $\mathbb{R}$. [sent-42, score-0.295]

25 Online convex optimization is defined as a repeated T -round game between the player (the algorithm) and Nature (adversary) [30, 25]. [sent-43, score-0.217]

26 We are interested in randomized algorithms, and hence we consider the following online learning model: on round t, the player chooses a distribution (or, a mixed strategy) µt−1 supported on K and “plays” a random Xt ∼ µt−1 . [sent-50, score-0.246]

27 The goal of the player is to control expected regret (see Lemma 1) with respect to a randomized strategy defined by a fixed distribution pU ∈ P for some collection of distributions P. [sent-52, score-0.557]

28 If P contains Dirac delta distributions, the comparator term is indeed the best fixed decision x∗ ∈ K chosen in hindsight. [sent-53, score-0.079]

29 A procedure which guarantees sublinear growth of regret for any distribution pU ∈ P will be called Hannan consistent with respect to P. [sent-54, score-0.383]

30 Denote the cumulative cost functions by $L_t(x) = \sum_{s=1}^{t} \ell_s(x)$, with $L_0(x) \equiv 0$, and let $\eta > 0$ be a learning rate. [sent-57, score-0.116]

31 Define the following sequence of functions $q_t(x) = q_0(x)\exp\{-\eta L_t(x)\}$ for all $t \in \{1, \dots, T\}$. [sent-59, score-0.142]

32 Define the probability distribution $\mu_t$ over K at time t to have density $d\mu_t(x) = \frac{q_0(x)e^{-\eta L_t(x)}}{Z_t}\,dx$, where $Z_t = \int_{x\in K} q_t(x)\,dx$. (2) [sent-63, score-0.24]
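
To make the update concrete, here is a minimal one-dimensional sketch of this construction: it maintains the density of µt on a grid over K = [0, 1] with a uniform prior and linear costs. The grid, learning rate, and cost slopes are illustrative assumptions; the paper's algorithm works over a general convex body without discretization.

```python
import numpy as np

# Minimal 1-D illustration of the time-varying Gibbs distribution (2):
# K = [0, 1], uniform prior q0, linear costs l_t(x) = c_t * x.
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 2001)   # grid standing in for K (illustrative)
dx = xs[1] - xs[0]
eta = 0.5                          # learning rate (illustrative)
q0 = np.ones_like(xs)              # uniform prior density on K
L = np.zeros_like(xs)              # cumulative cost, L_0 = 0

for c in (0.7, -0.2, 0.4):         # slopes of three linear cost rounds
    q = q0 * np.exp(-eta * L)      # q_t(x) = q0(x) exp(-eta * L_t(x))
    Z = q.sum() * dx               # Z_t, the normalizer over K
    mu = q / Z                     # density of mu_t
    cdf = np.cumsum(mu) * dx
    x_play = np.interp(rng.random(), cdf, xs)  # play X ~ current mu
    L += c * xs                    # then observe l_t and update L_t
```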

33 Let $D(p\|q)$ stand for the Kullback-Leibler divergence between distributions p and q. [sent-64, score-0.128]

34 The following lemma gives an equality for expected regret with respect to a fixed randomized strategy. [sent-65, score-0.396]

35 (Due to its simplicity, the lemma has likely appeared in the literature, yet we could not locate a reference for this form, with equality and in the context of online convex optimization.) [sent-66, score-0.358]

36 It bears striking similarity to upper bounds on regret in terms of Bregman divergences for the Follow the Regularized Leader and Mirror Descent methods [23, 5], [7, Theorem 11]. [sent-69, score-0.375]

37 The expected regret is $\mathbb{E}\bigl[\sum_{t=1}^{T} \ell_t(X_t) - \sum_{t=1}^{T} \ell_t(U)\bigr] = \eta^{-1}\bigl(D(p_U\|\mu_0) - D(p_U\|\mu_T)\bigr) + \eta^{-1}\sum_{t=1}^{T} D(\mu_{t-1}\|\mu_t)$. [sent-77, score-0.339]
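
Because the identity holds with equality, it can be verified numerically. The sketch below is a finite-set analogue (N points, random bounded losses, a Dirichlet comparator distribution; all of these are assumptions made for illustration) and checks that the two sides of the lemma agree up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, eta = 5, 8, 0.3                     # decision set size, horizon, rate
losses = rng.uniform(0, 1, size=(T, N))  # l_t(x) for each of the N points
q0 = np.full(N, 1.0 / N)                 # uniform prior mu_0
p_U = rng.dirichlet(np.ones(N))          # arbitrary comparator p_U

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

mus, L = [q0], np.zeros(N)
for t in range(T):                       # after round t, mus[-1] is mu_t
    L += losses[t]
    w = q0 * np.exp(-eta * L)
    mus.append(w / w.sum())

lhs = sum(mus[t] @ losses[t] - p_U @ losses[t] for t in range(T))
rhs = (kl(p_U, mus[0]) - kl(p_U, mus[T])) / eta \
      + sum(kl(mus[t], mus[t + 1]) for t in range(T)) / eta
print(lhs, rhs)                          # agree up to floating-point error
```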

38 First, if the divergence between the comparator distribution $p_U$ and the prior $\mu_0$ is bounded, the result yields $O(\sqrt{T})$ rates of regret growth for bounded losses by choosing η appropriately. [sent-80, score-0.583]

39 …326] and gives a near-optimal $O(\sqrt{T \log T})$ regret bound. [sent-83, score-0.339]

40 We also note that for linear cost functions, the notion of expected regret coincides with regret for deterministic strategies. [sent-84, score-0.719]

41 Third, we note that if the prior is of the form $q_0(x) \propto \exp\{-R(x)\}$ for some convex function R, then $q_t(x) \propto \exp\{-(\eta L_t(x) + R(x))\}$, bearing similarity to the objective function of the Follow the Regularized Leader algorithm [23, 3]. [sent-85, score-0.27]
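
Read side by side (a framing suggested by this sentence, not an additional claim of the paper), Follow the Regularized Leader optimizes the exponent while the mixture strategy samples from it:

\[
x_{t+1}^{\mathrm{FTRL}} \in \arg\min_{x \in K}\bigl\{\eta L_t(x) + R(x)\bigr\},
\qquad
q_t(x) \propto \exp\bigl\{-\bigl(\eta L_t(x) + R(x)\bigr)\bigr\},
\]

so the FTRL iterate is the mode of the Gibbs density from which the random walk samples.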

42 For instance, if the cost functions are linear and the set K is a convex hull of N vertices (e.g., …). [sent-87, score-0.207]

43 Finally, we note that in online convex optimization, one of the difficulties is the issue of projections back to the set K. [sent-90, score-0.224]

44 The main result of this paper is that for linear Lipschitz cost functions the guided random walk (Algorithm 1 below) produces a sequence of points $X_1, X_2, \dots$. [sent-96, score-0.342]

45 The step requires sampling from a Gaussian distribution with covariance given by the Hessian of the self-concordant barrier and can be implemented efficiently whenever the Hessian can be computed. [sent-104, score-0.125]
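
As an illustration of that step, here is a hedged sketch for the common case where K is a polytope {x : Ax ≤ b} with the logarithmic barrier F(x) = −∑ᵢ log(bᵢ − aᵢ·x). In the sketch the proposal covariance is taken proportional to the inverse of the barrier Hessian, with an assumed $r^2/n$ scaling in the spirit of the Dikin walk; the helper names are also assumptions, not the paper's exact construction.

```python
import numpy as np

# K = {x : A x <= b} with log-barrier F(x) = -sum_i log(b_i - a_i . x).
# Hessian: H(x) = sum_i a_i a_i^T / (b_i - a_i . x)^2.

def barrier_hessian(A, b, x):
    s = b - A @ x                       # slacks, positive for x in int(K)
    return (A / s[:, None] ** 2).T @ A  # sum_i a_i a_i^T / s_i^2

def dikin_proposal(A, b, x, r, rng):
    """Sample y ~ N(x, (r^2/n) H(x)^{-1}) via a Cholesky factor of H(x)."""
    n = x.size
    Lc = np.linalg.cholesky(barrier_hessian(A, b, x))
    z = rng.standard_normal(n)
    # if H = Lc Lc^T, then Lc^{-T} z has covariance H^{-1}
    return x + (r / np.sqrt(n)) * np.linalg.solve(Lc.T, z)

rng = np.random.default_rng(1)
A = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
b = np.array([1., 0., 1., 0.])          # the unit square [0, 1]^2
y = dikin_proposal(A, b, np.array([0.5, 0.5]), r=0.5, rng=rng)
```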

46 First, the analysis of the random walk is carried out only for linear cost functions with a bounded Lipschitz constant. [sent-107, score-0.373]

47 An analysis for general convex functions might be possible, but for the sake of brevity we restrict ourselves to the linear case. [sent-108, score-0.128]

48 Note that convex cost functions can be linearized and a standard argument shows that regret for linearized functions can only be larger than that for the convex functions [30]. [sent-109, score-0.712]

49 The second assumption is that q0 does not depend on T and has a bounded L2 norm with respect to the uniform distribution on K. [sent-110, score-0.102]

50 Suppose $\ell_t : K \to [0, 1]$ are linear functions with Lipschitz constant 1 and the prior $q_0$ is of bounded $L_2$ norm with respect to the uniform distribution on K. [sent-113, score-0.102]

51 Then the one-step random walk (Algorithm 1) produces a sequence $X_1, X_2, \dots$. [sent-114, score-0.265]

52 Therefore, the regret of Algorithm 1 is within $O(\sqrt{T})$ of the ideal procedure of Lemma 1. [sent-121, score-0.339]

53 The statement follows directly from Lemma 1, Theorem 9, and the observation that for bounded losses $\mathbb{E}_{\mu_{t-1}} \ell_t(X_t) - \mathbb{E}_{\sigma_{t-1}} \ell_t(X_t) \le \int_{x\in K} |\ell_t(x)| \cdot |d\mu_{t-1}(x) - d\sigma_{t-1}(x)| \le C\eta n^3 \nu^2$. [sent-124, score-0.124]

54 Sampling from a time-varying Gibbs distribution (sketch of the analysis): the sufficiency of only one step of the random walk is made possible by the fact that the distributions µt−1 and µt are close, and thus µt−1 is a (very) warm start for µt. [sent-125, score-0.339]

55 The reduction in distance between the distributions after a single step is due to a general fact (Lovász-Simonovits [16]) which we state in Theorem 6. [sent-126, score-0.107]

56 The majority of the work goes into lower bounding the conductance of the random walk by a quantity independent of T (Lemma 5). [sent-127, score-0.516]

57 First, the main building blocks for proving the mixing time are stated, and their proofs appear later in Section 4. [sent-131, score-0.141]

58 For any function F on the interior int(K) having continuous derivatives of order k and vectors $h_1, \dots, h_k$, write $D^kF(x)[h_1, \dots, h_k]$ for the k-th differential of F at x along $h_1, \dots, h_k$. [sent-137, score-0.079]

59 For x, y ∈ K, ρ(x, y) is the distance in the Riemannian metric whose metric tensor is the Hessian of F . [sent-151, score-0.18]

60 Thus, the metric tensor on the tangent space at x assigns to a vector v the length $\|v\|_x := \sqrt{D^2F(x)[v,v]}$, and to a pair of vectors v, w the inner product $\langle v, w \rangle_x := D^2F(x)[v,w]$. [sent-152, score-0.09]

61 Let M be the metric space whose point set is K and metric is ρ. [sent-154, score-0.18]

62 For $x \in \mathrm{int}(K)$, let $G_x$ denote the unique Gaussian probability density function on $\mathbb{R}^n$ such that $G_x(y) \propto \exp\bigl(-\frac{n\,\|x-y\|_x^2}{r^2} + \frac{V(x)}{2}\bigr)$, where $V(x) = \ln \det D^2F(x)$ and $r = 1/(Cn)$; further, define the scaled cumulative cost as $s_t(y) := r^2 \eta L_t(y)$. [sent-156, score-0.199]

63 Therefore the Markov chain is reversible and has this stationary measure. [sent-161, score-0.159]

64 The next results imply that this Markov chain is rapidly mixing. [sent-162, score-0.133]

65 The first main ingredient is an isoperimetric inequality necessary for lower bounding conductance. [sent-163, score-0.121]

66 Let S1 and S2 be measurable subsets of K and µ a probability measure supported on K that possesses a density whose logarithm is concave. [sent-165, score-0.201]

67 Algorithm 1 (One Step Random Walk $(X_t, s_t)$). Input: current point $X_t \in K$ and the scaled cumulative cost $s_t$. [sent-167, score-0.159]
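
For concreteness, the sketch below instantiates one step as a lazy Metropolis-Hastings update with the Dikin Gaussian proposal, reusing barrier_hessian and dikin_proposal from the earlier sketch. This is a generic MH filter made reversible with respect to a density proportional to q0(x)e^{−ηLt(x)}; the paper's Algorithm 1 may differ in its exact acceptance rule and constants.

```python
import numpy as np

def log_G(A, b, x, y, r):
    """Log density of the Dikin Gaussian G_x at y, up to an additive
    constant: (1/2) ln det H(x) - (n / (2 r^2)) * ||y - x||_x^2."""
    n = x.size
    H = barrier_hessian(A, b, x)          # helper from the earlier sketch
    d = y - x
    _, logdet = np.linalg.slogdet(H)
    return 0.5 * logdet - (n / (2.0 * r ** 2)) * (d @ H @ d)

def one_step_walk(x, neg_log_target, A, b, r, rng):
    """One lazy Metropolis-Hastings step targeting the measure with density
    proportional to exp(-neg_log_target(x)),
    e.g. neg_log_target(x) = eta * L_t(x) - ln q0(x)."""
    if rng.random() < 0.5:                # laziness: hold with probability 1/2
        return x
    y = dikin_proposal(A, b, x, r, rng)   # helper from the earlier sketch
    if np.any(A @ y >= b):                # reject proposals that leave K
        return x
    log_accept = (neg_log_target(x) - neg_log_target(y)
                  + log_G(A, b, y, x, r) - log_G(A, b, x, y, r))
    return y if np.log(rng.random()) < log_accept else x
```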

68 The next Lemma relates the Riemannian metric ρ to the Markov Chain. [sent-176, score-0.09]

69 Intuitively, it says that for close-by points, their transition distributions cannot be far apart. [sent-177, score-0.11]

70 Theorem 3 and Lemma 4 together give a lower bound on conductance of the Markov Chain. [sent-180, score-0.251]

71 The conductance $\Phi := \inf_{\mu(S_1)\le 1/2} \frac{\int_{S_1} P_x(K\setminus S_1)\,d\mu(x)}{\mu(S_1)}$ of the Markov chain in Algorithm 1 is bounded below by $\frac{1}{C\nu n\sqrt{n}}$. [sent-183, score-0.385]

72 The lower bound on conductance of Lemma 5 can now be used with the following general result on the reduction of distance between distributions. [sent-184, score-0.284]

73 Let γ0 be the initial distribution for a lazy reversible ergodic Markov chain whose conductance is Φ and stationary measure is γ, and let γk be the distribution of the k-th step. [sent-186, score-0.41]

74 Let $M := \sup_S \frac{\gamma_0(S)}{\gamma(S)}$, where the supremum is over all measurable subsets S of K. [sent-187, score-0.119]

75 For every bounded f on K, let $\|f\|_{2,\gamma}$ denote the $L_2(\gamma)$ norm and let $(Pf)(x) := \int f(y)\,dP_x(y)$ denote the one-step expectation from x. [sent-188, score-0.106]
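
For reference, the classical Lovász-Simonovits bound has the following shape (quoted here in its standard form; the constants in the paper's Theorem 6 may differ):

\[
\left\|\frac{d\gamma_k}{d\gamma} - 1\right\|_{2,\gamma} \;\le\; \sqrt{M}\,\left(1 - \frac{\Phi^2}{2}\right)^{k},
\]

so a single step (k = 1) from a warm start (small M) already contracts the $L_2$ distance by a factor governed by the conductance.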

76 In summary, Lemma 5 provides a lower bound on conductance, while Theorem 6 ensures reduction of the norm whenever conductance is large enough. [sent-191, score-0.319]

77 We will show that reduction in the norm guarantees that the distribution after one step of the random walk (k = 1 in Theorem 6) is close to the desired distribution µt . [sent-193, score-0.333]

78 Tracking the distributions: let $\{\sigma_i\}_{i=1}^{\infty}$ be the probability measures with bounded density, supported on K, corresponding to the distribution of a point during different steps of the evolution of the algorithm. [sent-195, score-0.141]

79 Hence, for a measurable function $f : K \to \mathbb{R}$, $\|f\|_i := \bigl(\int_K f^2\, d\mu_i\bigr)^{1/2}$. [sent-198, score-0.085]

80 Furthermore, $\sup_{x\in K} \frac{d\mu_i(x)}{d\mu_{i+1}(x)} = \sup_{x\in K} \frac{q_0(x)e^{-\eta L_i(x)}\,Z_{i+1}}{q_0(x)e^{-\eta L_{i+1}(x)}\,Z_i} \le e^{2\eta} \le 1 + \bar{\eta}$, (4) where we used the fact that $\ell_{i+1}(x) \le 1$ and $\bar{\eta}$ is an appropriate multiple of $\eta$ (e.g., …). [sent-199, score-0.112]

81 It then follows that the norms at times i and i+1 are comparable: $\|f\|_i (1+\bar{\eta})^{-1} \le \|f\|_{i+1} \le \|f\|_i (1+\bar{\eta})$. (5) The mixing results of Lemma 5 together with Theorem 6 imply Corollary 7. [sent-203, score-0.101]

82 i Thus, (6) is bounded as dσi+1 −1 dµi+1 i+1 − dσi+1 −1 dµi i+1 ≤ η (1 + η ) ¯ ¯ dσi+1 dµi = η (1 + η ) 1 + ¯ ¯ i Next, a bound on (7) follows simply by the norm comparison inequality (5): dσi+1 −1 dµi i+1 − dσi+1 −1 dµi i ≤η ¯ dσi+1 −1 dµi . [sent-215, score-0.102]

83 i The statement follows by rearranging the terms. [sent-216, score-0.098]

84 From this and (8), the first statement of the theorem follows. [sent-228, score-0.122]

85 We term $(S_1, (\mathcal{M}\setminus S_1)\setminus S_2, S_2)$ a δ-partition of $\mathcal{M}$ if $\delta \le d_{\mathcal{M}}(S_1, S_2) := \inf_{x\in S_1,\, y\in S_2} d_{\mathcal{M}}(x,y)$, where $S_1, S_2$ are measurable subsets of $\mathcal{M}$. [sent-241, score-0.186]

86 If µ is a measure on $\mathcal{M}$, the isoperimetric constant is defined as $C(\delta, \mathcal{M}, \mu) := \inf_{\delta\text{-partitions}} \frac{\mu((\mathcal{M}\setminus S_1)\setminus S_2)}{\mu(S_1)\,\mu(S_2)}$, and $C_t := C\bigl(\tfrac{r}{\sqrt{n}}, \mathcal{M}, \mu_t\bigr)$. [sent-243, score-0.188]

87 Given interior points x, y in int(K), suppose p, q are the ends of the chord in K containing x, y, and p, x, y, q lie in that order. [sent-244, score-0.079]

88 Let $\sigma(x,y) := \frac{|x-y|\,|p-q|}{|p-x|\,|q-y|}$ and let $d_H$ denote the Hilbert (projective) metric defined by $d_H(x,y) := \ln(1 + \sigma(x,y))$. [sent-246, score-0.116]
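
A small sketch computing $d_H$ inside a polytope {z : Az ≤ b} by intersecting the chord through x and y with the constraints; the polytope representation and the helper name are assumptions for illustration:

```python
import numpy as np

def hilbert_metric(A, b, x, y):
    """Hilbert metric between interior points x, y of {z : A z <= b};
    p, q are the chord ends, with p, x, y, q lying in that order."""
    d = y - x
    num = b - A @ x                       # slacks at x (positive inside K)
    den = A @ d
    ts = num / np.where(den == 0, np.inf, den)
    t_max = ts[den > 0].min()             # q = x + t_max * d
    t_min = ts[den < 0].max()             # p = x + t_min * d, t_min <= 0
    p, q = x + t_min * d, x + t_max * d
    sigma = (np.linalg.norm(x - y) * np.linalg.norm(p - q)
             / (np.linalg.norm(p - x) * np.linalg.norm(q - y)))
    return np.log1p(sigma)                # d_H = ln(1 + sigma)

A = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
b = np.array([1., 0., 1., 0.])            # the unit square [0, 1]^2
x, y = np.array([0.2, 0.5]), np.array([0.7, 0.5])
print(hilbert_metric(A, b, x, y))         # ln(1 + 0.5/(0.2*0.3)) ~ 2.234
```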

89 For two sets $S_1$ and $S_2$, let $\sigma(S_1, S_2) := \inf_{x\in S_1,\, y\in S_2} \sigma(x,y)$. [sent-247, score-0.106]

90 In the limit $y \to x$, $d_H(x,y)$ is comparable to the local norm $\|x - y\|_x$; hence, using Theorem 10, the Hilbert metric and the Riemannian metric satisfy $\rho(x,y) \le 2(1+3\nu)\,d_H(x,y)$. [sent-255, score-0.18]

91 Let $S_1$ be a measurable subset of K such that $\mu(S_1) \le 1/2$, and let $S_2 := K \setminus S_1$ be its complement. [sent-258, score-0.085]

92 We have that $D(\mu_{t-1}\|\mu_t) = \int_K d\mu_{t-1}\,\log\frac{q_{t-1}\,Z_t}{Z_{t-1}\,q_t} = \log\frac{Z_t}{Z_{t-1}} + \int_K \eta\,\ell_t(x)\,d\mu_{t-1}(x) = \log\frac{Z_t}{Z_{t-1}} + \eta\,\mathbb{E}\,\ell_t(X_t)$. [sent-270, score-0.142]

93 By (11), the KL divergence can also be written as $\log\frac{\int_K e^{-\eta \ell_t(x)}\,q_{t-1}(x)\,dx}{\int_K q_{t-1}(x)\,dx} + \eta\,\mathbb{E}\,\ell_t(X_t) = \log \mathbb{E}\,e^{-\eta(\ell_t(X_t) - \mathbb{E}\,\ell_t(X_t))}$. By representing the divergence in this form, one can obtain upper bounds via known methods, such as Log-Sobolev inequalities (e.g., …). [sent-275, score-0.178]

94 A stochastic view of optimal regret through minimax duality. [sent-287, score-0.379]

95 Mirror descent and nonlinear projected subgradient methods for convex optimization. [sent-312, score-0.128]

96 A random polynomial-time algorithm for approximating the volume of convex bodies. [sent-340, score-0.173]

97 Random walks on polytopes and an affine interior point method for linear programming. [sent-361, score-0.171]

98 Random walks in a convex body and an improved volume algorithm. [sent-380, score-0.314]

99 Simulated annealing in convex bodies and an $O^*(n^4)$ volume algorithm. [sent-385, score-0.213]

100 Interior point polynomial time methods in convex programming, 2004. [sent-405, score-0.163]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('regret', 0.339), ('walk', 0.265), ('xt', 0.254), ('conductance', 0.251), ('dikin', 0.241), ('pu', 0.218), ('logconcave', 0.17), ('zt', 0.165), ('qt', 0.142), ('lemma', 0.134), ('convex', 0.128), ('px', 0.126), ('forecaster', 0.121), ('isoperimetric', 0.121), ('dh', 0.116), ('lov', 0.107), ('riemannian', 0.107), ('mixing', 0.101), ('lt', 0.099), ('online', 0.096), ('walks', 0.092), ('py', 0.092), ('dpu', 0.09), ('metric', 0.09), ('measurable', 0.085), ('int', 0.085), ('interior', 0.079), ('comparator', 0.079), ('chain', 0.076), ('sz', 0.074), ('distributions', 0.074), ('hk', 0.07), ('inf', 0.067), ('bounded', 0.067), ('theorem', 0.065), ('barrier', 0.065), ('abernethy', 0.065), ('gx', 0.065), ('ellipsoid', 0.062), ('geometry', 0.06), ('gxt', 0.06), ('sampling', 0.06), ('markov', 0.058), ('rapidly', 0.057), ('randomized', 0.057), ('statement', 0.057), ('dx', 0.056), ('player', 0.055), ('divergence', 0.054), ('leader', 0.053), ('lipschitz', 0.051), ('hessian', 0.051), ('nesterov', 0.05), ('body', 0.049), ('hannan', 0.049), ('dpx', 0.049), ('dyer', 0.046), ('mixes', 0.046), ('frieze', 0.046), ('volume', 0.045), ('growth', 0.044), ('rakhlin', 0.043), ('stationary', 0.042), ('dk', 0.042), ('density', 0.042), ('cost', 0.041), ('reversible', 0.041), ('rearranging', 0.041), ('st', 0.041), ('minimax', 0.04), ('proving', 0.04), ('dt', 0.04), ('possesses', 0.04), ('annealing', 0.04), ('kalai', 0.04), ('fenchel', 0.04), ('let', 0.039), ('linearized', 0.038), ('kannan', 0.038), ('round', 0.038), ('vertices', 0.038), ('universal', 0.037), ('discretization', 0.037), ('bandit', 0.037), ('cumulative', 0.036), ('guided', 0.036), ('says', 0.036), ('bounds', 0.036), ('mirror', 0.035), ('ball', 0.035), ('norm', 0.035), ('polynomial', 0.035), ('inequalities', 0.034), ('subsets', 0.034), ('repeated', 0.034), ('lim', 0.033), ('reduction', 0.033), ('gibbs', 0.032), ('strategy', 0.032), ('dm', 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 222 nips-2010-Random Walk Approach to Regret Minimization

Author: Hariharan Narayanan, Alexander Rakhlin

Abstract: We propose a computationally efficient random walk on a convex body which rapidly mixes to a time-varying Gibbs distribution. In the setting of online convex optimization and repeated games, the algorithm yields low regret and presents a novel efficient method for implementing mixture forecasting strategies. 1

2 0.26362532 196 nips-2010-Online Markov Decision Processes under Bandit Feedback

Author: Gergely Neu, Andras Antos, András György, Csaba Szepesvári

Abstract: We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary. The goal of the learning agent is to compete with the best stationary policy in terms of the total reward received. In each time step the agent observes the current state and the reward associated with the last transition, however, the agent does not observe the rewards associated with other state-action pairs. The agent is assumed to know the transition probabilities. The state of the art result for this setting is a no-regret algorithm. In this paper we propose a new learning algorithm and, assuming that stationary policies mix uniformly fast, we show that after T time steps, the expected regret of the new algorithm is O T 2/3 (ln T )1/3 , giving the first rigorously proved regret bound for the problem. 1

3 0.23805435 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

Author: Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari

Abstract: We develop a theory of online learning by defining several complexity measures. Among them are analogues of Rademacher complexity, covering numbers and fatshattering dimension from statistical learning theory. Relationship among these complexity measures, their connection to online learning, and tools for bounding them are provided. We apply these results to various learning problems. We provide a complete characterization of online learnability in the supervised setting. 1

4 0.16731836 192 nips-2010-Online Classification with Specificity Constraints

Author: Andrey Bernstein, Shie Mannor, Nahum Shimkin

Abstract: We consider the online binary classification problem, where we are given m classifiers. At each stage, the classifiers map the input to the probability that the input belongs to the positive class. An online classification meta-algorithm is an algorithm that combines the outputs of the classifiers in order to attain a certain goal, without having prior knowledge on the form and statistics of the input, and without prior knowledge on the performance of the given classifiers. In this paper, we use sensitivity and specificity as the performance metrics of the meta-algorithm. In particular, our goal is to design an algorithm that satisfies the following two properties (asymptotically): (i) its average false positive rate (fp-rate) is under some given threshold; and (ii) its average true positive rate (tp-rate) is not worse than the tp-rate of the best convex combination of the m given classifiers that satisfies fprate constraint, in hindsight. We show that this problem is in fact a special case of the regret minimization problem with constraints, and therefore the above goal is not attainable. Hence, we pose a relaxed goal and propose a corresponding practical online learning meta-algorithm that attains it. In the case of two classifiers, we show that this algorithm takes a very simple form. To our best knowledge, this is the first algorithm that addresses the problem of the average tp-rate maximization under average fp-rate constraints in the online setting. 1

5 0.15314342 278 nips-2010-Universal Consistency of Multi-Class Support Vector Classification

Author: Tobias Glasmachers

Abstract: Steinwart was the first to prove universal consistency of support vector machine classification. His proof analyzed the ‘standard’ support vector machine classifier, which is restricted to binary classification problems. In contrast, recent analysis has resulted in the common belief that several extensions of SVM classification to more than two classes are inconsistent. Countering this belief, we prove the universal consistency of the multi-class support vector machine by Crammer and Singer. Our proof extends Steinwart’s techniques to the multi-class case. 1

6 0.14456269 182 nips-2010-New Adaptive Algorithms for Online Classification

7 0.13533247 203 nips-2010-Parametric Bandits: The Generalized Linear Case

8 0.13270809 183 nips-2010-Non-Stochastic Bandit Slate Problems

9 0.12924749 243 nips-2010-Smoothness, Low Noise and Fast Rates

10 0.11767657 219 nips-2010-Random Conic Pursuit for Semidefinite Programming

11 0.1167411 265 nips-2010-The LASSO risk: asymptotic results and real world examples

12 0.11200998 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

13 0.10900967 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis

14 0.10620937 134 nips-2010-LSTD with Random Projections

15 0.10102394 122 nips-2010-Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices

16 0.098179914 226 nips-2010-Repeated Games against Budgeted Adversaries

17 0.097274952 171 nips-2010-Movement extraction by detecting dynamics switches and repetitions

18 0.09454824 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices

19 0.093793571 201 nips-2010-PAC-Bayesian Model Selection for Reinforcement Learning

20 0.092387982 270 nips-2010-Tight Sample Complexity of Large-Margin Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.251), (1, -0.076), (2, 0.189), (3, 0.061), (4, 0.058), (5, 0.087), (6, -0.066), (7, -0.03), (8, -0.071), (9, 0.302), (10, 0.109), (11, -0.038), (12, -0.031), (13, -0.044), (14, -0.041), (15, 0.11), (16, 0.018), (17, -0.082), (18, 0.087), (19, 0.01), (20, 0.028), (21, 0.064), (22, -0.02), (23, 0.042), (24, -0.158), (25, -0.005), (26, 0.073), (27, -0.082), (28, -0.118), (29, 0.007), (30, -0.015), (31, -0.067), (32, -0.031), (33, -0.08), (34, -0.021), (35, -0.091), (36, -0.117), (37, -0.027), (38, -0.109), (39, 0.081), (40, 0.073), (41, 0.051), (42, -0.071), (43, 0.025), (44, 0.083), (45, -0.017), (46, 0.095), (47, -0.012), (48, 0.122), (49, -0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94738555 222 nips-2010-Random Walk Approach to Regret Minimization

Author: Hariharan Narayanan, Alexander Rakhlin

Abstract: We propose a computationally efficient random walk on a convex body which rapidly mixes to a time-varying Gibbs distribution. In the setting of online convex optimization and repeated games, the algorithm yields low regret and presents a novel efficient method for implementing mixture forecasting strategies. 1

2 0.7490319 183 nips-2010-Non-Stochastic Bandit Slate Problems

Author: Satyen Kale, Lev Reyzin, Robert E. Schapire

Abstract: We consider bandit problems, motivated by applications in online advertising and news story selection, in which the learner must repeatedly select a slate, that is, a subset of size s from K possible actions, and then receives rewards for just the selected actions. The goal is to minimize the regret with respect to total reward of the best slate computed in hindsight. We consider unordered and ordered versions √ of the problem, and give efficient algorithms which have regret O( T ), where the constant depends on the specific nature of the problem. We also consider versions of the problem where we have access to a number of policies which make recom√ mendations for slates in every round, and give algorithms with O( T ) regret for competing with the best such policy as well. We make use of the technique of relative entropy projections combined with the usual multiplicative weight update algorithm to obtain our algorithms. 1

3 0.74296838 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

Author: Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari

Abstract: We develop a theory of online learning by defining several complexity measures. Among them are analogues of Rademacher complexity, covering numbers and fatshattering dimension from statistical learning theory. Relationship among these complexity measures, their connection to online learning, and tools for bounding them are provided. We apply these results to various learning problems. We provide a complete characterization of online learnability in the supervised setting. 1

4 0.67061746 196 nips-2010-Online Markov Decision Processes under Bandit Feedback

Author: Gergely Neu, Andras Antos, András György, Csaba Szepesvári

Abstract: We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary. The goal of the learning agent is to compete with the best stationary policy in terms of the total reward received. In each time step the agent observes the current state and the reward associated with the last transition, however, the agent does not observe the rewards associated with other state-action pairs. The agent is assumed to know the transition probabilities. The state of the art result for this setting is a no-regret algorithm. In this paper we propose a new learning algorithm and, assuming that stationary policies mix uniformly fast, we show that after T time steps, the expected regret of the new algorithm is O T 2/3 (ln T )1/3 , giving the first rigorously proved regret bound for the problem. 1

5 0.64006889 219 nips-2010-Random Conic Pursuit for Semidefinite Programming

Author: Ariel Kleiner, Ali Rahimi, Michael I. Jordan

Abstract: We present a novel algorithm, Random Conic Pursuit, that solves semidefinite programs (SDPs) via repeated optimization over randomly selected two-dimensional subcones of the PSD cone. This scheme is simple, easily implemented, applicable to very general SDPs, scalable, and theoretically interesting. Its advantages are realized at the expense of an ability to readily compute highly exact solutions, though useful approximate solutions are easily obtained. This property renders Random Conic Pursuit of particular interest for machine learning applications, in which the relevant SDPs are generally based upon random data and so exact minima are often not a priority. Indeed, we present empirical results to this effect for various SDPs encountered in machine learning; these experiments demonstrate the potential practical usefulness of Random Conic Pursuit. We also provide a preliminary analysis that yields insight into the theoretical properties and convergence of the algorithm. 1

6 0.59693074 203 nips-2010-Parametric Bandits: The Generalized Linear Case

7 0.54037625 182 nips-2010-New Adaptive Algorithms for Online Classification

8 0.5359652 202 nips-2010-Parallelized Stochastic Gradient Descent

9 0.52560103 192 nips-2010-Online Classification with Specificity Constraints

10 0.51771122 122 nips-2010-Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices

11 0.49516004 226 nips-2010-Repeated Games against Budgeted Adversaries

12 0.46975607 265 nips-2010-The LASSO risk: asymptotic results and real world examples

13 0.4677754 201 nips-2010-PAC-Bayesian Model Selection for Reinforcement Learning

14 0.44677287 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis

15 0.43806073 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

16 0.42397586 243 nips-2010-Smoothness, Low Noise and Fast Rates

17 0.41644076 134 nips-2010-LSTD with Random Projections

18 0.41554892 278 nips-2010-Universal Consistency of Multi-Class Support Vector Classification

19 0.40147814 220 nips-2010-Random Projection Trees Revisited

20 0.39994442 270 nips-2010-Tight Sample Complexity of Large-Margin Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.086), (16, 0.022), (17, 0.016), (27, 0.039), (30, 0.101), (35, 0.016), (39, 0.013), (45, 0.149), (50, 0.046), (52, 0.022), (53, 0.018), (60, 0.066), (77, 0.051), (78, 0.224), (90, 0.05), (98, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.87850451 74 nips-2010-Empirical Bernstein Inequalities for U-Statistics

Author: Thomas Peel, Sandrine Anthoine, Liva Ralaivola

Abstract: We present original empirical Bernstein inequalities for U-statistics with bounded symmetric kernels q. They are expressed with respect to empirical estimates of either the variance of q or the conditional variance that appears in the Bernsteintype inequality for U-statistics derived by Arcones [2]. Our result subsumes other existing empirical Bernstein inequalities, as it reduces to them when U-statistics of order 1 are considered. In addition, it is based on a rather direct argument using two applications of the same (non-empirical) Bernstein inequality for U-statistics. We discuss potential applications of our new inequalities, especially in the realm of learning ranking/scoring functions. In the process, we exhibit an efficient procedure to compute the variance estimates for the special case of bipartite ranking that rests on a sorting argument. We also argue that our results may provide test set bounds and particularly interesting empirical racing algorithms for the problem of online learning of scoring functions. 1

same-paper 2 0.83023429 222 nips-2010-Random Walk Approach to Regret Minimization

Author: Hariharan Narayanan, Alexander Rakhlin

Abstract: We propose a computationally efficient random walk on a convex body which rapidly mixes to a time-varying Gibbs distribution. In the setting of online convex optimization and repeated games, the algorithm yields low regret and presents a novel efficient method for implementing mixture forecasting strategies. 1

3 0.82575572 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors

Author: Alessandro Chiuso, Gianluigi Pillonetto

Abstract: We introduce a new Bayesian nonparametric approach to identification of sparse dynamic linear systems. The impulse responses are modeled as Gaussian processes whose autocovariances encode the BIBO stability constraint, as defined by the recently introduced “Stable Spline kernel”. Sparse solutions are obtained by placing exponential hyperpriors on the scale factors of such kernels. Numerical experiments regarding estimation of ARMAX models show that this technique provides a definite advantage over a group LAR algorithm and state-of-the-art parametric identification techniques based on prediction error minimization. 1

4 0.82175946 151 nips-2010-Learning from Candidate Labeling Sets

Author: Jie Luo, Francesco Orabona

Abstract: In many real world applications we do not have access to fully-labeled training data, but only to a list of possible labels. This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. In general, these problems can be very difficult. However most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. In this paper, we propose a semi-supervised framework to model this kind of problems. Each training sample is a bag containing multi-instances, associated with a set of candidate labeling vectors. Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. The use of the labeling vectors provides a principled way not to exclude any information. We propose a large margin discriminative formulation, and an efficient algorithm to solve it. Experiments conducted on artificial datasets and a real-world images and captions dataset show that our approach achieves performance comparable to an SVM trained with the ground-truth labels, and outperforms other baselines.

5 0.78612733 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning

Author: Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman

Abstract: We consider the problem of retrieving the database points nearest to a given hyperplane query without exhaustively scanning the database. We propose two hashingbased solutions. Our first approach maps the data to two-bit binary keys that are locality-sensitive for the angle between the hyperplane normal and a database point. Our second approach embeds the data into a vector space where the Euclidean norm reflects the desired distance between the original points and hyperplane query. Both use hashing to retrieve near points in sub-linear time. Our first method’s preprocessing stage is more efficient, while the second has stronger accuracy guarantees. We apply both to pool-based active learning: taking the current hyperplane classifier as a query, our algorithm identifies those points (approximately) satisfying the well-known minimal distance-to-hyperplane selection criterion. We empirically demonstrate our methods’ tradeoffs, and show that they make it practical to perform active selection with millions of unlabeled points. 1

6 0.73178726 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers

7 0.72342026 36 nips-2010-Avoiding False Positive in Multi-Instance Learning

8 0.71642733 270 nips-2010-Tight Sample Complexity of Large-Margin Learning

9 0.7056154 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices

10 0.6971032 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

11 0.69242346 243 nips-2010-Smoothness, Low Noise and Fast Rates

12 0.68939412 261 nips-2010-Supervised Clustering

13 0.68891555 47 nips-2010-Co-regularization Based Semi-supervised Domain Adaptation

14 0.68749011 182 nips-2010-New Adaptive Algorithms for Online Classification

15 0.68463302 27 nips-2010-Agnostic Active Learning Without Constraints

16 0.6834572 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio

17 0.68322212 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods

18 0.68272507 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average

19 0.68266064 265 nips-2010-The LASSO risk: asymptotic results and real world examples

20 0.68211794 25 nips-2010-Active Learning by Querying Informative and Representative Examples