nips nips2012 nips2012-102 knowledge-graph by maker-knowledge-mining

102 nips-2012-Distributed Non-Stochastic Experts


Source: pdf

Author: Varun Kanade, Zhenming Liu, Bozidar Radunovic

Abstract: We consider the online distributed non-stochastic experts problem, where the distributed system consists of one coordinator node that is connected to k sites, and the sites are required to communicate with each other via the coordinator. At each time-step t, one of the k site nodes has to pick an expert from the set {1, . . . , n}, and the same site receives information about payoffs of all experts for that round. The goal of the distributed system is to minimize regret at time horizon T, while simultaneously keeping communication to a minimum. The two extreme solutions to this problem are: (i) Full communication: this essentially simulates the non-distributed setting to obtain the optimal O(√(log(n) T)) regret bound at the cost of T communication. (ii) No communication: each site runs an independent copy; the regret is O(√(log(n) kT)) and the communication is 0. This paper shows the difficulty of simultaneously achieving regret asymptotically better than √(kT) and communication better than T. We give a novel algorithm that for an oblivious adversary achieves a non-trivial trade-off: regret O(√(k^{5(1+ε)/6} T)) and communication O(T/k^ε), for any value of ε ∈ (0, 1/5). We also consider a variant of the model, where the coordinator picks the expert. In this model, we show that the label-efficient forecaster of Cesa-Bianchi et al. (2005) already gives us a strategy that is near optimal in the regret vs. communication trade-off.

Reference: text


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 Abstract: We consider the online distributed non-stochastic experts problem, where the distributed system consists of one coordinator node that is connected to k sites, and the sites are required to communicate with each other via the coordinator. [sent-6, score-0.669]

2 At each time-step t, one of the k site nodes has to pick an expert from the set {1, . . . , n}. [sent-7, score-0.4]

3 The same site receives information about payoffs of all experts for that round. [sent-10, score-0.388]

4 The goal of the distributed system is to minimize regret at time horizon T , while simultaneously keeping communication to a minimum. [sent-11, score-0.622]

5 (ii) No communication: Each site runs an independent copy; the regret is O(√(log(n) kT)) and the communication is 0. [sent-13, score-0.68]

6 This paper shows the difficulty of simultaneously achieving regret asymptotically better than √(kT) and communication better than T. [sent-14, score-0.516]

7 We give a novel algorithm that for an oblivious adversary achieves a non-trivial trade-off: regret O(√(k^{5(1+ε)/6} T)) and communication O(T/k^ε), for any value of ε ∈ (0, 1/5). [sent-15, score-0.678]

8 (2005) already gives us a strategy that is near optimal in the regret vs. communication trade-off. [sent-18, score-0.472]

9 At the end of the round, she observes a payoff vector pt ∈ [0, 1]^n, where pt[a] denotes the payoff that would have been received by following the advice of expert a. [sent-27, score-1.871]

10 The payoff received by the decision-maker is pt [at ]. [sent-28, score-0.85]

11 In the non-stochastic setting, an adversary decides the payoff vectors at any time step. [sent-29, score-0.732]

12 At the end of the T rounds, the regret of the decision maker is the difference between the payoff that she would have received using the single best expert at all times in hindsight and the payoff that she actually received. [sent-30, score-1.74]

13 Tight bounds on regret for the non-stochastic expert problem are obtained by the so-called follow-the-regularized-leader approaches; at time t, the decision-maker chooses a distribution, xt, over the n experts. [sent-36, score-0.568]

14 We consider the setting when the decision maker is a distributed system, where several different nodes may select experts and/or observe payoffs at different time-steps. [sent-40, score-0.283]

15 2 Models and Summary of Results We consider a distributed computation model consisting of one central coordinator node connected to k site nodes. [sent-47, score-0.502]

16 The site nodes must communicate with each other using the coordinator node. [sent-48, score-0.456]

17 At each time step, the distributed system receives a query, which indicates that it must choose an expert to follow. [sent-49, score-0.352]

18 At the end of the round, the distributed system observes the payoff vector. [sent-50, score-0.755]

19 We consider two different models described in detail below: the site prediction model where one of the k sites receives a query at any given time-step, and the coordinator prediction model where the query is always received at the coordinator node. [sent-51, score-0.971]

20 In both these models, the payoff vector, pt , is always observed at one of the k site nodes. [sent-52, score-0.992]

21 Thus, some communication is required to share the information about the payoff vectors among nodes. [sent-53, score-0.89]

22 All missing proofs are provided in the long version [7]. Goal: The algorithm implemented on the distributed system may use randomness, both to decide which expert to pick and to decide when to communicate with other nodes. [sent-55, score-0.389]

23 We focus on simultaneously minimizing the expected regret and the expected communication used by the (distributed) algorithm. [sent-56, score-0.527]

24 Recall that the expected regret is: E[R] = E[ max_{a∈[n]} Σ_{t=1}^{T} pt[a] − Σ_{t=1}^{T} pt[at] ] (1), where the expectation is over the random choices made by the algorithm. [sent-57, score-0.609]
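
As a concrete illustration of Eq. (1), the following sketch computes the realized (unaveraged) regret of a choice sequence against the best fixed expert in hindsight; the helper name and array layout are illustrative assumptions, not from the paper.

```python
import numpy as np

def empirical_regret(payoffs, choices):
    """Realized regret of a choice sequence vs. the best fixed expert.

    payoffs: (T, n) array with payoffs[t, a] = pt[a] in [0, 1].
    choices: length-T integer array of the chosen experts at.
    """
    best_fixed = payoffs.sum(axis=0).max()                      # max_a sum_t pt[a]
    obtained = payoffs[np.arange(len(choices)), choices].sum()  # sum_t pt[at]
    return best_fixed - obtained
```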

25 The expected communication is simply the expected number (over the random choices) of messages sent in the system. [sent-58, score-0.297]

26 Adversaries: In the non-stochastic setting, we assume that an adversary may decide the payoff vectors, pt , at each time-step and also the site, st , that receives the payoff vector (and also the query in the site-prediction model). [sent-66, score-1.659]

27 An oblivious adversary cannot see any of the actions of the distributed system. [sent-67, score-0.282]

28 In addition to knowing the description of the algorithm, an adaptive adversary is stronger and can record all of the past actions of the algorithm, and use these arbitrarily to decide the future payoff vectors and site allocations. [sent-71, score-0.945]

29 We require that the message size not depend on k or T, but only on the number of experts n. (Footnote: we do not use the word query in the sense of explicitly giving some information or context, but merely as an indication of the occurrence of an event that forces some site or coordinator to choose an expert.) [sent-73, score-0.74]

30 As is standard in the distributed systems literature, we assume that the communication delay is 0. [sent-76, score-0.337]

31 The payoff vector pt ∈ [0, 1]^n, where pt[i] is the payoff of the i-th expert, is revealed only to the site st, and the decision-maker (distributed system) receives payoff pt[at], corresponding to the expert actually chosen. [sent-88, score-3.055]

32 The site prediction model is commonly studied in distributed machine learning settings (see [8, 9, 10]). [sent-89, score-0.299]

33 There are two very simple algorithms in this model: (i) Full communication: The coordinator always maintains the current cumulative payoff vector, Σ_{τ=1}^{t−1} p^τ. [sent-97, score-0.966]

34 At time step t, site st receives the current cumulative payoff vector Σ_{τ=1}^{t−1} p^τ from the coordinator, chooses an expert at ∈ [n] using FPL, receives the payoff vector pt, and sends pt to the coordinator, which updates its cumulative payoff vector. [sent-98, score-2.592]

35 Note that the total communication is 2T and the system simulates (non-distributed) FPL to achieve the (optimal) regret guarantee O(√(nT)). [sent-99, score-0.512]
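
A minimal sketch of this full-communication baseline, assuming FPL with i.i.d. exponential perturbations of scale η (one common instantiation; the paper's exact noise distribution and message format are not specified in this summary):

```python
import numpy as np

rng = np.random.default_rng(0)

def fpl_choice(cum_payoff, eta):
    # Follow the Perturbed Leader: maximize cumulative payoff plus noise.
    return int(np.argmax(cum_payoff + rng.exponential(scale=eta, size=cum_payoff.shape)))

def full_communication(payoffs, eta):
    """The coordinator holds the exact cumulative payoff vector; each round
    costs 2 messages (site pulls the vector, then pushes pt back)."""
    T, n = payoffs.shape
    cum, total = np.zeros(n), 0.0
    for t in range(T):
        a_t = fpl_choice(cum, eta)  # site st predicts with the global vector
        total += payoffs[t, a_t]
        cum += payoffs[t]           # coordinator updates its cumulative vector
    return total
```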

36 (ii) No communication: Each site maintains cumulative payoff vectors corresponding to the queries received by them, thus implementing k independent versions of FPL. [sent-100, score-1.024]

37 Suppose that the i-th site receives a total of Ti queries (Σ_{i=1}^{k} Ti = T); then the regret is bounded by Σ_{i=1}^{k} O(√(n Ti)) = O(√(nkT)) and the total communication is 0. [sent-101, score-0.74]

38 This upper bound is actually tight in the event that there is 0 communication (see the accompanying long version [7]). [sent-102, score-0.277]

39 Simultaneously achieving regret that is asymptotically lower than √(knT) using communication asymptotically lower than T turns out to be a significantly challenging question. [sent-103, score-0.55]

40 Our main positive result is the first distributed expert algorithm in the oblivious adversarial (non-stochastic) setting, using sub-linear communication. [sent-104, score-0.376]

41 There exists an algorithm for the distributed experts problem that, against an oblivious adversary, achieves regret O(log(n) √(k^{5(1+ε)/6} T)) and uses communication O(T/k^ε), giving non-trivial guarantees in the range ε ∈ (0, 1/5). [sent-108, score-0.869]

42 COORDINATOR PREDICTION MODEL: At every time step, the query is received by the coordinator node, which chooses an expert at ∈ [n]. [sent-110, score-0.51]

43 However, at the end of the round, one of the site nodes, say st , observes the payoff vector pt . [sent-111, score-1.033]

44 The payoff vectors pt and choice of sites st are decided by an adversary. [sent-112, score-0.973]

45 The full communication protocol is equally applicable here, getting the optimal regret bound O(√(nT)) at the cost of substantial (essentially T) communication. [sent-114, score-0.501]

46 This is the label-efficient setting (1–3 in [2]), where the decision-maker has a limited budget and has to spend part of it to observe any payoff information. [sent-117, score-0.653]

47 The optimal strategy is to request payoff information randomly with probability C/T at each time-step, if C is the communication budget. [sent-118, score-0.868]

48 [14] (Informal) The LEF algorithm using FPL with communication budget C achieves regret O(T √(n/C)) against both an adaptive and an oblivious adversary. [sent-121, score-0.605]

49 One of the crucial differences between this model and that of the label-efficient setting is that when communication does occur, the site can send cumulative payoff vectors comprising all previous updates to the coordinator rather than just the latest one. [sent-122, score-1.475]

50 Then any (distributed) algorithm that achieves expected regret o(√(kT)) must use communication (T/k)(1 − o(1)). [sent-128, score-0.504]

51 Notice that in the coordinator prediction model, when C = T /k, this lower bound is matched by the upper bound of LEF. [sent-130, score-0.293]

52 The so-called regularized leader algorithms maintain a cumulative payoff vector, Pt, and use only this and a regularizer to select an expert at time t. [sent-132, score-1.052]

53 We consider two variants in the distributed setting: (i) Distributed Counter Algorithms: Here the forecaster only uses P̃t, which is an (approximate) version of the cumulative payoff vector Pt. [sent-133, score-0.894]

54 P̃t can be maintained while using sub-linear communication by applying techniques from the distributed systems literature [12]. [sent-135, score-0.297]
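
One simple way to realize such an approximate counter with sub-linear communication (a threshold scheme in the spirit of the continuous-monitoring protocols of [12]; the actual protocol there is more refined, so treat this as an assumption-laden sketch):

```python
class ApproxDistributedCounter:
    """Each of k sites buffers its local (non-negative) increments and flushes
    them to the coordinator once the buffer reaches beta/k. The coordinator's
    estimate then always trails the true total by less than beta."""

    def __init__(self, k, beta):
        self.threshold = beta / k
        self.buffer = [0.0] * k   # unreported increments at each site
        self.estimate = 0.0       # coordinator's view of the global total

    def add(self, site, value):
        self.buffer[site] += value
        if self.buffer[site] >= self.threshold:
            self.estimate += self.buffer[site]  # one site-to-coordinator message
            self.buffer[site] = 0.0
```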

55 (ii) Delayed Regularized Leader: Here the regularized leaders don’t try to explicitly maintain an approximate version of the cumulative payoff vector. [sent-136, score-0.795]

56 Instead, they may use an arbitrary communication protocol, but make predictions using the cumulative payoff vector (using any past payoff vectors that they could have received) and some regularizer. [sent-137, score-1.653]

57 We show that the distributed counter approach does not yield any non-trivial guarantee in the site-prediction model, even against an oblivious adversary. [sent-139, score-0.278]

58 It is possible to show a similar lower bound in the coordinator prediction model, but it is omitted since it follows easily from the idea in the site-prediction model combined with an explicit communication lower bound given in [12]. [sent-140, score-0.554]

59 Section 4 shows that the delayed regularized leader approach is ineffective even against an oblivious adversary in the coordinator prediction model, suggesting that the LEF algorithm is near optimal. [sent-141, score-0.564]

60 Questions such as network structure [9] and network delays [10] are interesting in our setting as well, however, at present our work focuses on establishing some non-trivial regret guarantees in the distributed online non-stochastic experts setting. [sent-145, score-0.422]

61 Study of communication as a resource in distributed learning is also considered in [15, 16, 17]; however, this body of work seems only applicable to offline learning. [sent-146, score-0.337]

62 Site-prediction model, Upper Bounds: We describe our algorithm, which simultaneously achieves non-trivial bounds on expected regret and expected communication. [sent-154, score-0.313]

63 Otherwise it is in a block phase, running a copy of FPL (with noise parameter η) across blocks with the same expert being followed for the entire block and synchronizing after each block. [sent-182, score-0.411]

64 This effectively makes Pi , the cumulative payoff over block i, the payoff vector for the block FPL. [sent-183, score-1.517]
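
Putting the two phases together, here is a structural sketch of DFPL. The parameters b, q, η, η′ and the exact synchronization messages are defined in Fig. 1 of the paper, which is not reproduced in this summary, so the details below (noise distribution, message accounting) are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def dfpl(payoffs, ell, q, eta, eta_prime):
    """Sketch of DFPL over blocks of length ell.

    With probability q a block is a step phase: FPL(eta_prime) runs on every
    step with per-step synchronization (O(ell) messages per block). Otherwise
    it is a block phase: block-level FPL(eta) follows one expert for the whole
    block and sites synchronize once at the end (O(k) messages per block).
    """
    T, n = payoffs.shape
    block_cum = np.zeros(n)   # cumulative payoff known at block granularity
    total = 0.0
    for start in range(0, T, ell):
        block = payoffs[start:start + ell]
        if rng.random() < q:                      # step phase
            step_cum = block_cum.copy()
            for p in block:
                a = int(np.argmax(step_cum + rng.exponential(eta_prime, n)))
                total += p[a]
                step_cum += p
        else:                                     # block phase
            a = int(np.argmax(block_cum + rng.exponential(eta, n)))
            total += block[:, a].sum()
        block_cum += block.sum(axis=0)            # synchronize after each block
    return total
```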

65 Let p1, . . . , pT ∈ [0, 1]^2 be a sequence of payoff vectors such that max_t |pt|_∞ ≤ B, and let the number of experts be 2. [sent-191, score-0.744]

66 The algorithm DFPL, when run with parameters ℓ, T, η = ℓ^{5/12} T^{1/2}, and with b, η′, q as defined in Fig. 1, has expected regret O(√(ℓ^{5/6} T)) and expected communication O(Tk/ℓ). [sent-198, score-0.508]

67 In particular, for ℓ = k^{1+ε} with 0 < ε < 1/5, the algorithm simultaneously achieves regret that is asymptotically lower than √(kT) and communication that is asymptotically lower than T. [sent-199, score-0.569]
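
Under this reconstruction, the parameter choice can be checked directly (block size ℓ = k^{1+ε}):

```latex
\ell = k^{1+\epsilon} \;\Longrightarrow\;
\sqrt{\ell^{5/6}\,T} \,=\, \sqrt{k^{5(1+\epsilon)/6}\,T} \,=\, o\big(\sqrt{kT}\big)
\iff \tfrac{5(1+\epsilon)}{6} < 1 \iff \epsilon < \tfrac{1}{5},
\qquad
\frac{Tk}{\ell} \,=\, \frac{T}{k^{\epsilon}}.
```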

68 Getting communication of the form T^{1−δ} f(k) for a regret bound better than √(kT) seems to be a fairly difficult and interesting problem. (Footnote: since we are in the case of an oblivious adversary, we may assume that the payoff vectors p1, . . . , pT are fixed in advance.) [sent-201, score-1.235]

69 Without loss of generality let expert 1 (out of {1, 2}) be the one that has greater payoff in hindsight. [sent-205, score-0.804]

70 Recall that FRi(η′) denotes the random variable that is the regret of playing FPL(η′) in a step phase on block i, with respect to the first expert. [sent-206, score-0.353]

71 In particular, this will be negative if expert 2 is the best expert on block i, even though globally expert 1 is better. [sent-207, score-0.626]

72 In fact, this is exactly what our algorithm exploits: it gains on regret in the communication-expensive step phase while saving on communication in the block phase. [sent-208, score-0.575]

73 This is just the regret corresponding to running FPL(η) at the block level, with T/ℓ time steps. [sent-212, score-0.302]

74 We chose η′ = √ℓ; the standard analysis of FPL then guarantees that the expected actual regret of FPL(η′) on a block is at most 2√ℓ (this is the regret against the best expert on that block, not the regret with respect to expert 1, which is FRi(η′)). [sent-215, score-0.633]

75 Note that after the block ζ is processed, the algorithm in the block phase will never follow expert 2. [sent-232, score-0.386]

76 In this case “term 2” can be bounded easily as follows: η′ Σ_{i=1}^{ζ} |Pi[1] − Pi[2]| ≤ η′ (αζ + (1 − α)ζλ) ≤ 2η. The above three cases exhaust all possibilities, and hence no matter what the nature of the payoff sequence, the expected regret of DFPL is bounded by O(η) as required. [sent-254, score-0.864]

77 The expected total communication is easily seen to be O(qT + Tk/ℓ): the q(T/ℓ) blocks on which step FPL is used contribute O(ℓ) communication each, and the (1 − q)(T/ℓ) blocks where block FPL is used contribute O(k) communication each. [sent-255, score-0.898]

78 Lower Bounds: In this section we give a lower bound on distributed counter algorithms in the site-prediction model. [sent-264, score-0.415]

79 We consider the class of algorithms such that: (i) Whenever a site receives a query, it has an (approximate) cumulative payoff of each expert to additive accuracy β. [sent-269, score-1.167]

80 (ii) Any site only uses the (approximate) cumulative payoffs and any local information it may have to choose an expert when queried. [sent-271, score-0.55]

81 However, our negative result shows that even with a highly accurate counter, β = O(k), the non-stochasticity of the payoff sequence may cause any such algorithm to have Ω(√(kT)) regret. [sent-272, score-0.706]

82 At any time step t, suppose each site has an (approximate) cumulative payoff count, P̃t[a], for every expert, such that |P̃t[a] − Pt[a]| ≤ β. [sent-275, score-1.113]

83 If β ≤ k, any algorithm that uses the approximate counts P̃t[a] and any local information at the site making the decision cannot achieve expected regret asymptotically better than √(βT). [sent-277, score-0.47]

84 Any protocol on the distributed system that guarantees that at each time step, each site has a β = k/10 approximate cumulative payoff with probability ≥ 1/2, uses Ω(T ) communication. [sent-279, score-1.088]

85 The label-efficient predictor translates into the following simple protocol: Whenever a site receives a payoff vector, it will forward that particular payoff to the coordinator with probability p ≈ C/T . [sent-283, score-1.708]

86 The coordinator will always execute the exponentially weighted forecaster over the sampled subset of payoffs to make new decisions. [sent-284, score-0.357]
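
A sketch of this protocol, assuming a standard exponentially weighted forecaster run on importance-weighted (inverse-probability) payoff estimates; the exact estimator and weighting used by the LEF algorithm are not reproduced in this summary:

```python
import numpy as np

rng = np.random.default_rng(2)

def lef_coordinator(payoffs, C, eta):
    """Each payoff vector is forwarded to the coordinator with probability
    p = C/T; the coordinator predicts with an exponentially weighted
    forecaster over unbiased cumulative payoff estimates."""
    T, n = payoffs.shape
    p = C / T
    est = np.zeros(n)                            # cumulative payoff estimates
    total, messages = 0.0, 0
    for t in range(T):
        w = np.exp(eta * (est - est.max()))      # stabilized weights
        a_t = int(rng.choice(n, p=w / w.sum()))  # coordinator picks the expert
        total += payoffs[t, a_t]
        if rng.random() < p:                     # site forwards pt w.p. C/T
            est += payoffs[t] / p                # E[(pt/p)·1{forwarded}] = pt
            messages += 1
    return total, messages
```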

87 In other words, if our regret needs to be O(√T), the communication needs to be linear in T. [sent-286, score-0.472]

88 The approximation guarantee is only required when a site receives a query and has to make a prediction. [sent-287, score-0.286]

89 The type of algorithms we consider may have an arbitrary communication protocol, but it satisfies the following: (i) Whenever a site communicates with the coordinator, the site will report its local cumulative payoff vector. [sent-290, score-1.365]

90 (ii) When the coordinator makes a decision, it will execute FPL(√T) (follow the perturbed leader with noise √T) using the latest cumulative payoff vector. [sent-291, score-1.115]

91 Consider the distributed non-stochastic expert problem in the coordinator prediction model. [sent-294, score-0.518]

92 Any algorithm of the kind described above that achieves regret O(√T) must use Ω(T^{1−ε}) communication against an oblivious adversary, for every constant ε > 0. [sent-295, score-0.678]

93 Figure 2: (a) Cumulative regret for the MC sequences as a function of correlation λ; (b) worst-case cumulative regret vs. worst-case cumulative communication for DFPL, Mini-batches, and HYZ.

94 In the mini-batch algorithm, the coordinator randomly requests, with some probability p at each time step, all cumulative payoff vectors from all sites. [sent-306, score-0.988]

95 It then broadcasts the sum (across all of the sites) back to the sites, so that all sites have the latest cumulative payoff vector. [sent-307, score-0.877]
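
A sketch of this mini-batch baseline; how a site combines the last broadcast with its own unsynchronized payoffs, and the FPL noise used, are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def minibatch(payoffs, sites, k, p, eta):
    """With probability p per step the coordinator pulls every site's local
    cumulative payoffs and broadcasts the global sum back, costing O(k)
    messages per synchronization."""
    T, n = payoffs.shape
    local = np.zeros((k, n))   # payoffs observed locally since the last sync
    shared = np.zeros(n)       # last broadcast global cumulative vector
    total, messages = 0.0, 0
    for t in range(T):
        s = sites[t]           # site receiving the query and payoff at time t
        a = int(np.argmax(shared + local[s] + rng.exponential(eta, n)))  # FPL
        total += payoffs[t, a]
        local[s] += payoffs[t]
        if rng.random() < p:   # synchronization round
            shared += local.sum(axis=0)
            local[:] = 0.0
            messages += 2 * k  # k collection messages + k broadcasts
    return total, messages
```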

96 The HYZ algorithm uses the distributed counter protocol of [12] to maintain the (approximate) cumulative payoff for each expert. [sent-311, score-0.764]

97 Whenever a counter update occurs, the coordinator must broadcast to all nodes to make sure they have the most current update. [sent-312, score-0.343]

98 For the first µ time steps the payoff vector is always (1, 0) (expert 1 being better); then for the next 2µ time steps the payoff vector is (0, 1) (expert 2 is better); then for the next 2µ time steps the payoff vector is (1, 0); and so on. [sent-315, score-1.863]

99 While in state 1, the payoff vector is (1, 0) and when in state 2 it is (0, 1). [sent-318, score-0.621]
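
The two families of test sequences can be generated as below (µ and λ as in the text; how λ maps to the transition probability is an assumption):

```python
import numpy as np

def alternating_sequence(T, mu):
    """(1, 0) for the first mu steps, then (0, 1) and (1, 0) alternating in
    blocks of 2*mu steps, as described above."""
    out = np.zeros((T, 2))
    t, better, length = 0, 0, mu
    while t < T:
        out[t:t + length, better] = 1.0
        t += length
        better, length = 1 - better, 2 * mu
    return out

def markov_sequence(T, lam, seed=4):
    """Two-state Markov chain: state 1 emits (1, 0), state 2 emits (0, 1);
    the chain stays in its current state with probability lam."""
    rng = np.random.default_rng(seed)
    out = np.zeros((T, 2))
    state = 0
    for t in range(T):
        out[t, state] = 1.0
        if rng.random() > lam:   # switch states with probability 1 - lam
            state = 1 - state
    return out
```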

100 Fig. 2(b) shows the worst-case cumulative communication vs. the worst-case cumulative regret trade-off for three algorithms, DFPL, mini-batch, and HYZ, over all the described sequences. [sent-323, score-0.714]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('payoff', 0.621), ('pi', 0.26), ('fpl', 0.254), ('communication', 0.247), ('regret', 0.225), ('coordinator', 0.224), ('fri', 0.207), ('site', 0.188), ('expert', 0.183), ('pt', 0.183), ('dfpl', 0.161), ('cumulative', 0.121), ('sites', 0.105), ('oblivious', 0.103), ('distributed', 0.09), ('adversary', 0.089), ('leader', 0.088), ('experts', 0.088), ('counter', 0.085), ('block', 0.077), ('forecaster', 0.062), ('payoffs', 0.058), ('receives', 0.054), ('received', 0.046), ('hyz', 0.046), ('query', 0.044), ('mc', 0.039), ('lef', 0.038), ('kt', 0.036), ('fpi', 0.035), ('radunovi', 0.035), ('blocks', 0.031), ('perturbed', 0.031), ('latest', 0.03), ('protocol', 0.029), ('communicate', 0.028), ('queries', 0.026), ('phase', 0.026), ('asymptotically', 0.025), ('decide', 0.025), ('playing', 0.025), ('qi', 0.025), ('system', 0.025), ('odel', 0.023), ('rediction', 0.023), ('synchronizing', 0.023), ('zhenming', 0.023), ('follow', 0.023), ('send', 0.022), ('maxa', 0.022), ('maintain', 0.022), ('st', 0.022), ('delayed', 0.022), ('vectors', 0.022), ('prediction', 0.021), ('counters', 0.02), ('ai', 0.02), ('decided', 0.02), ('copy', 0.02), ('round', 0.02), ('simultaneously', 0.019), ('online', 0.019), ('kanade', 0.019), ('saha', 0.019), ('cormode', 0.019), ('bounds', 0.019), ('observes', 0.019), ('expected', 0.018), ('pods', 0.018), ('broadcast', 0.018), ('maker', 0.018), ('regularized', 0.017), ('term', 0.017), ('bound', 0.017), ('phillips', 0.017), ('whenever', 0.017), ('nodes', 0.016), ('horizon', 0.016), ('budget', 0.016), ('simulates', 0.015), ('advice', 0.015), ('remark', 0.015), ('yi', 0.015), ('monitoring', 0.015), ('harvard', 0.015), ('protocols', 0.015), ('lower', 0.014), ('approximate', 0.014), ('achieves', 0.014), ('sent', 0.014), ('daum', 0.014), ('ii', 0.014), ('execute', 0.013), ('pick', 0.013), ('denoting', 0.013), ('decision', 0.013), ('actually', 0.013), ('maxt', 0.013), ('giving', 0.013), ('chooses', 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 102 nips-2012-Distributed Non-Stochastic Experts

Author: Varun Kanade, Zhenming Liu, Bozidar Radunovic

Abstract: We consider the online distributed non-stochastic experts problem, where the distributed system consists of one coordinator node that is connected to k sites, and the sites are required to communicate with each other via the coordinator. At each time-step t, one of the k site nodes has to pick an expert from the set {1, . . . , n}, and the same site receives information about payoffs of all experts for that round. The goal of the distributed system is to minimize regret at time horizon T, while simultaneously keeping communication to a minimum. The two extreme solutions to this problem are: (i) Full communication: this essentially simulates the non-distributed setting to obtain the optimal O(√(log(n) T)) regret bound at the cost of T communication. (ii) No communication: each site runs an independent copy; the regret is O(√(log(n) kT)) and the communication is 0. This paper shows the difficulty of simultaneously achieving regret asymptotically better than √(kT) and communication better than T. We give a novel algorithm that for an oblivious adversary achieves a non-trivial trade-off: regret O(√(k^{5(1+ε)/6} T)) and communication O(T/k^ε), for any value of ε ∈ (0, 1/5). We also consider a variant of the model, where the coordinator picks the expert. In this model, we show that the label-efficient forecaster of Cesa-Bianchi et al. (2005) already gives us a strategy that is near optimal in the regret vs. communication trade-off.

2 0.14360748 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

Author: Nicolò Cesa-bianchi, Pierre Gaillard, Gabor Lugosi, Gilles Stoltz

Abstract: Mirror descent with an entropic regularizer is known to achieve shifting regret bounds that are logarithmic in the dimension. This is done using either a carefully designed projection or by a weight sharing technique. Via a novel unified analysis, we show that these two approaches deliver essentially equivalent bounds on a notion of regret generalizing shifting, adaptive, discounted, and other related regrets. Our analysis also captures and extends the generalized weight sharing technique of Bousquet and Warmuth, and can be refined in several ways, including improvements for small losses and adaptive tuning of parameters. 1

3 0.14318189 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

Author: Tingni Sun, Cun-hui Zhang

Abstract: This paper concerns the problem of matrix completion, which is to estimate a matrix from observations in a small subset of indices. We propose a calibrated spectrum elastic net method with a sum of the nuclear and Frobenius penalties and develop an iterative algorithm to solve the convex minimization problem. The iterative algorithm alternates between imputing the missing entries in the incomplete matrix by the current guess and estimating the matrix by a scaled soft-thresholding singular value decomposition of the imputed matrix until the resulting matrix converges. A calibration step follows to correct the bias caused by the Frobenius penalty. Under proper coherence conditions and for suitable penalty levels, we prove that the proposed estimator achieves an error bound of nearly optimal order and in proportion to the noise level. This provides a unified analysis of the noisy and noiseless matrix completion problems. Simulation results are presented to compare our proposal with previous ones.

4 0.12252782 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization

Author: Brendan Mcmahan, Matthew Streeter

Abstract: Some of the most compelling applications of online convex optimization, including online prediction and classification, are unconstrained: the natural feasible set is R^n. Existing algorithms fail to achieve sub-linear regret in this setting unless constraints on the comparator point x̊ are known in advance. We present algorithms that, without such prior knowledge, offer near-optimal regret bounds with respect to any choice of x̊. In particular, regret with respect to x̊ = 0 is constant. We then prove lower bounds showing that our guarantees are near-optimal in this setting.

5 0.11601436 295 nips-2012-Risk-Aversion in Multi-armed Bandits

Author: Amir Sani, Alessandro Lazaric, Rémi Munos

Abstract: Stochastic multi–armed bandits solve the Exploration–Exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk–aversion where the objective is to compete against the arm with the best risk–return trade–off. This setting proves to be more difficult than the standard multi-arm bandit setting due in part to an exploration risk which introduces a regret associated to the variability of an algorithm. Using variance as a measure of risk, we define two algorithms, investigate their theoretical guarantees, and report preliminary empirical results. 1

6 0.095999502 258 nips-2012-Online L1-Dictionary Learning with Application to Novel Document Detection

7 0.094567388 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

8 0.090713605 293 nips-2012-Relax and Randomize : From Value to Algorithms

9 0.087189689 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

10 0.079916373 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

11 0.076606378 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization

12 0.074079663 110 nips-2012-Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

13 0.070026457 109 nips-2012-Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

14 0.064690292 345 nips-2012-Topic-Partitioned Multinetwork Embeddings

15 0.060613219 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

16 0.056772027 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

17 0.053241558 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

18 0.05049438 188 nips-2012-Learning from Distributions via Support Measure Machines

19 0.048129205 61 nips-2012-Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

20 0.044744328 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.103), (1, -0.075), (2, 0.053), (3, 0.032), (4, 0.04), (5, -0.02), (6, -0.016), (7, 0.011), (8, -0.072), (9, 0.032), (10, 0.104), (11, 0.068), (12, -0.113), (13, -0.076), (14, -0.083), (15, 0.091), (16, -0.044), (17, -0.013), (18, -0.062), (19, -0.092), (20, 0.006), (21, -0.045), (22, -0.009), (23, -0.028), (24, -0.001), (25, 0.015), (26, -0.038), (27, -0.023), (28, -0.024), (29, -0.03), (30, -0.072), (31, -0.082), (32, 0.028), (33, 0.062), (34, 0.029), (35, -0.061), (36, 0.057), (37, 0.029), (38, -0.146), (39, 0.05), (40, 0.122), (41, 0.032), (42, -0.054), (43, 0.115), (44, -0.066), (45, 0.177), (46, -0.096), (47, -0.144), (48, -0.05), (49, -0.061)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95665443 102 nips-2012-Distributed Non-Stochastic Experts

Author: Varun Kanade, Zhenming Liu, Bozidar Radunovic

Abstract: We consider the online distributed non-stochastic experts problem, where the distributed system consists of one coordinator node that is connected to k sites, and the sites are required to communicate with each other via the coordinator. At each time-step t, one of the k site nodes has to pick an expert from the set {1, . . . , n}, and the same site receives information about payoffs of all experts for that round. The goal of the distributed system is to minimize regret at time horizon T, while simultaneously keeping communication to a minimum. The two extreme solutions to this problem are: (i) Full communication: this essentially simulates the non-distributed setting to obtain the optimal O(√(log(n) T)) regret bound at the cost of T communication. (ii) No communication: each site runs an independent copy; the regret is O(√(log(n) kT)) and the communication is 0. This paper shows the difficulty of simultaneously achieving regret asymptotically better than √(kT) and communication better than T. We give a novel algorithm that for an oblivious adversary achieves a non-trivial trade-off: regret O(√(k^{5(1+ε)/6} T)) and communication O(T/k^ε), for any value of ε ∈ (0, 1/5). We also consider a variant of the model, where the coordinator picks the expert. In this model, we show that the label-efficient forecaster of Cesa-Bianchi et al. (2005) already gives us a strategy that is near optimal in the regret vs. communication trade-off.

2 0.58468503 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization

Author: Brendan Mcmahan, Matthew Streeter

Abstract: Some of the most compelling applications of online convex optimization, including online prediction and classification, are unconstrained: the natural feasible set is R^n. Existing algorithms fail to achieve sub-linear regret in this setting unless constraints on the comparator point x̊ are known in advance. We present algorithms that, without such prior knowledge, offer near-optimal regret bounds with respect to any choice of x̊. In particular, regret with respect to x̊ = 0 is constant. We then prove lower bounds showing that our guarantees are near-optimal in this setting.

3 0.57378203 258 nips-2012-Online L1-Dictionary Learning with Application to Novel Document Detection

Author: Shiva P. Kasiviswanathan, Huahua Wang, Arindam Banerjee, Prem Melville

Abstract: Given their pervasive use, social media, such as Twitter, have become a leading source of breaking news. A key task in the automated identification of such news is the detection of novel documents from a voluminous stream of text documents in a scalable manner. Motivated by this challenge, we introduce the problem of online ℓ1-dictionary learning where, unlike traditional dictionary learning, which uses squared loss, the ℓ1-penalty is used for measuring the reconstruction error. We present an efficient online algorithm for this problem based on the alternating directions method of multipliers, and establish a sublinear regret bound for this algorithm. Empirical results on news-stream and Twitter data show that this online ℓ1-dictionary learning algorithm for novel document detection gives more than an order of magnitude speedup over the previously known batch algorithm, without any significant loss in quality of results.

4 0.54757524 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

Author: Tingni Sun, Cun-hui Zhang

Abstract: This paper concerns the problem of matrix completion, which is to estimate a matrix from observations in a small subset of indices. We propose a calibrated spectrum elastic net method with a sum of the nuclear and Frobenius penalties and develop an iterative algorithm to solve the convex minimization problem. The iterative algorithm alternates between imputing the missing entries in the incomplete matrix by the current guess and estimating the matrix by a scaled soft-thresholding singular value decomposition of the imputed matrix until the resulting matrix converges. A calibration step follows to correct the bias caused by the Frobenius penalty. Under proper coherence conditions and for suitable penalty levels, we prove that the proposed estimator achieves an error bound of nearly optimal order and in proportion to the noise level. This provides a unified analysis of the noisy and noiseless matrix completion problems. Simulation results are presented to compare our proposal with previous ones.

5 0.53414237 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

Author: Nicolò Cesa-bianchi, Pierre Gaillard, Gabor Lugosi, Gilles Stoltz

Abstract: Mirror descent with an entropic regularizer is known to achieve shifting regret bounds that are logarithmic in the dimension. This is done using either a carefully designed projection or by a weight sharing technique. Via a novel unified analysis, we show that these two approaches deliver essentially equivalent bounds on a notion of regret generalizing shifting, adaptive, discounted, and other related regrets. Our analysis also captures and extends the generalized weight sharing technique of Bousquet and Warmuth, and can be refined in several ways, including improvements for small losses and adaptive tuning of parameters. 1

6 0.51897681 295 nips-2012-Risk-Aversion in Multi-armed Bandits

7 0.4900789 109 nips-2012-Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

8 0.43284655 44 nips-2012-Approximating Concavely Parameterized Optimization Problems

9 0.42574638 61 nips-2012-Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

10 0.41905108 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

11 0.40954763 110 nips-2012-Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

12 0.34371516 260 nips-2012-Online Sum-Product Computation Over Trees

13 0.3338173 11 nips-2012-A Marginalized Particle Gaussian Process Regression

14 0.31155115 293 nips-2012-Relax and Randomize : From Value to Algorithms

15 0.29938883 206 nips-2012-Majorization for CRFs and Latent Likelihoods

16 0.2662527 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization

17 0.26005965 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

18 0.24547179 358 nips-2012-Value Pursuit Iteration

19 0.24447984 245 nips-2012-Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions

20 0.24275208 39 nips-2012-Analog readout for optical reservoir computers


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.022), (21, 0.021), (38, 0.093), (39, 0.012), (42, 0.039), (53, 0.015), (54, 0.013), (55, 0.02), (74, 0.08), (76, 0.083), (80, 0.062), (84, 0.334), (92, 0.093)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.68832272 102 nips-2012-Distributed Non-Stochastic Experts

Author: Varun Kanade, Zhenming Liu, Bozidar Radunovic

Abstract: We consider the online distributed non-stochastic experts problem, where the distributed system consists of one coordinator node that is connected to k sites, and the sites are required to communicate with each other via the coordinator. At each time-step t, one of the k site nodes has to pick an expert from the set {1, . . . , n}, and the same site receives information about payoffs of all experts for that round. The goal of the distributed system is to minimize regret at time horizon T, while simultaneously keeping communication to a minimum. The two extreme solutions to this problem are: (i) Full communication: this essentially simulates the non-distributed setting to obtain the optimal O(√(log(n) T)) regret bound at the cost of T communication. (ii) No communication: each site runs an independent copy; the regret is O(√(log(n) kT)) and the communication is 0. This paper shows the difficulty of simultaneously achieving regret asymptotically better than √(kT) and communication better than T. We give a novel algorithm that for an oblivious adversary achieves a non-trivial trade-off: regret O(√(k^{5(1+ε)/6} T)) and communication O(T/k^ε), for any value of ε ∈ (0, 1/5). We also consider a variant of the model, where the coordinator picks the expert. In this model, we show that the label-efficient forecaster of Cesa-Bianchi et al. (2005) already gives us a strategy that is near optimal in the regret vs. communication trade-off.

2 0.65703642 234 nips-2012-Multiresolution analysis on the symmetric group

Author: Risi Kondor, Walter Dempsey

Abstract: There is no generally accepted way to define wavelets on permutations. We address this issue by introducing the notion of coset based multiresolution analysis (CMRA) on the symmetric group, find the corresponding wavelet functions, and describe a fast wavelet transform for sparse signals. We discuss potential applications in ranking, sparse approximation, and multi-object tracking. 1

3 0.64297092 59 nips-2012-Bayesian nonparametric models for bipartite graphs

Author: Francois Caron

Abstract: We develop a novel Bayesian nonparametric model for random bipartite graphs. The model is based on the theory of completely random measures and is able to handle a potentially infinite number of nodes. We show that the model has appealing properties and in particular it may exhibit a power-law behavior. We derive a posterior characterization, a generative process for network growth, and a simple Gibbs sampler for posterior simulation. Our model is shown to be well fitted to several real-world social networks. 1

4 0.60382748 51 nips-2012-Bayesian Hierarchical Reinforcement Learning

Author: Feng Cao, Soumya Ray

Abstract: We describe an approach to incorporating Bayesian priors in the MAXQ framework for hierarchical reinforcement learning (HRL). We define priors on the primitive environment model and on task pseudo-rewards. Since models for composite tasks can be complex, we use a mixed model-based/model-free learning approach to find an optimal hierarchical policy. We show empirically that (i) our approach results in improved convergence over non-Bayesian baselines, (ii) using both task hierarchies and Bayesian priors is better than either alone, (iii) taking advantage of the task hierarchy reduces the computational cost of Bayesian reinforcement learning and (iv) in this framework, task pseudo-rewards can be learned instead of being manually specified, leading to hierarchically optimal rather than recursively optimal policies. 1

5 0.5236274 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

Author: Arthur Gretton, Dino Sejdinovic, Heiko Strathmann, Sivaraman Balakrishnan, Massimiliano Pontil, Kenji Fukumizu, Bharath K. Sriperumbudur

Abstract: Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p = q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.

6 0.47337025 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification

7 0.47124738 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search

8 0.47116065 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

9 0.46620876 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

10 0.4642531 260 nips-2012-Online Sum-Product Computation Over Trees

11 0.4588356 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

12 0.4570893 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

13 0.45498347 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

14 0.45452619 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

15 0.45104808 148 nips-2012-Hamming Distance Metric Learning

16 0.45003912 168 nips-2012-Kernel Latent SVM for Visual Recognition

17 0.4499318 8 nips-2012-A Generative Model for Parts-based Object Segmentation

18 0.4493188 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

19 0.44895694 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

20 0.44880807 197 nips-2012-Learning with Recursive Perceptual Representations