jmlr jmlr2011 jmlr2011-45 knowledge-graph by maker-knowledge-mining

45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms


Source: pdf

Author: Vianney Perchet

Abstract: We provide consistent random algorithms for sequential decision under partial monitoring, when the decision maker does not observe the outcomes but receives instead random feedback signals. Those algorithms have no internal regret in the sense that, on the set of stages where the decision maker chose his action according to a given law, the average payoff could not have been improved in average by using any other fixed law. They are based on a generalization of calibration, no longer defined in terms of a Voronoï diagram but instead of a Laguerre diagram (a more general concept). This allows us to bound, for the first time in this general framework, the expected average internal, as well as the usual external, regret at stage n by O(n^{-1/3}), which is known to be optimal. Keywords: repeated games, on-line learning, regret, partial monitoring, calibration, Voronoï and Laguerre diagrams

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Those algorithms have no internal regret in the sense that, on the set of stages where the decision maker chose his action according to a given law, the average payoff could not have been improved in average by using any other fixed law. [sent-4, score-0.741]

2 Hart and Mas-Colell (2000) also used Blackwell's approachability theorem to construct explicit algorithms that bound the internal (and therefore the external) regret at stage n by O(n^{-1/2}). [sent-11, score-0.375]

3 Some of those results have been extended to the partial monitoring framework, that is, where the decision maker receives at each stage a random signal, whose law might depend on his unobserved payoff. [sent-12, score-0.506]

4 Rustichini (1999) defined and proved the existence of externally consistent strategies, that is, such that the average payoff of the decision maker could not have been asymptotically greater if he had known, before the beginning of the game, the empirical distribution of signals. [sent-13, score-0.528]

5 In this framework, internal regret was defined by Lehrer and Solan (2007); stages are no longer distinguished as a function of the action chosen by the decision maker (as in the full monitoring case) but as a function of its law. [sent-24, score-0.68]

6 Thanks to the continuity of payoff and signaling functions, both algorithms bound the internal regret by ε + O(n^{-1/2}). [sent-35, score-0.347]

7 We recall definitions of calibration and regret and we provide a naïve algorithm to construct strategies with internal regret asymptotically smaller than ε. [sent-46, score-0.425]

8 This generates a payoff ρ_n = ρ(i_n, j_n), where ρ is a mapping from I × J to R, and a regret r_n ∈ R^I defined by: r_n = (ρ(i, j_n) − ρ(i_n, j_n))_{i∈I} ∈ R^I, where I is the finite cardinality of I (and J the one of J). [sent-55, score-1.834]
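
As an illustration, here is a minimal sketch (not taken from the paper) of this stage-n regret vector for a finite payoff matrix; the 2×2 matching-pennies payoffs below are a hypothetical example.

```python
# Sketch of the stage-n regret vector r_n = (rho(i, j_n) - rho(i_n, j_n))_{i in I}.
import numpy as np

def stage_regret_vector(rho: np.ndarray, i_n: int, j_n: int) -> np.ndarray:
    """Gain of each fixed action i over the action i_n actually played, against outcome j_n."""
    return rho[:, j_n] - rho[i_n, j_n]

rho = np.array([[1.0, -1.0],
                [-1.0, 1.0]])                   # hypothetical matching-pennies payoffs
print(stage_regret_vector(rho, i_n=0, j_n=1))   # -> [0. 2.]
```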

9 The choices of i_n and j_n depend on the past observations (also called finite history) h_{n−1} = (i_1, j_1, …, i_{n−1}, j_{n−1}). [sent-57, score-0.499]

10 Explicitly, the set of finite histories is denoted by H = ⋃_{n∈N} (I × J)^n, with (I × J)^0 = {∅}, and a strategy σ of the decision maker is a mapping from H to ∆(I), the set of probability distributions over I. [sent-60, score-0.336]

11 Definition 1 (Hannan, 1957) A strategy σ of the forecaster is externally consistent if for every strategy τ of Nature: lim sup_{n→∞} r̄_n^i ≤ 0, ∀i ∈ I, P_{σ,τ}-a.s., where r̄_n = (1/n) ∑_{m=1}^n r_m. [sent-65, score-0.693]

12 In words, a strategy σ is externally consistent if the forecaster could not have had a greater payoff if he had known, before the beginning of the game, the empirical distribution of actions of Nature. [sent-66, score-0.687]

13 Indeed, the external consistency of σ is equivalent to the fact that: lim sup_{n→∞} max_{x∈∆(I)} [ρ(x, j̄_n) − ρ̄_n] ≤ 0, P_{σ,τ}-a.s. [sent-67, score-0.587]

14 The internal regret of the stage n, denoted by R_n ∈ R^{I×I}, is also generated by the choices of i_n and j_n and its (i, k)-th coordinate is defined by: R_n^{ik} = ρ(k, j_n) − ρ(i, j_n) if i = i_n, and 0 otherwise. [sent-69, score-1.798]

15 Definition 2 (Foster and Vohra, 1997) A strategy σ of the forecaster is internally consistent if for every strategy τ of Nature: lim sup_{n→∞} R̄_n^{ik} ≤ 0, ∀i, k ∈ I, P_{σ,τ}-a.s. [sent-72, score-0.698]

16 Denote by N_n(i) the set of stages before the n-th where the forecaster chose action i and by j̄_n(i) ∈ ∆(J) the empirical distribution of Nature's actions on this set. [sent-74, score-0.963]

17 Explicitly, N_n(i) = {m ∈ {1, …, n}; i_m = i} and j̄_n(i) = (∑_{m∈N_n(i)} j_m) / |N_n(i)| ∈ ∆(J). A strategy is ε-internally consistent if for every i, k ∈ I: [sent-77, score-0.770]

18 lim sup_{n→∞} (|N_n(i)|/n) [ρ(k, j̄_n(i)) − ρ(i, j̄_n(i)) − ε] ≤ 0,  (2)  P_{σ,τ}-a.s. [sent-78, score-0.998]
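
A short sketch of this ε-internal-consistency criterion under full monitoring (assuming finite action and outcome sets encoded as integers; the function below is illustrative, not the paper's algorithm): it computes, for every pair (i, k), the weighted switching gain appearing in (2).

```python
# Empirical version of the left-hand side of (2), before subtracting epsilon.
import numpy as np

def internal_regret_terms(rho: np.ndarray, actions, outcomes) -> np.ndarray:
    """rho: (I, J) payoff matrix; actions, outcomes: the played i_m and j_m up to stage n."""
    I, J = rho.shape
    n = len(actions)
    terms = np.zeros((I, I))
    for i in range(I):
        N_i = [m for m in range(n) if actions[m] == i]                          # N_n(i)
        if not N_i:
            continue
        j_bar = np.bincount([outcomes[m] for m in N_i], minlength=J) / len(N_i)  # bar j_n(i)
        for k in range(I):
            terms[i, k] = len(N_i) / n * (rho[k] @ j_bar - rho[i] @ j_bar)
    return terms   # eps-internal consistency asks the limsup of these entries to stay below eps
```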

19 Definition 4 (Dawid, 1982) A strategy σ of the predictor is calibrated (with respect to Y = {y(l); l ∈ L}) if for every strategy τ of Nature, P_{σ,τ}-a.s.: lim sup_{n→∞} (|N_n(l)|/n) [‖j̄_n(l) − y(l)‖² − ‖j̄_n(l) − y(k)‖²] ≤ 0, ∀k, l ∈ L, where ‖·‖ is the Euclidean norm of R^J. [sent-105, score-1.514]

20 In words, a strategy is calibrated if for every l ∈ L, the empirical distribution of states, on the set of stages where y(l) was predicted, is closer to y(l) than to any other y(k) (or the frequency of l, |N_n(l)|/n, is small). [sent-106, score-0.401]

21 Proposition 5 (Foster and Vohra, 1998) For any finite grid Y of ∆(J), there exist calibrated strategies with respect to Y such that for every strategy τ of Nature: E_{σ,τ}[ max_{l,k∈L} (|N_n(l)|/n) (‖j̄_n(l) − y(l)‖² − ‖j̄_n(l) − y(k)‖²) ] ≤ O(1/√n). [sent-109, score-1.356]

22 The state is j_n ∈ J and the vector payoff is the matrix U_n ∈ R^{L×L}, where U_n^{lk} = ‖j_n − y(l)‖² − ‖j_n − y(k)‖² if l = l_n, and 0 otherwise. [sent-113, score-1.874]

23 Indeed, for every l, k ∈ L, the (l, k)-th coordinate of Ū_n is Ū_n^{lk} = (|N_n(l)|/n) · (∑_{m∈N_n(l)} [‖j_m − y(l)‖² − ‖j_m − y(k)‖²]) / |N_n(l)| = (|N_n(l)|/n) [‖j̄_n(l) − y(l)‖² − ‖j̄_n(l) − y(k)‖²]. [sent-116, score-1.191]

24 Play the first action at the first stage and, at stage n + 1, play according to any invariant measure of Ū_n^+. We claim that this strategy is an approachability strategy of the negative orthant of R^{L×L} because it satisfies Blackwell's (1956a) sufficient condition: ∀n ∈ N, ⟨Ū_n − Ū_n^−, E_{λ_n}[U_{n+1} | j_{n+1}] − Ū_n^−⟩ ≤ 0. [sent-122, score-0.553]
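
The following sketch shows one way to implement the step just described (a standard Foster-Vohra / Blackwell-type construction; the normalisation and the least-squares solve are implementation choices, not taken from the paper): the next prediction is drawn from an invariant measure of the Markov chain built from the positive part of the average payoff matrix.

```python
# Invariant measure of the chain whose off-diagonal rates are the entries of (bar U_n)^+.
import numpy as np

def calibrated_action_law(U_bar: np.ndarray) -> np.ndarray:
    L = U_bar.shape[0]
    A = np.clip(U_bar, 0.0, None)                # positive part (bar U_n)^+
    np.fill_diagonal(A, 0.0)
    mu = max(A.sum(axis=1).max(), 1e-12)         # normalise rows so they sum to at most 1
    Q = A / mu
    Q += np.diag(1.0 - Q.sum(axis=1))            # row-stochastic transition matrix
    M = np.vstack([Q.T - np.eye(L), np.ones((1, L))])
    b = np.append(np.zeros(L), 1.0)
    lam, *_ = np.linalg.lstsq(M, b, rcond=None)  # solve lam Q = lam with sum(lam) = 1
    lam = np.clip(lam, 0.0, None)
    return lam / lam.sum() if lam.sum() > 0 else np.full(L, 1.0 / L)
```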

25 Interestingly, the strategy σ we constructed in this proof is actually internally consistent in the game with action spaces L and J and payoffs defined by ρ(l, j) = −‖j − y(l)‖². [sent-128, score-0.484]

26 1) implies that with probability at least 1 − δ: |Ū_n^{lk} − E_{σ,τ}[Ū_n^{lk}]| ≤ (K_n/√n) √(2 ln(1/δ)). [sent-134, score-0.306]

27 8), implies that with probability at least 1 − δ: |Ū_n^{lk} − E_{σ,τ}[Ū_n^{lk}]| ≤ √(2 v_n ln(1/δ)/n) + (2 K_n/(3n)) ln(1/δ). [sent-137, score-0.430]

28 Compute σ, a calibrated strategy with respect to a δ-grid Y = {y(l); l ∈ L } of ∆(J ) in an abstract calibration game Γc . [sent-144, score-0.439]

29 Whenever the decision maker (seen as a predictor) should choose the action l in Γc , then he (seen as a forecaster) chooses i(l) ∈ BR(y(l)) in the original game Γ. [sent-145, score-0.429]
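
Put together, the two steps just described (compute a calibrated strategy on a δ-grid, then best-respond to the predicted point) can be sketched as follows; the helper `calibrated_predict`, which returns the index l chosen by the calibrated strategy in Γc, is hypothetical and assumed given.

```python
# Sketch of the naive algorithm: calibrate on a delta-grid, then best-respond to the prediction.
import numpy as np

def best_response(rho: np.ndarray, y: np.ndarray) -> int:
    """An action i(l) in BR(y): it maximises the expected payoff against the mixed action y."""
    return int(np.argmax(rho @ y))

def naive_play(rho, grid, calibrated_predict, history):
    l = calibrated_predict(history)        # index l chosen by the calibrated strategy (assumed given)
    return best_response(rho, grid[l])     # the forecaster plays i(l) in the original game
```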

30 By definition of a calibrated strategy, for every η > 0, there exists, with probability 1, an integer N ∈ N such that for every l, k ∈ L and for every n ≥ N: (|N_n(l)|/n) [‖j̄_n(l) − y(l)‖² − ‖j̄_n(l) − y(k)‖²] ≤ η. [sent-149, score-1.277]

31 Since {y(k); k ∈ L} is a δ-grid of ∆(J), for every l ∈ L and every n ∈ N, there exists k ∈ L such that ‖j̄_n(l) − y(k)‖² ≤ δ², hence ‖j̄_n(l) − y(l)‖² ≤ δ² + η n/|N_n(l)|. [sent-150, score-1.066]

32 Therefore, since i(l) ∈ BR(y(l)): |N_n(l)|/n ≥ η/δ² ⇒ ‖j̄_n(l) − y(l)‖² ≤ 2δ² ⇒ ρ(k, j̄_n(l)) − ρ(i(l), j̄_n(l)) ≤ 2ε, ∀k ∈ I, for every l ∈ L and n ≥ N. [sent-151, score-1.531]

33 The (i, k)-th coordinate of R̄_n satisfies: (|N_n(i)|/n) [R̄_n^{ik} − 2ε] ≤ (1/n) ∑_{m∈N_n(i)} [ρ(k, j_m) − ρ(i, j_m) − 2ε] = (1/n) ∑_{l: i(l)=i} ∑_{m∈N_n(l)} [ρ(k, j_m) − ρ(i, j_m) − 2ε] = ∑_{l: i(l)=i} (|N_n(l)|/n) [ρ(k, j̄_n(l)) − ρ(i(l), j̄_n(l)) − 2ε]. [sent-152, score-1.1]

34 Recall that either |N_n(l)|/n ≥ η/δ² and ρ(k, j̄_n(l)) − ρ(i(l), j̄_n(l)) − 2ε ≤ 0, or |N_n(l)|/n < η/δ²; since ρ is bounded (by M_ρ > 0), then: (|N_n(i)|/n) [R̄_n^{ik} − 2ε] ≤ 2 M_ρ L η/δ², ∀i ∈ I, ∀k ∈ I, ∀n ≥ N. [sent-153, score-0.998]

35 A calibrated strategy with respect to {z(l); l ∈ L} has the property that, for every l ∈ L, either the frequency of l goes to zero or the empirical distribution of states on N_n(l) converges to V(l). [sent-168, score-0.343]

36 A calibrated strategy ensures that j̄_n(l) converges to V(l) (or the frequency of l is small), thus choosing i(l) on N_n(l) was indeed an ε-best response to j̄_n(l). [sent-170, score-1.387]

37 These two segments are exactly the cells of the Voronoï diagram associated to {y(1) = 1/4, y(2) = 3/4}, therefore, performing a calibrated strategy with respect to {y(1), y(2)} and playing H (resp. [sent-180, score-0.373]

38 Indeed, by Lemma 10 stated below, ∆(J ) can be decomposed into polytopial best-response areas (a polytope is the convex hull of a finite number of points, its vertices). [sent-184, score-0.405]

39 Given such a polytopial decomposition, one can find a finer Voronoï diagram (i.e., one whose cells refine the best-response areas). [sent-185, score-0.350]

40 On the other hand, the description of a Laguerre diagram (this concept generalizes Voronoï diagrams) that refines a polytopial decomposition is quite simple and is described in Proposition 11 below. [sent-190, score-0.350]

41 Definition 9 A covering K = {K^i; i ∈ I} of a polytope K with non-empty interior is a polytopial complex of K if, for every i in the finite set I, K^i is a polytope with non-empty interior and, for every i ≠ j, the polytope K^i ∩ K^j has empty interior. [sent-194, score-0.748]

42 Lemma 10 There exists a subset I′ ⊂ I such that {B^i; i ∈ I′} is a polytopial complex of ∆(J), where B^i is the i-th best response area defined by B^i = {y ∈ ∆(J); i ∈ BR(y)} = BR^{-1}(i). [sent-196, score-0.371]

43 Denote by I′ any subset of I such that for every i ∈ I with B^i ≠ ∅, there exists exactly one i′ ∈ I′ such that B^i = B^{i′}; then {B^i; i ∈ I′} is a polytopial complex of ∆(J). [sent-202, score-0.325]

44 Proposition 11 Let K = {K^i; i ∈ I} be a polytopial complex of a polytope K ⊂ R^d. [sent-203, score-0.432]

45 Theorem 13 For any set of sites and weights {y(l) ∈ R^J, ω(l) ∈ R; l ∈ L} there exists a strategy σ of the predictor such that for every strategy τ of Nature: E_{σ,τ}[‖(Ū_{ω,n})^+‖] ≤ O(1/√n), where U_{ω,n} is defined by: U_{ω,n}^{lk} = [‖j_n − y(l)‖² − ω(l)] − [‖j_n − y(k)‖² − ω(k)] if l = l_n, and 0 otherwise. [sent-227, score-1.551]
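
A Laguerre (power) diagram is easy to evaluate pointwise, which is what the payoff U_{ω,n} above encodes; the toy sketch below (sites and weights are hypothetical) assigns a point to the cell minimising the weighted squared distance, and recovers the Voronoï cell when all weights are zero.

```python
# Laguerre / power-diagram cell of a point: argmin_l ||f - y(l)||^2 - omega(l).
import numpy as np

def laguerre_cell(point: np.ndarray, sites: np.ndarray, weights: np.ndarray) -> int:
    scores = ((sites - point) ** 2).sum(axis=1) - weights
    return int(np.argmin(scores))

sites = np.array([[0.25], [0.75]])                 # the two sites of the earlier 1-D example
weights = np.array([0.0, 0.1])                     # hypothetical weights; (0, 0) gives Voronoi cells
print(laguerre_cell(np.array([0.45]), sites, weights))   # -> 1 here, 0 with zero weights
```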

46 We now have the material to construct our new tool algorithm: Theorem 15 There exists an internally consistent strategy σ of the forecaster such that for every strategy τ of Nature and every n ∈ N, with P_{σ,τ}-probability greater than 1 − δ: max_{i,k∈I} R̄_n^{ik} ≤ O(√(ln(1/δ)/n)). [sent-231, score-0.812]

47 As in the naïve algorithm, the strategy σ of the decision maker is constructed through a strategy calibrated with respect to {y(l), ω(l); l ∈ L}. [sent-235, score-0.623]

48 The construction of this internally consistent strategy relies on Theorem 13, which is implied by the existence of internally consistent strategies. [sent-241, score-0.435]

49 Therefore the algorithm does not require that the forecaster observes his real payoffs, as long as he knows the best response to his information (Nature's action in this case). [sent-247, score-0.438]

50 The polytopial decomposition of ∆(J) induced by {b_t, c_t; t ∈ T} is exactly the same as the one induced by {γ b_t, γ c_t; t ∈ T} for any γ > 0. [sent-249, score-0.329]

51 Partial Monitoring In the partial monitoring framework, the decision maker does not observe Nature’s actions. [sent-258, score-0.367]

52 There is a finite set of signals S (of cardinality S) such that, at stage n the forecaster receives only a random signal sn ∈ S . [sent-259, score-0.404]

53 Its law is s(in , jn ) where s is a mapping from I × J to ∆(S ), known by the decision maker. [sent-260, score-0.568]

54 We denote by fn = s( jn ) the (unobserved) flag of stage n ∈ N. [sent-265, score-0.689]
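
The following toy sketch (the signalling structure is hypothetical, not one from the paper) shows what the forecaster does and does not observe at a stage: only the random signal s_n is revealed, while the flag f_n = (s(i, j_n))_{i∈I} stays hidden.

```python
# One stage of a toy partial-monitoring game with I = J = 2 actions and S = 2 signals.
import numpy as np

rng = np.random.default_rng(0)
S = 2
signal_law = np.array([                            # s : I x J -> Delta(S)
    [[1.0, 0.0], [0.0, 1.0]],                      # under action 0 the signal reveals j_n
    [[0.5, 0.5], [0.5, 0.5]],                      # under action 1 the signal is pure noise
])

def play_stage(i_n: int, j_n: int):
    flag = signal_law[:, j_n, :]                       # unobserved flag f_n in Delta(S)^I
    s_n = int(rng.choice(S, p=signal_law[i_n, j_n]))   # the forecaster's only observation
    return s_n, flag
```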

55 In order to distinguish between those two actions, the forecaster needs to know s(o, y) although action o is never a best response (but is purely informative). [sent-272, score-0.438]

56 As in the full monitoring case, we define, for every ε ≥ 0, the ε-best response multivalued mapping BR_ε : ∆(S)^I ⇉ ∆(I) by: BR_ε(f) = {x ∈ ∆(I); W(x, f) ≥ sup_{z∈∆(I)} W(z, f) − ε}. [sent-274, score-0.292]

57 Given a flag f ∈ ∆(S)^I, the function W(·, f) may not be linear, so the best response of the forecaster might not contain any element of I. [sent-275, score-0.328]

58 Example 2 Matching pennies in the dark: Consider the matching pennies game where the forecaster does not observe the coin but always receives the same signal c: every choice of Nature generates the same flag (c, c). [sent-276, score-0.350]

59 Therefore the only best response of the forecaster is to play ½ H + ½ T, while actions H and T give the worst payoff of −1. [sent-278, score-0.538]
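
A quick numeric check of this claim (matching-pennies payoffs of +1 on a match and −1 otherwise are assumed): since the flag carries no information, each mixed action is judged by its worst-case payoff.

```python
# Worst-case payoff of p*H + (1-p)*T in matching pennies when the coin is never observed.
import numpy as np

rho = np.array([[1.0, -1.0],     # row H
                [-1.0, 1.0]])    # row T
for p in (0.0, 0.5, 1.0):        # probability put on H
    x = np.array([p, 1.0 - p])
    print(p, (x @ rho).min())    # -> -1.0 for pure actions, 0.0 for the uniform mixture
```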

60 In the full monitoring case, the forecaster has no internal regret if, for every i ∈ I , the action i is a best-response to the empirical distribution of Nature’s actions, on the set of stages where i was actually chosen. [sent-281, score-0.736]

61 In the partial monitoring framework, the decision maker’s action should be a best response to the average flag. [sent-282, score-0.377]

62 We make an extra assumption on the characterization of the forecaster's strategy: it can be generated by a finite family of mixed actions {x(l) ∈ ∆(I); l ∈ L} such that, at stage n ∈ N, the forecaster chooses a type l_n and, given that type, the law of his action i_n is x(l_n) ∈ ∆(I). [sent-284, score-0.694]

63 Roughly speaking, a strategy will be ε-internally consistent (with respect to the set L) if, for every l ∈ L, x(l) is an ε-best response to f̄_n(l), the average flag on N_n(l) (or the frequency of the type l, |N_n(l)|/n, converges to zero). [sent-288, score-0.295]

64 Definition 18 (Lehrer and Solan, 2007) For every n ∈ N and every l ∈ L, the average internal regret of type l at stage n is R̄_n(l) = sup_{x∈∆(I)} W(x, f̄_n(l)) − ρ̄_n(l). [sent-292, score-0.413]

65 A strategy σ of the forecaster is (L, ε)-internally consistent if for every strategy τ of Nature: lim sup_{n→+∞} (|N_n(l)|/n) [R̄_n(l) − ε] ≤ 0, ∀l ∈ L, P_{σ,τ}-a.s. [sent-293, score-0.595]

66 In words, a strategy is (L , ε)-internally consistent if, for every l ∈ L , the forecaster could not have had, for sure, a better payoff (of at least ε) if he had known, before the beginning of the game, the average flag on Nn (l) (or the frequency of l is small). [sent-294, score-0.625]

67 For simplicity, we assume in the following sketch of the proof, that the decision maker fully observes the sequence of flags fn = s( jn ) ∈ ∆(S )I . [sent-299, score-0.799]

68 2 Optimal Algorithms. As in the full monitoring framework (cf. Lemma 10), we define for every x ∈ ∆(I) the x-best response area B^x as the set of flags to which x is a best response: B^x = {f ∈ ∆(S)^I; x ∈ BR(f)} = BR^{-1}(x). [sent-315, score-0.295]

69 In particular, if the decision maker could observe the flag fn before choosing his action xn then, at every stage, xn would be in X. [sent-323, score-0.444]

70 Theorem 22 There exists an internally consistent strategy σ such that for every strategy τ of Nature, with P_{σ,τ}-probability at least 1 − δ: sup_{l∈L} (|N_n(l)|/n) R̄_n(l) ≤ O(√(ln(1/δ)/n)). [sent-333, score-0.574]

71 Hoeffding-Azuma's inequality implies that with P_{σ,τ}-probability at least 1 − δ₂: max_{l∈L} (|N_n(l)|/n) ‖s̄_n(l) − f̄_n(l)‖ ≤ √(2 ln(2SL/δ₂)/n)  (5), and with probability at least 1 − δ₃: max_{l∈L} (|N_n(l)|/n) |ρ̄_n(l) − ρ(x(l), j̄_n(l))| ≤ M_ρ √(2 ln(2L/δ₃)/n). [sent-341, score-0.787]

72 Define N_n(l, k) as the set of stages where the decision maker predicts either f(l) or f(k) up to stage n, f̄_n(l, k) as the average flag on this set, ρ̄_n(l, k) as the average payoff and R̄_n(l, k) as the regret. [sent-346, score-0.562]

73 The final argument in the proof of Theorem 22 also implies that an internally consistent strategy is also externally consistent, hence we can compare the bounds of our algorithms. [sent-351, score-0.332]

74 Our main result is the following: Theorem 24 There exists an internally consistent strategy σ such that, for every strategy τ of Nature, with P_{σ,τ}-probability at least 1 − δ: max_{l∈L} (|N_n(l)|/n) R̄_n(l) ≤ O( (1/n^{1/3}) √(ln(1/δ)) + (1/n^{2/3}) ln(1/δ) ). [sent-360, score-0.654]

75 Assume that in an auxiliary game Γ_c, a predictor computes σ, a calibrated strategy with respect to {f(l), ω(l); l ∈ L}, but where the state at stage n is the estimator e_n ∈ R^{IS}. [sent-370, score-0.547]

76 When the decision maker (seen as a predictor) should choose l_n according to σ in Γ_c, then he (seen as a forecaster) chooses i_n according to x(l_n) in the original game. [sent-371, score-0.441]

77 Lemma 12 implies that, with P_{σ,τ}-probability at least 1 − δ₁, the corresponding deviation is at most √(2 ln(L²/δ₁)/(γ_n n)) + (8 M_P L²/(3 γ_n n)) ln(L²/δ₁), where f_n(l) is the projection of e_n(l) onto P(l). [sent-375, score-0.357]

78 (2008), since for every i ∈ I and s ∈ S, E_{σ,τ}[|e_n^{i,s}|²] ≤ 1/γ_n, Freedman's inequality implies that with probability at least 1 − δ₂, for every l ∈ L: (|N_n(l)|/n) ‖ē_n(l) − f̄_n(l)‖ ≤ √(IS) [ √(2 ln(2LIS/δ₂)/(n γ_n)) + (2/(3 n γ_n)) ln(2LIS/δ₂) ]. [sent-377, score-0.351]
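
The estimator used here is the standard importance-sampling flag estimator: under a γ_n-perturbed mixed action x_n (every action played with probability at least γ_n), the sketch below is unbiased and satisfies the variance bound E|e_n^{i,s}|² ≤ 1/x_n[i] ≤ 1/γ_n quoted above. The function name and encoding are illustrative.

```python
# Unbiased estimate of the flag f_n from the single observed pair (i_n, s_n).
import numpy as np

def flag_estimator(i_n: int, s_n: int, x_n: np.ndarray, n_signals: int) -> np.ndarray:
    """e_n[i, s] = 1{i = i_n, s = s_n} / x_n[i]; its expectation is the flag coordinate s(i, j_n)[s]."""
    e = np.zeros((len(x_n), n_signals))
    e[i_n, s_n] = 1.0 / x_n[i_n]
    return e
```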

79 As a consequence, for every l ∈ L, the stated bound holds with the constants defined by Ω₁ = 16 M_P M_W √(LI) + 3 M_W M_ρ √I, Ω₂ = 2 M_W √I (8 M_P + S), Ω₃ = M_ρ, Ω₄ = 2 M_W (4 M_P + IS) and Ω₅ = L(L + 2 + 2IS). [sent-379, score-0.303]

80 In the label efficient prediction game defined in Example 1, for every strategy σ of the decision maker there exists a sequence of outcomes such that the forecaster's expected regret is greater than n^{-1/3}/7, see Cesa-Bianchi et al. [sent-382, score-0.805]

81 1 deals with a simpler question and exhibits an internally consistent algorithm which requires solving at each stage a linear program of size polynomial in L0, the minimal number of polytopes on which BR is constant, instead of a system of linear equations of size L. [sent-390, score-0.353]

82 1 Second Algorithm: Calibration and Polytopial Complex. The algorithms we described are quite easy to run stage by stage, since the forecaster only needs to compute some invariant measures of non-negative matrices. [sent-394, score-0.480]

83 Definition 25 A strategy σ is calibrated with respect to the complex {K(l); l ∈ L0} if for every strategy τ of Nature, P_{σ,τ}-a.s.: lim sup_{n→∞} (|N_n(l)|/n) [⟨j̄_n(l), c_{t,l}⟩ − b_{t,l}] ≤ 0, ∀t ∈ T, ∀l ∈ L0. [sent-403, score-1.001]

84 Theorem 26 There exist calibrated strategies with respect to any finite polytopial complex {K(l); l ∈ L0 }. [sent-404, score-0.505]

85 The state is j_n ∈ J, which generates the vector payoff U_n ∈ R^{T×L0} defined by: U_n^{t,l} = ⟨j_n, c_{t,l}⟩ − b_{t,l} if l = l_n, and 0 otherwise. [sent-408, score-0.499]
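
A sketch of this vector payoff (assuming each polytope K(l) of the complex is described by the halfspaces ⟨y, c_{t,l}⟩ ≤ b_{t,l}, as above; the array layout is an implementation choice): only the column of the predicted cell l_n is filled, with the signed constraint violations at the current state j_n.

```python
# Vector payoff of the auxiliary approachability game Gamma'_c.
import numpy as np

def complex_payoff(j_n: np.ndarray, l_n: int, C: np.ndarray, b: np.ndarray) -> np.ndarray:
    """C: (L0, T, J) constraint normals, b: (L0, T) offsets; returns U_n with shape (T, L0)."""
    L0, T, _ = C.shape
    U = np.zeros((T, L0))
    U[:, l_n] = C[l_n] @ j_n - b[l_n]     # <j_n, c_{t,l_n}> - b_{t,l_n} for every t
    return U
```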

86 Any strategy that approaches the negative orthant Ω− in Γ′_c is calibrated with respect to the complex {K(l); l ∈ L0}. [sent-410, score-0.314]

87 Blackwell’s characterization of approachable convex sets (Blackwell, 1956a, Theorem 3) implies that the predictor can approach the convex set Ω− if (and only if) for every mixed action of Nature in ∆(J ), he has an action x ∈ ∆(L0 ) such that the expected payoff is in Ω− . [sent-411, score-0.482]

88 Therefore there exist calibrated strategies with respect to any polytopial complex. [sent-413, score-0.478]

89 This modification of the definition of calibration does not change the other parts of our algorithms nor the remainder of the proofs (in particular, to calibrate the sequence of unobserved flags, the forecaster must use γ_n-perturbations). [sent-414, score-0.332]

90 Assume that instead of choosing j_n at stage n ∈ N, which generates the flag f_n = s(j_n) and an outcome vector (ρ(i, j_n))_{i∈I}, Nature chooses directly an outcome vector O_n ∈ [−1, 1]^I and a flag f_n which belongs to s(O_n), where s is a multivalued mapping from [−1, 1]^I into ∆(S)^I. [sent-423, score-1.819]

91 Theorem 27 If the graph of s is a polytope, then there exists an internally consistent strategy σ such that, for every strategy τ of Nature, with P_{σ,τ}-probability at least 1 − δ: max_{l∈L} (|N_n(l)|/n) R̄_n(l) ≤ O( (1/n^{1/3}) √(ln(1/δ)) + (1/n^{2/3}) ln(1/δ) ). [sent-426, score-0.654]

92 We defined a calibrated strategy with respect to Y as a strategy σ such that j̄_n(l) is asymptotically closer to y(l) than to any other y(k) as soon as the frequency of l does not go to zero. [sent-460, score-0.918]

93 In fact, j̄_n(l) needs only to be closer to y(l) than to any of its neighbors. [sent-461, score-0.499]

94 So one can construct neighbor-calibrated strategies by modifying the algorithm given in Proposition 5; the payoff at stage n is now denoted by U′_n and is defined by: U′_n^{lk} = ‖j_n − y(l)‖² − ‖j_n − y(k)‖² if l = l_n and k is a neighbor of l, and 0 otherwise. [sent-462, score-1.705]

95 The strategy consisting in choosing an invariant measure of (Ū′_n)^+ is calibrated, and the maximal squared second order moment M_n² = sup_{m≤n} E_{σ,τ}[‖U′_m‖²] equals 4N, where N is the maximal number of neighbors. [sent-463, score-0.331]

96 A correspondence B : K ⇉ R^d is polytopial constant if there exist a finite polytopial complex {K(l); l ∈ L} of K and {x(l); l ∈ L} such that x(l) ∈ B(f) for every f ∈ K(l). [sent-503, score-0.589]

97 In the compact case, Proposition 20 becomes: Proposition 33 If s has a polytopial graph, then BR is polytopial constant. [sent-506, score-0.528]

98 By point v), the two mappings y(·) and y′(·) are affine on K(l), so each possible set {f ∈ K(l); x A y(f) ≥ x′ A y′(f)} is a polytope, as the intersection of a half-space and the polytope K(l). [sent-536, score-0.311]

99 Since the intersection of finite unions of polytopes remains a finite union of polytopes, for every x ∈ X, B^{-1}(x) is a finite union of polytopes and B is polytopial constant. [sent-537, score-0.453]

100 Theorem 34 (with D = ∆(I)) implies that the solution, denoted by B(f) for every f ∈ F, of the parameterized program max_{x∈∆(I)} min_{y∈s^{-1}(f)} ρ(x, y) is polytopial constant. [sent-541, score-0.298]
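
For a bilinear payoff ρ(x, y) = xᵀAy and a flag f whose compatible outcomes form a polytope s^{-1}(f) = {y ∈ ∆(J) : Gy ≤ h}, the inner minimisation of this parameterized program is a linear program; the sketch below (SciPy assumed available; A, G, h are hypothetical inputs) evaluates it for a fixed x, which is the building block of the max-min.

```python
# Inner minimisation of the parameterised program: min_{y in s^{-1}(f)} x^T A y for fixed x.
import numpy as np
from scipy.optimize import linprog

def inner_value(x: np.ndarray, A: np.ndarray, G: np.ndarray, h: np.ndarray) -> float:
    J = A.shape[1]
    c = x @ A                                         # linear objective in y
    res = linprog(c, A_ub=G, b_ub=h,
                  A_eq=np.ones((1, J)), b_eq=[1.0],   # y stays in the simplex Delta(J)
                  bounds=[(0, None)] * J, method="highs")
    return float(res.fun) if res.success else float("inf")   # inf if no outcome is compatible
```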


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('jn', 0.499), ('polytopial', 0.264), ('forecaster', 0.248), ('nn', 0.22), ('un', 0.181), ('maker', 0.18), ('calibrated', 0.177), ('payoff', 0.162), ('laguerre', 0.149), ('polytope', 0.141), ('alibration', 0.132), ('ln', 0.124), ('regret', 0.119), ('stage', 0.116), ('erchet', 0.116), ('nternal', 0.116), ('onitoring', 0.116), ('action', 0.11), ('strategy', 0.11), ('internally', 0.103), ('monitoring', 0.101), ('ag', 0.1), ('egret', 0.098), ('foster', 0.091), ('lk', 0.091), ('vohra', 0.091), ('ptimal', 0.088), ('br', 0.087), ('diagram', 0.086), ('calibration', 0.084), ('solan', 0.083), ('response', 0.08), ('lehrer', 0.077), ('approachability', 0.074), ('fn', 0.074), ('vorono', 0.07), ('externally', 0.07), ('rd', 0.07), ('game', 0.068), ('internal', 0.066), ('ct', 0.065), ('polytopes', 0.063), ('ags', 0.063), ('xay', 0.063), ('blackwell', 0.059), ('perchet', 0.058), ('stages', 0.058), ('mp', 0.053), ('lugosi', 0.052), ('nature', 0.05), ('ctk', 0.05), ('rik', 0.05), ('consistent', 0.049), ('lgorithms', 0.048), ('actions', 0.048), ('bt', 0.046), ('decision', 0.046), ('sup', 0.044), ('payoffs', 0.044), ('external', 0.044), ('sites', 0.043), ('predictor', 0.041), ('sn', 0.04), ('partial', 0.04), ('bi', 0.039), ('proposition', 0.038), ('ay', 0.038), ('strategies', 0.037), ('kn', 0.036), ('freedman', 0.035), ('en', 0.035), ('mw', 0.035), ('rj', 0.035), ('every', 0.034), ('jm', 0.034), ('accordingly', 0.033), ('multivalued', 0.033), ('bx', 0.032), ('intersection', 0.029), ('ql', 0.029), ('hyperplanes', 0.029), ('pf', 0.029), ('cell', 0.029), ('euclidian', 0.028), ('rn', 0.028), ('complex', 0.027), ('chooses', 0.025), ('games', 0.025), ('na', 0.025), ('approachable', 0.025), ('fudenberg', 0.025), ('ncs', 0.025), ('sorin', 0.025), ('law', 0.023), ('polynomial', 0.022), ('martingale', 0.022), ('frequency', 0.022), ('vertex', 0.021), ('existence', 0.021), ('constants', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms

Author: Vianney Perchet

Abstract: We provide consistent random algorithms for sequential decision under partial monitoring, when the decision maker does not observe the outcomes but receives instead random feedback signals. Those algorithms have no internal regret in the sense that, on the set of stages where the decision maker chose his action according to a given law, the average payoff could not have been improved in average by using any other fixed law. They are based on a generalization of calibration, no longer defined in terms of a Voronoï diagram but instead of a Laguerre diagram (a more general concept). This allows us to bound, for the first time in this general framework, the expected average internal, as well as the usual external, regret at stage n by O(n^{-1/3}), which is known to be optimal. Keywords: repeated games, on-line learning, regret, partial monitoring, calibration, Voronoï and Laguerre diagrams

2 0.15451786 72 jmlr-2011-On the Relation between Realizable and Nonrealizable Cases of the Sequence Prediction Problem

Author: Daniil Ryabko

Abstract: A sequence x1 , . . . , xn , . . . of discrete-valued observations is generated according to some unknown probabilistic law (measure) µ. After observing each outcome, one is required to give conditional probabilities of the next observation. The realizable case is when the measure µ belongs to an arbitrary but known class C of process measures. The non-realizable case is when µ is completely arbitrary, but the prediction performance is measured with respect to a given set C of process measures. We are interested in the relations between these problems and between their solutions, as well as in characterizing the cases when a solution exists and finding these solutions. We show that if the quality of prediction is measured using the total variation distance, then these problems coincide, while if it is measured using the expected average KL divergence, then they are different. For some of the formalizations we also show that when a solution exists it can be obtained as a Bayes mixture over a countable subset of C . We also obtain several characterization of those sets C for which solutions to the considered problems exist. As an illustration to the general results obtained, we show that a solution to the non-realizable case of the sequence prediction problem exists for the set of all finite-memory processes, but does not exist for the set of all stationary processes. It should be emphasized that the framework is completely general: the processes measures considered are not required to be i.i.d., mixing, stationary, or to belong to any parametric family. Keywords: sequence prediction, time series, online prediction, realizable sequence prediction, non-realizable sequence prediction

3 0.1176575 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

Author: Ricardo Henao, Ole Winther

Abstract: In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully Bayesian hierarchy for sparse models using slab and spike priors (two-component δ-function and continuous mixtures), non-Gaussian latent factors and a stochastic search over the ordering of the variables. The framework, which we call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and bench-marked on artificial and real biological data sets. SLIM is closest in spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in inference, Bayesian network structure learning and model comparison. Experimentally, SLIM performs equally well or better than LiNGAM with comparable computational complexity. We attribute this mainly to the stochastic search strategy used, and to parsimony (sparsity and identifiability), which is an explicit part of the model. We propose two extensions to the basic i.i.d. linear framework: non-linear dependence on observed variables, called SNIM (Sparse Non-linear Identifiable Multivariate modeling) and allowing for correlations between latent variables, called CSLIM (Correlated SLIM), for the temporal and/or spatial data. The source code and scripts are available from http://cogsys.imm.dtu.dk/slim/. Keywords: parsimony, sparsity, identifiability, factor models, linear Bayesian networks

4 0.11495207 104 jmlr-2011-X-Armed Bandits

Author: Sébastien Bubeck, Rémi Munos, Gilles Stoltz, Csaba Szepesvári

Abstract: We consider a generalization of stochastic bandits where the set of arms, X , is allowed to be a generic measurable space and the mean-payoff function is “locally Lipschitz” with respect to a dissimilarity function that is known to the decision maker. Under this condition we construct an arm selection policy, called HOO (hierarchical optimistic optimization), with improved regret bounds compared to previous results for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally continuous with a known √ smoothness degree, then the expected regret of HOO is bounded up to a logarithmic factor by n, that is, the rate of growth of the regret is independent of the dimension of the space. We also prove the minimax optimality of our algorithm when the dissimilarity is a metric. Our basic strategy has quadratic computational complexity as a function of the number of time steps and does not rely on the doubling trick. We also introduce a modified strategy, which relies on the doubling trick but runs in linearithmic time. Both results are improvements with respect to previous approaches. Keywords: bandits with infinitely many arms, optimistic online optimization, regret bounds, minimax rates

5 0.094506584 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms

Author: Adam D. Bull

Abstract: In the efficient global optimization problem, we minimize an unknown function f , using as few observations f (x) as possible. It can be considered a continuum-armed-bandit problem, with noiseless data, and simple regret. Expected-improvement algorithms are perhaps the most popular methods for solving the problem; in this paper, we provide theoretical results on their asymptotic behaviour. Implementing these algorithms requires a choice of Gaussian-process prior, which determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS). When the prior is fixed, expected improvement is known to converge on the minimum of any function in its RKHS. We provide convergence rates for this procedure, optimal for functions of low smoothness, and describe a modified algorithm attaining optimal rates for smoother functions. In practice, however, priors are typically estimated sequentially from the data. For standard estimators, we show this procedure may never find the minimum of f . We then propose alternative estimators, chosen to minimize the constants in the rate of convergence, and show these estimators retain the convergence rates of a fixed prior. Keywords: convergence rates, efficient global optimization, expected improvement, continuumarmed bandit, Bayesian optimization

6 0.055351015 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models

7 0.053503793 73 jmlr-2011-Online Learning in Case of Unbounded Losses Using Follow the Perturbed Leader Algorithm

8 0.052957926 91 jmlr-2011-The Sample Complexity of Dictionary Learning

9 0.050372284 43 jmlr-2011-Information, Divergence and Risk for Binary Experiments

10 0.045657393 14 jmlr-2011-Better Algorithms for Benign Bandits

11 0.04063772 97 jmlr-2011-Union Support Recovery in Multi-task Learning

12 0.039605401 29 jmlr-2011-Efficient Learning with Partially Observed Attributes

13 0.039351366 44 jmlr-2011-Information Rates of Nonparametric Gaussian Process Methods

14 0.038959716 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets

15 0.03853951 39 jmlr-2011-High-dimensional Covariance Estimation Based On Gaussian Graphical Models

16 0.03315777 59 jmlr-2011-Learning with Structured Sparsity

17 0.032282244 19 jmlr-2011-Convergence of Distributed Asynchronous Learning Vector Quantization Algorithms

18 0.031804979 68 jmlr-2011-Natural Language Processing (Almost) from Scratch

19 0.031093635 9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models

20 0.030707506 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.184), (1, 0.032), (2, -0.051), (3, 0.114), (4, 0.085), (5, 0.099), (6, 0.045), (7, 0.063), (8, -0.196), (9, 0.086), (10, -0.003), (11, -0.202), (12, 0.031), (13, 0.291), (14, 0.127), (15, -0.285), (16, -0.084), (17, 0.019), (18, 0.031), (19, 0.018), (20, 0.195), (21, 0.207), (22, -0.067), (23, -0.102), (24, -0.081), (25, 0.11), (26, 0.01), (27, 0.087), (28, -0.132), (29, 0.089), (30, 0.055), (31, 0.006), (32, 0.091), (33, 0.009), (34, -0.054), (35, -0.022), (36, -0.022), (37, -0.079), (38, 0.148), (39, 0.001), (40, -0.0), (41, -0.024), (42, 0.078), (43, -0.026), (44, 0.012), (45, 0.061), (46, 0.104), (47, 0.021), (48, -0.012), (49, -0.014)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96173519 45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms

Author: Vianney Perchet

Abstract: We provide consistent random algorithms for sequential decision under partial monitoring, when the decision maker does not observe the outcomes but receives instead random feedback signals. Those algorithms have no internal regret in the sense that, on the set of stages where the decision maker chose his action according to a given law, the average payoff could not have been improved in average by using any other fixed law. They are based on a generalization of calibration, no longer defined in terms of a Voronoï diagram but instead of a Laguerre diagram (a more general concept). This allows us to bound, for the first time in this general framework, the expected average internal, as well as the usual external, regret at stage n by O(n^{-1/3}), which is known to be optimal. Keywords: repeated games, on-line learning, regret, partial monitoring, calibration, Voronoï and Laguerre diagrams

2 0.67982465 72 jmlr-2011-On the Relation between Realizable and Nonrealizable Cases of the Sequence Prediction Problem

Author: Daniil Ryabko

Abstract: A sequence x1 , . . . , xn , . . . of discrete-valued observations is generated according to some unknown probabilistic law (measure) µ. After observing each outcome, one is required to give conditional probabilities of the next observation. The realizable case is when the measure µ belongs to an arbitrary but known class C of process measures. The non-realizable case is when µ is completely arbitrary, but the prediction performance is measured with respect to a given set C of process measures. We are interested in the relations between these problems and between their solutions, as well as in characterizing the cases when a solution exists and finding these solutions. We show that if the quality of prediction is measured using the total variation distance, then these problems coincide, while if it is measured using the expected average KL divergence, then they are different. For some of the formalizations we also show that when a solution exists it can be obtained as a Bayes mixture over a countable subset of C . We also obtain several characterization of those sets C for which solutions to the considered problems exist. As an illustration to the general results obtained, we show that a solution to the non-realizable case of the sequence prediction problem exists for the set of all finite-memory processes, but does not exist for the set of all stationary processes. It should be emphasized that the framework is completely general: the processes measures considered are not required to be i.i.d., mixing, stationary, or to belong to any parametric family. Keywords: sequence prediction, time series, online prediction, realizable sequence prediction, non-realizable sequence prediction

3 0.43527886 104 jmlr-2011-X-Armed Bandits

Author: Sébastien Bubeck, Rémi Munos, Gilles Stoltz, Csaba Szepesvári

Abstract: We consider a generalization of stochastic bandits where the set of arms, X , is allowed to be a generic measurable space and the mean-payoff function is “locally Lipschitz” with respect to a dissimilarity function that is known to the decision maker. Under this condition we construct an arm selection policy, called HOO (hierarchical optimistic optimization), with improved regret bounds compared to previous results for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally continuous with a known √ smoothness degree, then the expected regret of HOO is bounded up to a logarithmic factor by n, that is, the rate of growth of the regret is independent of the dimension of the space. We also prove the minimax optimality of our algorithm when the dissimilarity is a metric. Our basic strategy has quadratic computational complexity as a function of the number of time steps and does not rely on the doubling trick. We also introduce a modified strategy, which relies on the doubling trick but runs in linearithmic time. Both results are improvements with respect to previous approaches. Keywords: bandits with infinitely many arms, optimistic online optimization, regret bounds, minimax rates

4 0.3825784 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

Author: Ricardo Henao, Ole Winther

Abstract: In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully Bayesian hierarchy for sparse models using slab and spike priors (two-component δ-function and continuous mixtures), non-Gaussian latent factors and a stochastic search over the ordering of the variables. The framework, which we call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and bench-marked on artificial and real biological data sets. SLIM is closest in spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in inference, Bayesian network structure learning and model comparison. Experimentally, SLIM performs equally well or better than LiNGAM with comparable computational complexity. We attribute this mainly to the stochastic search strategy used, and to parsimony (sparsity and identifiability), which is an explicit part of the model. We propose two extensions to the basic i.i.d. linear framework: non-linear dependence on observed variables, called SNIM (Sparse Non-linear Identifiable Multivariate modeling) and allowing for correlations between latent variables, called CSLIM (Correlated SLIM), for the temporal and/or spatial data. The source code and scripts are available from http://cogsys.imm.dtu.dk/slim/. Keywords: parsimony, sparsity, identifiability, factor models, linear Bayesian networks

5 0.35741186 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms

Author: Adam D. Bull

Abstract: In the efficient global optimization problem, we minimize an unknown function f , using as few observations f (x) as possible. It can be considered a continuum-armed-bandit problem, with noiseless data, and simple regret. Expected-improvement algorithms are perhaps the most popular methods for solving the problem; in this paper, we provide theoretical results on their asymptotic behaviour. Implementing these algorithms requires a choice of Gaussian-process prior, which determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS). When the prior is fixed, expected improvement is known to converge on the minimum of any function in its RKHS. We provide convergence rates for this procedure, optimal for functions of low smoothness, and describe a modified algorithm attaining optimal rates for smoother functions. In practice, however, priors are typically estimated sequentially from the data. For standard estimators, we show this procedure may never find the minimum of f . We then propose alternative estimators, chosen to minimize the constants in the rate of convergence, and show these estimators retain the convergence rates of a fixed prior. Keywords: convergence rates, efficient global optimization, expected improvement, continuumarmed bandit, Bayesian optimization

6 0.29031917 43 jmlr-2011-Information, Divergence and Risk for Binary Experiments

7 0.24937461 73 jmlr-2011-Online Learning in Case of Unbounded Losses Using Follow the Perturbed Leader Algorithm

8 0.23542924 29 jmlr-2011-Efficient Learning with Partially Observed Attributes

9 0.1985648 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models

10 0.19583672 91 jmlr-2011-The Sample Complexity of Dictionary Learning

11 0.18812294 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models

12 0.18160169 39 jmlr-2011-High-dimensional Covariance Estimation Based On Gaussian Graphical Models

13 0.17706683 19 jmlr-2011-Convergence of Distributed Asynchronous Learning Vector Quantization Algorithms

14 0.17653929 9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models

15 0.16239017 15 jmlr-2011-CARP: Software for Fishing Out Good Clustering Algorithms

16 0.16220278 44 jmlr-2011-Information Rates of Nonparametric Gaussian Process Methods

17 0.16119863 68 jmlr-2011-Natural Language Processing (Almost) from Scratch

18 0.15928029 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets

19 0.15851437 21 jmlr-2011-Cumulative Distribution Networks and the Derivative-sum-product Algorithm: Models and Inference for Cumulative Distribution Functions on Graphs

20 0.14350927 98 jmlr-2011-Universality, Characteristic Kernels and RKHS Embedding of Measures


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.047), (9, 0.013), (10, 0.021), (24, 0.026), (31, 0.044), (32, 0.017), (41, 0.626), (73, 0.017), (78, 0.055), (90, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91304195 45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms

Author: Vianney Perchet

Abstract: We provide consistent random algorithms for sequential decision under partial monitoring, when the decision maker does not observe the outcomes but receives instead random feedback signals. Those algorithms have no internal regret in the sense that, on the set of stages where the decision maker chose his action according to a given law, the average payoff could not have been improved in average by using any other fixed law. They are based on a generalization of calibration, no longer defined in terms of a Voronoï diagram but instead of a Laguerre diagram (a more general concept). This allows us to bound, for the first time in this general framework, the expected average internal, as well as the usual external, regret at stage n by O(n^{-1/3}), which is known to be optimal. Keywords: repeated games, on-line learning, regret, partial monitoring, calibration, Voronoï and Laguerre diagrams

2 0.90837073 23 jmlr-2011-DirectLiNGAM: A Direct Method for Learning a Linear Non-Gaussian Structural Equation Model

Author: Shohei Shimizu, Takanori Inazumi, Yasuhiro Sogawa, Aapo Hyvärinen, Yoshinobu Kawahara, Takashi Washio, Patrik O. Hoyer, Kenneth Bollen

Abstract: Structural equation models and Bayesian networks have been widely used to analyze causal relations between continuous variables. In such frameworks, linear acyclic models are typically used to model the data-generating process of variables. Recently, it was shown that use of non-Gaussianity identifies the full structure of a linear acyclic model, that is, a causal ordering of variables and their connection strengths, without using any prior knowledge on the network structure, which is not the case with conventional methods. However, existing estimation methods are based on iterative search algorithms and may not converge to a correct solution in a finite number of steps. In this paper, we propose a new direct method to estimate a causal ordering and connection strengths based on non-Gaussianity. In contrast to the previous methods, our algorithm requires no algorithmic parameters and is guaranteed to converge to the right solution within a small fixed number of steps if the data strictly follows the model, that is, if all the model assumptions are met and the sample size is infinite. Keywords: structural equation models, Bayesian networks, independent component analysis, non-Gaussianity, causal discovery

3 0.79047716 56 jmlr-2011-Learning Transformation Models for Ranking and Survival Analysis

Author: Vanya Van Belle, Kristiaan Pelckmans, Johan A. K. Suykens, Sabine Van Huffel

Abstract: This paper studies the task of learning transformation models for ranking problems, ordinal regression and survival analysis. The present contribution describes a machine learning approach termed MINLIP . The key insight is to relate ranking criteria as the Area Under the Curve to monotone transformation functions. Consequently, the notion of a Lipschitz smoothness constant is found to be useful for complexity control for learning transformation models, much in a similar vein as the ’margin’ is for Support Vector Machines for classification. The use of this model structure in the context of high dimensional data, as well as for estimating non-linear, and additive models based on primal-dual kernel machines, and for sparse models is indicated. Given n observations, the present method solves a quadratic program existing of O (n) constraints and O (n) unknowns, where most existing risk minimization approaches to ranking problems typically result in algorithms with O (n2 ) constraints or unknowns. We specify the MINLIP method for three different cases: the first one concerns the preference learning problem. Secondly it is specified how to adapt the method to ordinal regression with a finite set of ordered outcomes. Finally, it is shown how the method can be used in the context of survival analysis where one models failure times, typically subject to censoring. The current approach is found to be particularly useful in this context as it can handle, in contrast with the standard statistical model for analyzing survival data, all types of censoring in a straightforward way, and because of the explicit relation with the Proportional Hazard and Accelerated Failure Time models. The advantage of the current method is illustrated on different benchmark data sets, as well as for estimating a model for cancer survival based on different micro-array and clinical data sets. Keywords: support vector machines, preference learning, ranking models, ordinal regression, survival analysis c

4 0.3205229 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

Author: Ricardo Henao, Ole Winther

Abstract: In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully Bayesian hierarchy for sparse models using slab and spike priors (two-component δ-function and continuous mixtures), non-Gaussian latent factors and a stochastic search over the ordering of the variables. The framework, which we call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and bench-marked on artificial and real biological data sets. SLIM is closest in spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in inference, Bayesian network structure learning and model comparison. Experimentally, SLIM performs equally well or better than LiNGAM with comparable computational complexity. We attribute this mainly to the stochastic search strategy used, and to parsimony (sparsity and identifiability), which is an explicit part of the model. We propose two extensions to the basic i.i.d. linear framework: non-linear dependence on observed variables, called SNIM (Sparse Non-linear Identifiable Multivariate modeling) and allowing for correlations between latent variables, called CSLIM (Correlated SLIM), for the temporal and/or spatial data. The source code and scripts are available from http://cogsys.imm.dtu.dk/slim/. Keywords: parsimony, sparsity, identifiability, factor models, linear Bayesian networks

5 0.30350477 76 jmlr-2011-Parameter Screening and Optimisation for ILP using Designed Experiments

Author: Ashwin Srinivasan, Ganesh Ramakrishnan

Abstract: Reports of experiments conducted with an Inductive Logic Programming system rarely describe how specific values of parameters of the system are arrived at when constructing models. Usually, no attempt is made to identify sensitive parameters, and those that are used are often given “factory-supplied” default values, or values obtained from some non-systematic exploratory analysis. The immediate consequence of this is, of course, that it is not clear if better models could have been obtained if some form of parameter selection and optimisation had been performed. Questions follow inevitably on the experiments themselves: specifically, are all algorithms being treated fairly, and is the exploratory phase sufficiently well-defined to allow the experiments to be replicated? In this paper, we investigate the use of parameter selection and optimisation techniques grouped under the study of experimental design. Screening and response surface methods determine, in turn, sensitive parameters and good values for these parameters. Screening is done here by constructing a stepwise regression model relating the utility of an ILP system’s hypothesis to its input parameters, using systematic combinations of values of input parameters (technically speaking, we use a two-level fractional factorial design of the input parameters). The parameters used by the regression model are taken to be the sensitive parameters for the system for that application. We then seek an assignment of values to these sensitive parameters that maximise the utility of the ILP model. This is done using the technique of constructing a local “response surface”. The parameters are then changed following the path of steepest ascent until a locally optimal value is reached. This combined use of parameter selection and response surface-driven optimisation has a long history of application in industrial engineering, and its role in ILP is demonstrated using well-known benchmarks. The results suggest that computational

6 0.29880369 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets

7 0.29175916 72 jmlr-2011-On the Relation between Realizable and Nonrealizable Cases of the Sequence Prediction Problem

8 0.28908455 21 jmlr-2011-Cumulative Distribution Networks and the Derivative-sum-product Algorithm: Models and Inference for Cumulative Distribution Functions on Graphs

9 0.28158262 81 jmlr-2011-Robust Approximate Bilinear Programming for Value Function Approximation

10 0.27122211 73 jmlr-2011-Online Learning in Case of Unbounded Losses Using Follow the Perturbed Leader Algorithm

11 0.2704863 104 jmlr-2011-X-Armed Bandits

12 0.2678102 29 jmlr-2011-Efficient Learning with Partially Observed Attributes

13 0.26750124 9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models

14 0.26351139 38 jmlr-2011-Hierarchical Knowledge Gradient for Sequential Sampling

15 0.26153901 48 jmlr-2011-Kernel Analysis of Deep Networks

16 0.2612434 40 jmlr-2011-Hyper-Sparse Optimal Aggregation

17 0.25801769 91 jmlr-2011-The Sample Complexity of Dictionary Learning

18 0.25505644 1 jmlr-2011-A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes

19 0.25416651 19 jmlr-2011-Convergence of Distributed Asynchronous Learning Vector Quantization Algorithms

20 0.2540518 53 jmlr-2011-Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates