jmlr jmlr2011 jmlr2011-2 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ruby C. Weng, Chih-Jen Lin
Abstract: This paper describes a Bayesian approximation method to obtain online ranking algorithms for games with multiple teams and multiple players. Recently for Internet games large online ranking systems are much needed. We consider game models in which a k-team game is treated as several two-team games. By approximating the expectation of teams’ (or players’) performances, we derive simple analytic update rules. These update rules, without numerical integrations, are very easy to interpret and implement. Experiments on game data show that the accuracy of our approach is competitive with state of the art systems such as TrueSkill, but the running time as well as the code is much shorter. Keywords: Bayesian inference, rating system, Bradley-Terry model, Thurstone-Mosteller model, Plackett-Luce model
Reference: text
sentIndex sentText sentNum sentScore
1 Department of Computer Science, National Taiwan University, Taipei 106, Taiwan. Editor: Thore Graepel. Abstract: This paper describes a Bayesian approximation method to obtain online ranking algorithms for games with multiple teams and multiple players. [sent-7, score-0.376]
2 We consider game models in which a k-team game is treated as several two-team games. [sent-9, score-0.338]
3 These online algorithms are especially useful when the number of teams to be ranked and the number of games are very large. [sent-15, score-0.362]
4 To begin, prior to a rating period, a player’s skill (θ) is assumed to follow a Gaussian distribution which can be characterized by two numbers: the average skill of the player (µ) and the degree of uncertainty in the player’s skill (σ). [sent-20, score-0.538]
5 Then, Glicko models the game outcomes by the Bradley-Terry model (Bradley and Terry, 1952) and updates players’ skills after a rating period. [sent-21, score-0.308]
6 Glickman (1999) also reported that the Glicko system performs best when the number of games per player is around 5-10 in a rating period. [sent-22, score-0.224]
7 In video games a game often involves more than two players or teams. [sent-24, score-0.362]
8 First, it is designed for multi-team/multi-player games, and it updates skills after each game rather than a rating period. [sent-30, score-0.28]
9 However, for games with multiple teams and multiple players, the update rules are not possible to write down as they require an iterative procedure. [sent-35, score-0.406]
10 The present paper concerns the ranking of players from outcomes of multiple players or games. [sent-36, score-0.288]
11 We consider a k-team game as a single match and discuss the possibility of obtaining efficient update algorithms. [sent-37, score-0.228]
12 We introduce a Bayesian approximation method to derive simple analytic rules for updating team strength in multi-team games. [sent-38, score-0.394]
13 Strengths of players in a team are then updated by assuming that a team’s skill is the sum of the skills of its members. [sent-40, score-0.565]
14 1 Modeling Ranked Data Given the game outcome of k teams, we define r(i) as the rank of team i. [sent-56, score-0.497]
15 If teams i1 , . . . , id are tied together, we have r(i1 ) = · · · = r(id ), and the team q ranked next has r(q) = r(i1 ) + d. [sent-60, score-0.328]
16 For example, if four teams participate in a game, their ranks may be r(1) = 2, r(2) = 2, r(3) = 4, r(4) = 1  (1), where teams 1 and 2 are both ranked second. [sent-61, score-0.504]
17 Then team 3, which is ranked next, has r(3) = 4. [sent-62, score-0.314]
18 Suppose that each team is associated with a continuous but unobserved random variable Xi , representing the actual performance. [sent-79, score-0.27]
19 The observed ordering that team r̄(1) comes in first, team r̄(2) comes in second, and so on is then determined by the Xi ’s: X_{r̄(1)} > X_{r̄(2)} > · · · > X_{r̄(k)}. [sent-80, score-0.54]
20 In particular, if k = 2 and Xi follows N(θi , βi²), where θi is the strength of team i and βi² is the uncertainty of the actual performance Xi , then P(Xi > Xq ) = Φ( (θi − θq ) / (βi² + βq²)^{1/2} )  (5), where Φ denotes the cumulative distribution function of a standard normal density. [sent-86, score-0.36]
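To make (5) concrete, here is a minimal Python sketch; the function name and the illustrative numbers are ours, not part of the paper.

```python
from math import erf, sqrt

def win_prob_gaussian(theta_i, theta_q, beta_i_sq, beta_q_sq):
    """P(X_i > X_q) for X_i ~ N(theta_i, beta_i_sq), X_q ~ N(theta_q, beta_q_sq), as in (5)."""
    # X_i - X_q ~ N(theta_i - theta_q, beta_i_sq + beta_q_sq), so the win
    # probability is the standard normal CDF of the standardized difference.
    z = (theta_i - theta_q) / sqrt(beta_i_sq + beta_q_sq)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # Phi(z)

# A one-point edge with unit performance noise on each side: P(win) ~ 0.76.
print(win_prob_gaussian(26.0, 25.0, 1.0, 1.0))
```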
21 Recent attempts to extend this off-line approach to multiple players and multiple teams include Huang et al. [sent-91, score-0.338]
22 The Elo system is an online rating scheme which models the probability of the game outcome as (5) with βi = βq and, after each game, updates the strength θi by θi ← θi + K(s − P(i wins))  (6), where K is some constant, and s = 1 if i wins and 0 otherwise. [sent-94, score-0.332]
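A small sketch of the Elo update (6), using the base-10 logistic approximation p*_iq that the text quotes later for the USCF system; the function name, the default K = 32, and the example ratings are only illustrative.

```python
def elo_update(theta_i, theta_q, s, K=32.0):
    """One Elo update for player i after a game against player q, following (6)."""
    # Approximate P(i beats q) with the base-10 logistic curve used by Elo.
    p_star = 10 ** (theta_i / 400) / (10 ** (theta_i / 400) + 10 ** (theta_q / 400))
    # s = 1 for a win, 0.5 for a draw, 0 for a loss.
    return theta_i + K * (s - p_star)

# A 1500-rated player who beats a 1600-rated player gains roughly 20 points.
print(elo_update(1500.0, 1600.0, s=1.0))  # ~1520.5
```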
23 The Bradley-Terry model for paired comparisons has the form P(Xi > Xq ) = vi / (vi + vq )  (7), where vi > 0 is the strength of team i. [sent-98, score-0.37]
24 about the same skill levels, the logistic model gives a weak team i a higher winning chance than the Gaussian model does. [sent-118, score-0.482]
25 Though this approach can handle multiple players per team, it aims to handle only two teams per game. [sent-121, score-0.338]
26 For comparisons involving k ≥ 3 teams per game, the Bradley-Terry model has been generalized in various ways. [sent-122, score-0.236]
27 In particular, if h(z) = zi , then by (14) it follows that Uh(z) = ei (a function from R^k to R^k ); and if h(z) = zi zj with i < j, then Uh(z) = zi ej , where {e1 , · · · , ek } denotes the standard basis of R^k . [sent-230, score-0.534]
28 Method In this section, we first present our proposed method for updating team and individual skills. [sent-240, score-0.286]
29 1 Approximating the Expectations Let θi be the strength parameter of team i whose ability is to be estimated. [sent-243, score-0.314]
30 To start, suppose that team i has a strength parameter θi and assume that the prior distribution of θi is N(µi , σi²). [sent-247, score-0.314]
31 The posterior density of Z given the game outcome D is P(z|D) = Cφk (z) f (z), where f (z) is the probability of game outcome P(D|z). [sent-259, score-0.466]
32 Next, we shall update the skill as the posterior mean and variance of θ. [sent-262, score-0.256]
33 Hence, f is increasing in θ1 , and the adjustment to team 1’s skill is the average of the relative rate of change of team 1’s winning probability with respect to its strength θ1 . [sent-268, score-0.763]
34 On the other hand, a larger θ2 is less likely to result in this outcome; hence, f is decreasing in θ2 and the adjustment to team 2’s skill will be negative. [sent-269, score-0.418]
35 We propose approximating the expectations in (25) and (26) to obtain the update rules µi ← µi + Ωi  (27) and σi² ← σi² max(1 − Δi , κ)  (28), where Ωi = σi (∂ f (z)/∂zi ) / f (z) |_{z=0}  (29) and Δi = ( (∂ f (z)/∂zi / f (z))² − ∂² f (z)/∂zi² / f (z) ) |_{z=0} = −∂/∂zi ( ∂ f (z)/∂zi / f (z) ) |_{z=0}  (30). [sent-271, score-0.263]
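The sketch below shows how the generic rules (27)-(30) could be applied when f is available as a black-box function of the standardized strengths z. The paper derives the derivatives at z = 0 analytically for each ranking model; this illustration simply approximates them numerically, and all names and default constants are ours.

```python
import numpy as np

def skill_update(f, mu, sigma, kappa=1e-4, h=1e-5):
    """Update all teams by (27)-(28) with Omega_i and Delta_i from (29)-(30).

    f(z): probability of the observed game outcome, with z the standardized
    strengths (theta_i = mu_i + sigma_i * z_i).  Derivatives at z = 0 are
    taken numerically here instead of in closed form.
    """
    mu, sigma = np.asarray(mu, float).copy(), np.asarray(sigma, float).copy()
    k, z0 = len(mu), np.zeros(len(mu))
    f0 = f(z0)
    for i in range(k):
        e = np.zeros(k); e[i] = h
        d1 = (f(z0 + e) - f(z0 - e)) / (2 * h)          # df/dz_i at z = 0
        d2 = (f(z0 + e) - 2 * f0 + f(z0 - e)) / h ** 2  # d2f/dz_i^2 at z = 0
        omega = sigma[i] * d1 / f0                      # (29)
        delta = (d1 / f0) ** 2 - d2 / f0                # (30)
        mu[i] += omega                                  # (27)
        sigma[i] = np.sqrt(sigma[i] ** 2 * max(1.0 - delta, kappa))  # (28)
    return mu, sigma

# Two-team Bradley-Terry example: team 0 just beat team 1.
mu0, sig0, beta = np.array([25.0, 25.0]), np.array([25.0 / 3, 25.0 / 3]), 25.0 / 6
c = np.sqrt(sig0[0] ** 2 + sig0[1] ** 2 + 2 * beta ** 2)
f = lambda z: np.exp((mu0[0] + sig0[0] * z[0]) / c) / (
    np.exp((mu0[0] + sig0[0] * z[0]) / c) + np.exp((mu0[1] + sig0[1] * z[1]) / c))
print(skill_update(f, mu0, sig0))  # the winner's mu rises, both sigmas shrink
```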
36 The expectation on the right side of (22) involves the following calculation: ∂ f /∂zi / f = ∂ log ∏_{q=1}^{m} fq (z) / ∂zi = Σ_{q=1}^{m} ∂ log fq (z)/∂zi = Σ_{q=1}^{m} (∂ fq /∂zi ) / fq . [sent-303, score-0.309]
37 One may address this problem by modeling a k-team game outcome as (k − 1) two-team games (between all teams on neighboring ranks); that is, f (z) = ∏_{i=1}^{k−1} P(outcome between the teams ranked ith and (i + 1)st)  (39). [sent-308, score-0.777]
38 Alternatively, we may consider the game result of k teams as k(k − 1)/2 two-team games. [sent-309, score-0.391]
39 Then f (z) = ∏_{i=1}^{k} ∏_{q=i+1}^{k} P(outcome between team i and team q)  (40). [sent-310, score-0.54]
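As an illustration of the two decompositions (39) and (40), the helper below lists the team pairs that would be modeled as two-team games; the function names and the tie-breaking order for equal ranks are our own choices.

```python
def partial_pairs(ranks):
    """(39): one two-team game between the teams at neighboring positions in the ranking."""
    order = sorted(range(len(ranks)), key=lambda t: ranks[t])  # ties kept in index order
    return [(order[a], order[a + 1]) for a in range(len(order) - 1)]

def full_pairs(ranks):
    """(40): every pair of teams is treated as a two-team game, k(k-1)/2 pairs in total."""
    k = len(ranks)
    return [(i, q) for i in range(k) for q in range(i + 1, k)]

# Ranks from example (1): teams 1 and 2 tied for second, team 4 first, team 3 last
# (0-based team indices here).
ranks = [2, 2, 4, 1]
print(partial_pairs(ranks))  # [(3, 0), (0, 1), (1, 2)] -- k - 1 = 3 games
print(full_pairs(ranks))     # 6 games
```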
40 Suppose that the ith team has ni players, the jth player in the ith team has strength θij , and the prior distribution of θij is N(µij , σij²). [sent-317, score-0.686]
41 Equations (43) and (44) say that Ωi , the mean skill change of team i, is partitioned into ni parts with magnitude proportional to σij². [sent-327, score-0.441]
42 As in (27), we could update µij by µij ← µij + σij (∂ f̄ (z̄)/∂zij ) / f̄ (z̄) |_{z̄=0}  (46), where f̄ (z̄) is the probability of the game outcomes and z̄ = [z11 , . [sent-330, score-0.242]
43 Following the definition of Ωi in (29) we obtain the update rule (43), which says that within team i the adjustment to µij is proportional to σij². [sent-345, score-0.342]
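A sketch of how the team-level adjustment could be propagated to individual players, assuming (as the text states) that the team variance is the sum of the player variances and that both the mean and the variance corrections are split in proportion to σij²; the names and numbers are illustrative.

```python
def update_players(mu_ij, var_ij, omega_i, delta_i, kappa=1e-4):
    """Split a team-level update (Omega_i, Delta_i) among the team's players.

    mu_ij, var_ij: per-player prior means and variances sigma_ij^2.
    Each player's share of the adjustment is var_ij / sum(var_ij).
    """
    var_team = sum(var_ij)  # sigma_i^2 as the sum of the player variances
    new_mu, new_var = [], []
    for m, v in zip(mu_ij, var_ij):
        share = v / var_team
        new_mu.append(m + share * omega_i)                     # mean update, cf. (43)
        new_var.append(v * max(1.0 - share * delta_i, kappa))  # variance update, cf. (44)
    return new_mu, new_var

# Two players on one team; the more uncertain player absorbs more of the change.
print(update_players([25.0, 25.0], [4.0, 16.0], omega_i=2.0, delta_i=0.1))
```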
44 In the Bradley-Terry model (7)-(8), the difference Xi − Xq between two teams follows a logistic distribution. [sent-350, score-0.24]
45 The resulting model is P(team i beats team q) ≡ fiq (z) = e^{θi /ciq } / ( e^{θi /ciq } + e^{θq /ciq } )  (48), where ciq² = βi² + βq² and θi = µi + σi zi . [sent-353, score-0.466]
46 That is, P(i draws with q) = ( P(i beats q) P(q beats i) )^{1/2} = ( fiq (z) fqi (z) )^{1/2}  (51). [sent-360, score-0.526]
47 With (38) and (51), ∂ f /∂zi / f = Σ_{q: r(q)<r(i)} (∂ fqi /∂zi ) / fqi + Σ_{q: r(q)>r(i)} (∂ fiq /∂zi ) / fiq + (1/2) Σ_{q: r(q)=r(i), q≠i} ( (∂ fiq /∂zi ) / fiq + (∂ fqi /∂zi ) / fqi )  (53). [sent-367, score-1.576]
48 For q = 1, . . . , k, q ≠ i, let p̂iq = e^{µi /ciq } / ( e^{µi /ciq } + e^{µq /ciq } ) and ηq = γq (σi /ciq )² p̂iq p̂qi , where s = 1 if r(q) > r(i), s = 1/2 if r(q) = r(i), and s = 0 if r(q) < r(i). [sent-386, score-0.777]
49 Here ciq = (σi² + σq² + 2β²)^{1/2} and δq = (σi²/ciq )(s − p̂iq ). [sent-387, score-1.436]
50 Using (24) and (48), it is easy to calculate that ∂ fqi /∂zi = ( −e^{θi /ciq } e^{θq /ciq } / ( e^{θi /ciq } + e^{θq /ciq } )² ) · (σi /ciq ) = −(σi /ciq ) fiq fqi  (54) and ∂ fiq /∂zi = ( ( e^{θi /ciq } + e^{θq /ciq } ) e^{θi /ciq } − e^{θi /ciq } e^{θi /ciq } ) / ( e^{θi /ciq } + e^{θq /ciq } )² · (σi /ciq ) = (σi /ciq ) fiq fqi  (55). [sent-397, score-2.579]
51 Since p̂iq + p̂qi = 1, (56) can be rewritten as Ωi = Σ_{q: q≠i} (σi²/ciq )(s − p̂iq )  (57), where s = 1 if r(q) > r(i), s = 1/2 if r(q) = r(i), and s = 0 if r(q) < r(i). [sent-399, score-0.841]
52 With p̂iq as defined in (58), to apply (26) and (30) for updating σi we use (53) to obtain ∂/∂zi ( ∂ f /∂zi / f ) = Σ_{q: r(q)<r(i)} ∂/∂zi ( (∂ fqi /∂zi ) / fqi ) + Σ_{q: r(q)>r(i)} ∂/∂zi ( (∂ fiq /∂zi ) / fiq ) + (1/2) Σ_{q: r(q)=r(i), q≠i} ∂/∂zi ( (∂ fiq /∂zi ) / fiq + (∂ fqi /∂zi ) / fqi )  (59). [sent-400, score-1.656]
53 From (54), ∂/∂zi ( (∂ fqi /∂zi ) / fqi ) = ∂( −σi fiq /ciq )/∂zi = −(σi²/ciq²) fiq fqi , and similarly ∂/∂zi ( (∂ fiq /∂zi ) / fiq ) = −(σi²/ciq²) fiq fqi  (60). [sent-401, score-2.122]
54 Hence we introduce an additional parameter γq so that the update rule is σi² ← σi² max( 1 − Σ_{q: q≠i} γq ξq , κ ), where ξq = (σi²/ciq²) p̂iq p̂qi is from (60) and γq ≤ 1 is decided by users; further discussions on the choice of γq are in Section 6. [sent-406, score-0.339]
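Putting (57) and the σ rule together gives the full Bradley-Terry full-pair update for one game. The sketch below is our reading of those formulas; the default β, the simplification γq = 1, and the starting values in the example are illustrative choices, not prescriptions from the paper.

```python
import math

def bt_full_pair_update(mu, sigma, ranks, beta=25.0 / 6, kappa=1e-4, gamma=1.0):
    """One k-team update under the Bradley-Terry full-pair model.

    mu, sigma: prior means and standard deviations; ranks: observed ranks
    (smaller is better, equal ranks mean a tie); gamma: the damping factor
    gamma_q <= 1 of the variance update, used as a single constant here.
    """
    k = len(mu)
    new_mu, new_sigma = list(mu), list(sigma)
    for i in range(k):
        omega, delta = 0.0, 0.0
        for q in range(k):
            if q == i:
                continue
            c = math.sqrt(sigma[i] ** 2 + sigma[q] ** 2 + 2 * beta ** 2)
            p_iq = math.exp(mu[i] / c) / (math.exp(mu[i] / c) + math.exp(mu[q] / c))
            # s encodes the outcome of the (i, q) pair: win / tie / loss for team i.
            s = 1.0 if ranks[q] > ranks[i] else (0.5 if ranks[q] == ranks[i] else 0.0)
            omega += (sigma[i] ** 2 / c) * (s - p_iq)                 # delta_q in (57)
            delta += gamma * (sigma[i] / c) ** 2 * p_iq * (1 - p_iq)  # gamma_q * xi_q
        new_mu[i] = mu[i] + omega
        new_sigma[i] = math.sqrt(sigma[i] ** 2 * max(1.0 - delta, kappa))
    return new_mu, new_sigma

# Three teams finishing 1st, 2nd, 3rd with identical priors:
# the winner's mean rises, the loser's falls, and every sigma shrinks.
print(bt_full_pair_update([25.0, 25.0, 25.0], [25.0 / 3] * 3, ranks=[1, 2, 3]))
```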
55 The Elo system treats θi as nonrandom and its update rule is in (6): θi ← θi + K(s − p*iq ), where K is a constant (e.g., K = 32 in the USCF system for amateur players). [sent-409, score-0.248]
56 Here p*iq = 10^{θi /400} / ( 10^{θi /400} + 10^{θq /400} ) is the approximate probability that i beats q; see Equations. [sent-411, score-0.261]
57 As for Glicko, it is a Bayesian system but designed for paired comparisons over a rating period. [sent-414, score-0.293]
58 Let r̄(1), . . . , r̄(k) be indices of the teams ranked from the first to the last; for a = 1, . [sent-420, score-0.266]
59 Update Rules Using Other Ranking Models If we assume different distributions of the team performance Xi or model the game results in ways other than the Bradley-Terry model, the same framework in Sections 3. [sent-438, score-0.453]
60 With the definition of r̄ in (2), the function f (z) can be written as f (z) = ∏_{a=1}^{k−1} f̄_{r̄(a)r̄(a+1)} (z)  (63), where, with i ≡ r̄(a) and q ≡ r̄(a + 1), we define f̄iq = fiq if r(i) < r(q) and f̄iq = ( fiq fqi )^{1/2} if r(i) = r(q)  (64). [sent-444, score-0.61]
61 Note that fiq and fqi are defined in (48) of Section 3. [sent-445, score-0.394]
62 The reason is that ∂ f (z)/∂zi is only related to game outcomes between r̄(a) and the teams of adjacent ranks, r̄(a − 1) and r̄(a + 1). [sent-451, score-0.405]
63 2 Thurstone-Mosteller Model (Full-pair and Partial-pair) In this section, we consider the Thurstone-Mosteller model by assuming that the actual performance of team i is Xi ∼ N(θi , βi²), where βi² = σi² + β² as in Section 3. [sent-456, score-0.284]
64 If one considers partial pairs P(team i beats team q) = P(Xi > Xq ) = Φ( (θi − θq )/ciq ) and uses (51) to obtain P(i draws with q), then a derivation similar to that for the Bradley-Terry model leads to certain update rules. [sent-459, score-1.481]
65 Following Herbrich et al. (2007), we let ε be the draw margin that depends on the game mode and assume that the probabilities that i beats q and that a draw occurs are, respectively, P(team i beats team q) = P(Xi > Xq + ε) = Φ( (θi − θq − ε)/ciq ) and P(team i draws with q) = P(|Xi − Xq | < ε) = Φ( (ε − (θi − θq ))/ciq ) − Φ( (−ε − (θi − θq ))/ciq )  (65). [sent-461, score-1.257]
66 We can then obtain f (z) using the full-pair setting (40). [sent-462, score-1.372]
67 For k = 2 (i.e., two teams), the update rules (if i beats q) in Algorithm 3 reduce to µi ← µi + (σi²/ciq ) V( (µi − µq )/ciq , ε/ciq ) and µq ← µq − (σq²/ciq ) V( (µi − µq )/ciq , ε/ciq ), where the function V is defined in (67). [sent-468, score-4.281]
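For the Thurstone-Mosteller rules, the corrections are expressed through the functions V and W. The text only references their definitions, so the sketch below uses the standard forms V(x, t) = φ(x − t)/Φ(x − t) and W(x, t) = V(x, t)(V(x, t) + x − t) familiar from TrueSkill-style derivations (W enters the variance update quoted later in this extract), together with the two-team update quoted above; the ε and β defaults are illustrative.

```python
import math

def phi(x):  # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):  # standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def V(x, t):
    return phi(x - t) / max(Phi(x - t), 1e-12)  # guard against underflow in the tail

def W(x, t):
    v = V(x, t)
    return v * (v + x - t)

def tm_two_team_update(mu_i, sigma_i, mu_q, sigma_q, beta=25.0 / 6, eps=0.1):
    """Two-team Thurstone-Mosteller mean update when team i beats team q."""
    c = math.sqrt(sigma_i ** 2 + sigma_q ** 2 + 2 * beta ** 2)
    x, t = (mu_i - mu_q) / c, eps / c
    return mu_i + (sigma_i ** 2 / c) * V(x, t), mu_q - (sigma_q ** 2 / c) * V(x, t)

# An upset: the lower-rated team wins, so both means move by a comparatively large amount.
print(tm_two_team_update(20.0, 25.0 / 3, 30.0, 25.0 / 3))
```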
68 If ties are not allowed, an extension of the Plackett-Luce model (9) incorporating variance parameters is f (z) = ∏_{q=1}^{k} fq (z) = ∏_{q=1}^{k} e^{θq /c} / Σ_{s∈Cq } e^{θs /c}  (70), where zi = (θi − µi )/σi , c = ( Σ_{i=1}^{k} (σi² + βi²) )^{1/2} and Cq = {i : r(i) ≥ r(q)}. [sent-486, score-0.33]
69 Note that fq (z) corresponds to the probability that team q is the winner among teams in Cq . [sent-489, score-0.595]
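A small sketch of the Plackett-Luce factorization (70) without ties: fq is the probability that team q wins among the teams that did not finish ahead of it, and the product over q is the probability of the whole ordering. The constant c and the strengths in the example are illustrative.

```python
import math

def plackett_luce_prob(theta, ranks, c=1.0):
    """Probability of an observed ordering (no ties) under the Plackett-Luce model (70)."""
    prob = 1.0
    for q in range(len(theta)):
        # C_q = {i : r(i) >= r(q)} -- team q and everyone it finished ahead of.
        C_q = [i for i in range(len(theta)) if ranks[i] >= ranks[q]]
        f_q = math.exp(theta[q] / c) / sum(math.exp(theta[s] / c) for s in C_q)
        prob *= f_q
    return prob

# With strengths 3 > 2 > 1, the "in order" finish is far more likely than the reversed one.
print(plackett_luce_prob([3.0, 2.0, 1.0], ranks=[1, 2, 3]))  # ~0.49
print(plackett_luce_prob([3.0, 2.0, 1.0], ranks=[3, 2, 1]))  # ~0.02
```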
70 Assume that, during the rating period, the player plays n j games against opponent j, where j = 1, . [sent-517, score-0.235]
71 Let sjk be the outcome of the kth game against opponent j, with sjk = 1 if the player wins, [sent-521, score-0.381]
72 sjk = 0.5 if the game results in a tie, and sjk = 0 if the player loses. [sent-522, score-0.261]
73 In fact, the Glicko system treats a collection of games within a “rating period” as occurring simultaneously, and it works best when the number of games per player in a rating period is moderate, say 5-10 on average. [sent-652, score-0.463]
74 3 The set contains data from four different game types: • Free for All: up to 8 players in a game. [sent-658, score-0.285]
75 Herbrich et al. (2007) indicate that for “Small Teams,” each team has no more than 4 players, and for “Large Teams,” each has no more than 8. [sent-675, score-0.27]
76 We check only team pairs whose ranks are different. [sent-720, score-0.286]
77 For example, if there are three teams A, B, and C and the rank of one game is (1, 1, 2), then only the two pairs (A,C) and (B,C) count. [sent-721, score-0.406]
78 Further, if before the game we have µA = µC and the game output shows rank(A) < rank(C), it is considered a wrong prediction. [sent-722, score-0.338]
79 The prediction error rate is the fraction of total team pairs (from the second to the last game) that are wrongly predicted. [sent-726, score-0.27]
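The evaluation described above can be written in a few lines of Python; this sketch counts, for one game, the team pairs that are checked and those predicted wrongly (the team with the larger pre-game µ is predicted to finish ahead, and a pair with equal µ values that ends up decided counts as wrong). The names and numbers are ours.

```python
def game_pair_errors(mu, ranks):
    """Return (wrong, total) team-pair predictions for a single game."""
    k, total, wrong = len(mu), 0, 0
    for a in range(k):
        for b in range(a + 1, k):
            if ranks[a] == ranks[b]:
                continue                      # pairs with equal ranks are not evaluated
            total += 1
            winner, loser = (a, b) if ranks[a] < ranks[b] else (b, a)
            if not (mu[winner] > mu[loser]):  # equal mu counts as a wrong prediction
                wrong += 1
    return wrong, total

# Example from the text: three teams A, B, C with ranks (1, 1, 2);
# only the pairs (A, C) and (B, C) are checked.
print(game_pair_errors([25.0, 24.0, 26.0], ranks=[1, 1, 2]))  # (2, 2)
```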
80 The second column indicates the number of total team pairs used for the evaluation. [sent-763, score-0.27]
81 We conduct a further comparison using only team pairs which are more difficult for prediction. [sent-790, score-0.27]
82 For “Free for All,” the team pairs whose ranks in a game are closer can be viewed as difficult cases for prediction. [sent-791, score-0.455]
83 After a team (or player) has played many games, the obtained ability becomes more accurate. [sent-794, score-0.29]
84 To check the performance when teams have played only a few games, we select games where the average number of players’ past appearances is small. [sent-795, score-0.319]
85 Clearly, if the players in a game have played only a few games, the prediction is more difficult. [sent-797, score-0.305]
86 Note that this model presumes that a team’s actual performance depends on the team it competes with. [sent-839, score-0.284]
87 Then, ∂ f /∂zi / f = Σ_{q: q∈Q, r(q)<r(i)} (∂ fqi /∂zi ) / fqi + Σ_{q: q∈Q, r(q)>r(i)} (∂ fiq /∂zi ) / fiq . [sent-849, score-0.356]
88 Next, ∂/∂zi ( ∂ f /∂zi / f ) = Σ_{q: q∈Q, r(q)<r(i)} ∂/∂zi ( (∂ fqi /∂zi ) / fqi ) + Σ_{q: q∈Q, r(q)>r(i)} ∂/∂zi ( (∂ fiq /∂zi ) / fiq ). [sent-850, score-0.788]
89 Derivations of Update Rules for the Thurstone-Mosteller Model Define fiq (z) ≡ P(team i beats team q) = Φ( (θi − θq − ε)/ciq ) and f̄iq (z) ≡ P(team i draws with team q) = Φ( (ε − (θi − θq ))/ciq ) − Φ( (−ε − (θi − θq ))/ciq ), where θi = σi zi + µi . [sent-857, score-3.066]
90 Then P(outcome of team i and q) = fiq (z) if r(i) < r(q), fqi (z) if r(i) > r(q), and f̄iq (z) if r(i) = r(q). [sent-858, score-0.88]
91 Similar to the derivation for the Bradley-Terry model in (52) and (53), ∂ f /∂zi / f = Σ_{q: r(q)<r(i)} (∂ fqi /∂zi ) / fqi + Σ_{q: r(q)>r(i)} (∂ fiq /∂zi ) / fiq + Σ_{q: r(q)=r(i), q≠i} (∂ f̄iq /∂zi ) / f̄iq . [sent-859, score-0.408]
92 Using the relation between φ and Φ in (66), ∂Φ( (θi − θq − ε)/ciq )/∂θi = φ( (θi − θq − ε)/ciq ) · (1/ciq )  (92). [sent-860, score-4.942]
93 Therefore, (∂ fiq /∂zi ) / fiq = ( φ( (θi − θq − ε)/ciq ) / Φ( (θi − θq − ε)/ciq ) ) · (σi /ciq ) = (σi /ciq ) V( (θi − θq )/ciq , ε/ciq )  (93), where the function V is defined in (67). [sent-861, score-2.058]
94 Similarly, (∂ fqi /∂zi ) / fqi = −(σi /ciq ) V( (θq − θi )/ciq , ε/ciq ). [sent-862, score-2.236]
95 Then the update rule is µi ← µi + σi (∂ f (z)/∂zi ) / f |_{z=0} = µi + σi² ( Σ_{q: r(q)>r(i)} (1/ciq ) V( (µi − µq )/ciq , ε/ciq ) − Σ_{q: r(q)<r(i)} (1/ciq ) V( (µq − µi )/ciq , ε/ciq ) + Σ_{q: r(q)=r(i), q≠i} (1/ciq ) Ṽ( (µi − µq )/ciq , ε/ciq ) ). [sent-864, score-2.306]
96 To update σ, similar to (59), we have ∂/∂zi ( ∂ f /∂zi / f ) = Σ_{q: r(q)<r(i)} ∂/∂zi ( (∂ fqi /∂zi ) / fqi ) + Σ_{q: r(q)>r(i)} ∂/∂zi ( (∂ fiq /∂zi ) / fiq ) + Σ_{q: r(q)=r(i), q≠i} ∂/∂zi ( (∂ f̄iq /∂zi ) / f̄iq ). [sent-865, score-2.219]
97 Similarly, ∂/∂zi ( (∂ fqi /∂zi ) / fqi ) = −(σi²/ciq²) W( (θq − θi )/ciq , ε/ciq ). [sent-867, score-2.414]
98 Combining (95), (96), and (97), the update rule for σi² is σi² ← σi² ( 1 − ( Σ_{q: r(q)<r(i)} (σi²/ciq²) W( (µq − µi )/ciq , ε/ciq ) + Σ_{q: r(q)>r(i)} (σi²/ciq²) W( (µi − µq )/ciq , ε/ciq ) + Σ_{q: r(q)=r(i), q≠i} (σi²/ciq²) W̃( (µi − µq )/ciq , ε/ciq ) ) )  (98). [sent-870, score-5.05]
99 From (38), the update rule is µi ← µi + Ωi , where Ωi = σi Σ_{q=1}^{k} (∂ fq (z)/∂zi ) / fq (z) |_{z=0} = (σi²/c) ( (1/Ai )( 1 − e^{µi /c} / Σ_{s∈Ci } e^{µs /c} ) + Σ_{q: q≠i, r(q)≤r(i)} (1/Aq )( −e^{µi /c} / Σ_{s∈Cq } e^{µs /c} ) ). [sent-873, score-0.265]
100 To update σ, similar to (59), we must calculate ∂/∂zi ( (∂ fq /∂zi ) / fq ) for all q. [sent-874, score-0.29]
wordName wordTfidf (topN-words)
[('ciq', 0.686), ('team', 0.27), ('teams', 0.222), ('fiq', 0.216), ('iq', 0.189), ('zi', 0.178), ('fqi', 0.178), ('game', 0.169), ('trueskill', 0.165), ('skill', 0.135), ('glicko', 0.133), ('players', 0.116), ('cq', 0.112), ('fq', 0.103), ('eng', 0.083), ('games', 0.077), ('ethod', 0.073), ('nline', 0.073), ('pproximation', 0.073), ('anking', 0.067), ('rating', 0.067), ('player', 0.066), ('piq', 0.064), ('xq', 0.062), ('uh', 0.059), ('update', 0.059), ('beats', 0.058), ('glickman', 0.057), ('elo', 0.054), ('xr', 0.053), ('head', 0.049), ('rules', 0.048), ('herbrich', 0.047), ('woodroofe', 0.044), ('strength', 0.044), ('skills', 0.044), ('ranked', 0.044), ('outcome', 0.043), ('ranking', 0.042), ('posterior', 0.042), ('pl', 0.039), ('rk', 0.039), ('ni', 0.036), ('stein', 0.033), ('weng', 0.032), ('ruby', 0.032), ('aq', 0.031), ('winning', 0.031), ('bayesian', 0.028), ('pqi', 0.027), ('jk', 0.026), ('opponent', 0.025), ('opponents', 0.025), ('cumulative', 0.025), ('calculate', 0.025), ('free', 0.024), ('dw', 0.024), ('gi', 0.024), ('paired', 0.023), ('normal', 0.021), ('nj', 0.021), ('variance', 0.02), ('played', 0.02), ('online', 0.019), ('wins', 0.019), ('likelihood', 0.019), ('caq', 0.019), ('kuh', 0.019), ('thurstone', 0.019), ('uscf', 0.019), ('vq', 0.019), ('logistic', 0.018), ('period', 0.018), ('dz', 0.018), ('marginal', 0.018), ('taiwan', 0.016), ('thore', 0.016), ('draws', 0.016), ('integrals', 0.016), ('ranks', 0.016), ('approximation', 0.016), ('updating', 0.016), ('propagation', 0.015), ('integration', 0.015), ('ties', 0.015), ('messages', 0.015), ('rank', 0.015), ('chess', 0.015), ('calculation', 0.014), ('id', 0.014), ('minka', 0.014), ('model', 0.014), ('outcomes', 0.014), ('system', 0.014), ('integral', 0.014), ('dy', 0.013), ('adjustment', 0.013), ('bradley', 0.013), ('approximating', 0.013), ('expectations', 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 2 jmlr-2011-A Bayesian Approximation Method for Online Ranking
Author: Ruby C. Weng, Chih-Jen Lin
Abstract: This paper describes a Bayesian approximation method to obtain online ranking algorithms for games with multiple teams and multiple players. Recently for Internet games large online ranking systems are much needed. We consider game models in which a k-team game is treated as several two-team games. By approximating the expectation of teams’ (or players’) performances, we derive simple analytic update rules. These update rules, without numerical integrations, are very easy to interpret and implement. Experiments on game data show that the accuracy of our approach is competitive with state of the art systems such as TrueSkill, but the running time as well as the code is much shorter. Keywords: Bayesian inference, rating system, Bradley-Terry model, Thurstone-Mosteller model, Plackett-Luce model
Author: Jim C. Huang, Brendan J. Frey
Abstract: We present a class of graphical models for directly representing the joint cumulative distribution function (CDF) of many random variables, called cumulative distribution networks (CDNs). Unlike graphs for probability density and mass functions, for CDFs the marginal probabilities for any subset of variables are obtained by computing limits of functions in the model, and conditional probabilities correspond to computing mixed derivatives. We will show that the conditional independence properties in a CDN are distinct from the conditional independence properties of directed, undirected and factor graphs, but include the conditional independence properties of bi-directed graphs. In order to perform inference in such models, we describe the ‘derivative-sum-product’ (DSP) message-passing algorithm in which messages correspond to derivatives of the joint CDF. We will then apply CDNs to the problem of learning to rank players in multiplayer team-based games and suggest several future directions for research. Keywords: graphical models, cumulative distribution function, message-passing algorithm, inference
3 0.047997911 11 jmlr-2011-Approximate Marginals in Latent Gaussian Models
Author: Botond Cseke, Tom Heskes
Abstract: We consider the problem of improving the Gaussian approximate posterior marginals computed by expectation propagation and the Laplace method in latent Gaussian models and propose methods that are similar in spirit to the Laplace approximation of Tierney and Kadane (1986). We show that in the case of sparse Gaussian models, the computational complexity of expectation propagation can be made comparable to that of the Laplace method by using a parallel updating scheme. In some cases, expectation propagation gives excellent estimates where the Laplace approximation fails. Inspired by bounds on the correct marginals, we arrive at factorized approximations, which can be applied on top of both expectation propagation and the Laplace method. The factorized approximations can give nearly indistinguishable results from the non-factorized approximations and their computational complexity scales linearly with the number of variables. We experienced that the expectation propagation based marginal approximations we introduce are typically more accurate than the methods of similar complexity proposed by Rue et al. (2009). Keywords: approximate marginals, Gaussian Markov random fields, Laplace approximation, variational inference, expectation propagation
4 0.036317065 73 jmlr-2011-Online Learning in Case of Unbounded Losses Using Follow the Perturbed Leader Algorithm
Author: Vladimir V. V'yugin
Abstract: In this paper the sequential prediction problem with expert advice is considered for the case where losses of experts suffered at each step cannot be bounded in advance. We present some modification of Kalai and Vempala algorithm of following the perturbed leader where weights depend on past losses of the experts. New notions of a volume and a scaled fluctuation of a game are introduced. We present a probabilistic algorithm protected from unrestrictedly large one-step losses. This algorithm has the optimal performance in the case when the scaled fluctuations of one-step losses of experts of the pool tend to zero. Keywords: prediction with expert advice, follow the perturbed leader, unbounded losses, adaptive learning rate, expected bounds, Hannan consistency, online sequential prediction
5 0.034030113 30 jmlr-2011-Efficient Structure Learning of Bayesian Networks using Constraints
Author: Cassio P. de Campos, Qiang Ji
Abstract: This paper addresses the problem of learning Bayesian network structures from data based on score functions that are decomposable. It describes properties that strongly reduce the time and memory costs of many known methods without losing global optimality guarantees. These properties are derived for different score criteria such as Minimum Description Length (or Bayesian Information Criterion), Akaike Information Criterion and Bayesian Dirichlet Criterion. Then a branch-andbound algorithm is presented that integrates structural constraints with data in a way to guarantee global optimality. As an example, structural constraints are used to map the problem of structure learning in Dynamic Bayesian networks into a corresponding augmented Bayesian network. Finally, we show empirically the benefits of using the properties with state-of-the-art methods and with the new algorithm, which is able to handle larger data sets than before. Keywords: Bayesian networks, structure learning, properties of decomposable scores, structural constraints, branch-and-bound technique
6 0.030837383 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
7 0.02844801 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood
8 0.025552606 56 jmlr-2011-Learning Transformation Models for Ranking and Survival Analysis
9 0.025178201 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models
10 0.02239274 101 jmlr-2011-Variable Sparsity Kernel Learning
11 0.021777062 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
12 0.021703467 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
13 0.021689061 25 jmlr-2011-Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood
14 0.021656074 71 jmlr-2011-On Equivalence Relationships Between Classification and Ranking Algorithms
15 0.021476736 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms
16 0.020803561 57 jmlr-2011-Learning a Robust Relevance Model for Search Using Kernel Methods
17 0.019887242 45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms
18 0.01924221 46 jmlr-2011-Introduction to the Special Topic on Grammar Induction, Representation of Language and Language Learning
19 0.016106466 28 jmlr-2011-Double Updating Online Learning
20 0.015950384 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
topicId topicWeight
[(0, 0.099), (1, -0.02), (2, -0.071), (3, -0.002), (4, 0.041), (5, -0.007), (6, 0.001), (7, 0.051), (8, 0.058), (9, 0.044), (10, -0.022), (11, -0.075), (12, 0.129), (13, 0.024), (14, 0.112), (15, 0.219), (16, -0.068), (17, 0.221), (18, 0.361), (19, -0.213), (20, 0.167), (21, 0.095), (22, 0.115), (23, 0.099), (24, -0.148), (25, -0.23), (26, 0.012), (27, -0.103), (28, 0.042), (29, 0.037), (30, 0.086), (31, -0.045), (32, 0.038), (33, -0.02), (34, 0.047), (35, 0.067), (36, -0.091), (37, 0.119), (38, -0.021), (39, -0.001), (40, -0.108), (41, 0.023), (42, -0.019), (43, 0.016), (44, 0.033), (45, 0.086), (46, 0.101), (47, 0.03), (48, 0.054), (49, -0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.96140194 2 jmlr-2011-A Bayesian Approximation Method for Online Ranking
Author: Ruby C. Weng, Chih-Jen Lin
Abstract: This paper describes a Bayesian approximation method to obtain online ranking algorithms for games with multiple teams and multiple players. Recently for Internet games large online ranking systems are much needed. We consider game models in which a k-team game is treated as several two-team games. By approximating the expectation of teams’ (or players’) performances, we derive simple analytic update rules. These update rules, without numerical integrations, are very easy to interpret and implement. Experiments on game data show that the accuracy of our approach is competitive with state of the art systems such as TrueSkill, but the running time as well as the code is much shorter. Keywords: Bayesian inference, rating system, Bradley-Terry model, Thurstone-Mosteller model, Plackett-Luce model
Author: Jim C. Huang, Brendan J. Frey
Abstract: We present a class of graphical models for directly representing the joint cumulative distribution function (CDF) of many random variables, called cumulative distribution networks (CDNs). Unlike graphs for probability density and mass functions, for CDFs the marginal probabilities for any subset of variables are obtained by computing limits of functions in the model, and conditional probabilities correspond to computing mixed derivatives. We will show that the conditional independence properties in a CDN are distinct from the conditional independence properties of directed, undirected and factor graphs, but include the conditional independence properties of bi-directed graphs. In order to perform inference in such models, we describe the ‘derivative-sum-product’ (DSP) message-passing algorithm in which messages correspond to derivatives of the joint CDF. We will then apply CDNs to the problem of learning to rank players in multiplayer team-based games and suggest several future directions for research. Keywords: graphical models, cumulative distribution function, message-passing algorithm, inference
3 0.23397605 73 jmlr-2011-Online Learning in Case of Unbounded Losses Using Follow the Perturbed Leader Algorithm
Author: Vladimir V. V'yugin
Abstract: In this paper the sequential prediction problem with expert advice is considered for the case where losses of experts suffered at each step cannot be bounded in advance. We present some modification of Kalai and Vempala algorithm of following the perturbed leader where weights depend on past losses of the experts. New notions of a volume and a scaled fluctuation of a game are introduced. We present a probabilistic algorithm protected from unrestrictedly large one-step losses. This algorithm has the optimal performance in the case when the scaled fluctuations of one-step losses of experts of the pool tend to zero. Keywords: prediction with expert advice, follow the perturbed leader, unbounded losses, adaptive learning rate, expected bounds, Hannan consistency, online sequential prediction
4 0.16835706 11 jmlr-2011-Approximate Marginals in Latent Gaussian Models
Author: Botond Cseke, Tom Heskes
Abstract: We consider the problem of improving the Gaussian approximate posterior marginals computed by expectation propagation and the Laplace method in latent Gaussian models and propose methods that are similar in spirit to the Laplace approximation of Tierney and Kadane (1986). We show that in the case of sparse Gaussian models, the computational complexity of expectation propagation can be made comparable to that of the Laplace method by using a parallel updating scheme. In some cases, expectation propagation gives excellent estimates where the Laplace approximation fails. Inspired by bounds on the correct marginals, we arrive at factorized approximations, which can be applied on top of both expectation propagation and the Laplace method. The factorized approximations can give nearly indistinguishable results from the non-factorized approximations and their computational complexity scales linearly with the number of variables. We experienced that the expectation propagation based marginal approximations we introduce are typically more accurate than the methods of similar complexity proposed by Rue et al. (2009). Keywords: approximate marginals, Gaussian Markov random fields, Laplace approximation, variational inference, expectation propagation
5 0.16017988 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
Author: Krishnakumar Balasubramanian, Pinar Donmez, Guy Lebanon
Abstract: Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing a margin-based risk function. Traditionally, these risk functions are computed based on a labeled data set. We develop a novel technique for estimating such risks using only unlabeled data and the marginal label distribution. We prove that the proposed risk estimator is consistent on high-dimensional data sets and demonstrate it on synthetic and real-world data. In particular, we show how the estimate is used for evaluating classifiers in transfer learning, and for training classifiers with no labeled data whatsoever. Keywords: classification, large margin, maximum likelihood
6 0.14469238 93 jmlr-2011-The arules R-Package Ecosystem: Analyzing Interesting Patterns from Large Transaction Data Sets
7 0.14291362 56 jmlr-2011-Learning Transformation Models for Ranking and Survival Analysis
8 0.12862939 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models
9 0.12737373 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood
10 0.12397212 28 jmlr-2011-Double Updating Online Learning
11 0.12125953 25 jmlr-2011-Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood
12 0.11349671 38 jmlr-2011-Hierarchical Knowledge Gradient for Sequential Sampling
13 0.10776185 71 jmlr-2011-On Equivalence Relationships Between Classification and Ranking Algorithms
14 0.10722657 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
15 0.10591274 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
16 0.10142631 61 jmlr-2011-Logistic Stick-Breaking Process
17 0.096183382 30 jmlr-2011-Efficient Structure Learning of Bayesian Networks using Constraints
18 0.092232205 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
19 0.088087641 49 jmlr-2011-Kernel Regression in the Presence of Correlated Errors
20 0.086497329 46 jmlr-2011-Introduction to the Special Topic on Grammar Induction, Representation of Language and Language Learning
topicId topicWeight
[(0, 0.49), (4, 0.029), (9, 0.014), (10, 0.028), (24, 0.042), (31, 0.075), (32, 0.021), (41, 0.032), (71, 0.013), (73, 0.045), (78, 0.048), (88, 0.014), (90, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.74321336 2 jmlr-2011-A Bayesian Approximation Method for Online Ranking
Author: Ruby C. Weng, Chih-Jen Lin
Abstract: This paper describes a Bayesian approximation method to obtain online ranking algorithms for games with multiple teams and multiple players. Recently for Internet games large online ranking systems are much needed. We consider game models in which a k-team game is treated as several two-team games. By approximating the expectation of teams’ (or players’) performances, we derive simple analytic update rules. These update rules, without numerical integrations, are very easy to interpret and implement. Experiments on game data show that the accuracy of our approach is competitive with state of the art systems such as TrueSkill, but the running time as well as the code is much shorter. Keywords: Bayesian inference, rating system, Bradley-Terry model, Thurstone-Mosteller model, Plackett-Luce model
Author: Jim C. Huang, Brendan J. Frey
Abstract: We present a class of graphical models for directly representing the joint cumulative distribution function (CDF) of many random variables, called cumulative distribution networks (CDNs). Unlike graphs for probability density and mass functions, for CDFs the marginal probabilities for any subset of variables are obtained by computing limits of functions in the model, and conditional probabilities correspond to computing mixed derivatives. We will show that the conditional independence properties in a CDN are distinct from the conditional independence properties of directed, undirected and factor graphs, but include the conditional independence properties of bi-directed graphs. In order to perform inference in such models, we describe the ‘derivative-sum-product’ (DSP) message-passing algorithm in which messages correspond to derivatives of the joint CDF. We will then apply CDNs to the problem of learning to rank players in multiplayer team-based games and suggest several future directions for research. Keywords: graphical models, cumulative distribution function, message-passing algorithm, inference
3 0.22194646 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
Author: Ricardo Henao, Ole Winther
Abstract: In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully Bayesian hierarchy for sparse models using slab and spike priors (two-component δ-function and continuous mixtures), non-Gaussian latent factors and a stochastic search over the ordering of the variables. The framework, which we call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and bench-marked on artificial and real biological data sets. SLIM is closest in spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in inference, Bayesian network structure learning and model comparison. Experimentally, SLIM performs equally well or better than LiNGAM with comparable computational complexity. We attribute this mainly to the stochastic search strategy used, and to parsimony (sparsity and identifiability), which is an explicit part of the model. We propose two extensions to the basic i.i.d. linear framework: non-linear dependence on observed variables, called SNIM (Sparse Non-linear Identifiable Multivariate modeling) and allowing for correlations between latent variables, called CSLIM (Correlated SLIM), for the temporal and/or spatial data. The source code and scripts are available from http://cogsys.imm.dtu.dk/slim/. Keywords: parsimony, sparsity, identifiability, factor models, linear Bayesian networks
4 0.22186358 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms
Author: Jinfeng Zhuang, Ivor W. Tsang, Steven C.H. Hoi
Abstract: Previous studies of Non-Parametric Kernel Learning (NPKL) usually formulate the learning task as a Semi-Definite Programming (SDP) problem that is often solved by some general purpose SDP solvers. However, for N data examples, the time complexity of NPKL using a standard interiorpoint SDP solver could be as high as O(N 6.5 ), which prohibits NPKL methods applicable to real applications, even for data sets of moderate size. In this paper, we present a family of efficient NPKL algorithms, termed “SimpleNPKL”, which can learn non-parametric kernels from a large set of pairwise constraints efficiently. In particular, we propose two efficient SimpleNPKL algorithms. One is SimpleNPKL algorithm with linear loss, which enjoys a closed-form solution that can be efficiently computed by the Lanczos sparse eigen decomposition technique. Another one is SimpleNPKL algorithm with other loss functions (including square hinge loss, hinge loss, square loss) that can be re-formulated as a saddle-point optimization problem, which can be further resolved by a fast iterative algorithm. In contrast to the previous NPKL approaches, our empirical results show that the proposed new technique, maintaining the same accuracy, is significantly more efficient and scalable. Finally, we also demonstrate that the proposed new technique is also applicable to speed up many kernel learning tasks, including colored maximum variance unfolding, minimum volume embedding, and structure preserving embedding. Keywords: non-parametric kernel learning, semi-definite programming, semi-supervised learning, side information, pairwise constraints
5 0.22014447 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
Author: Jennifer Gillenwater, Kuzman Ganchev, João Graça, Fernando Pereira, Ben Taskar
Abstract: A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed attachment accuracy over the standard expectation maximization (EM) baseline, with an average accuracy improvement of 6.5%, outperforming EM by at least 1% for 9 out of 12 languages. Furthermore, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors with an average improvement of 5% and positive gains of at least 1% for 9 out of 12 languages. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.
6 0.21965903 17 jmlr-2011-Computationally Efficient Convolved Multiple Output Gaussian Processes
7 0.21951874 12 jmlr-2011-Bayesian Co-Training
8 0.21805404 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets
9 0.21671766 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series
10 0.21575619 48 jmlr-2011-Kernel Analysis of Deep Networks
11 0.21517178 16 jmlr-2011-Clustering Algorithms for Chains
12 0.21415968 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
13 0.21353282 91 jmlr-2011-The Sample Complexity of Dictionary Learning
14 0.21317865 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
15 0.21235114 53 jmlr-2011-Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates
16 0.21176788 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
17 0.21139725 66 jmlr-2011-Multiple Kernel Learning Algorithms
18 0.21110214 9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models
19 0.21059407 29 jmlr-2011-Efficient Learning with Partially Observed Attributes
20 0.2086553 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation