jmlr jmlr2011 jmlr2011-2 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ruby C. Weng, Chih-Jen Lin
Abstract: This paper describes a Bayesian approximation method to obtain online ranking algorithms for games with multiple teams and multiple players. Recently for Internet games large online ranking systems are much needed. We consider game models in which a k-team game is treated as several two-team games. By approximating the expectation of teams’ (or players’) performances, we derive simple analytic update rules. These update rules, without numerical integrations, are very easy to interpret and implement. Experiments on game data show that the accuracy of our approach is competitive with state of the art systems such as TrueSkill, but the running time as well as the code is much shorter. Keywords: Bayesian inference, rating system, Bradley-Terry model, Thurstone-Mosteller model, Plackett-Luce model
Reference: text
sentIndex sentText sentNum sentScore
1 Department of Computer Science, National Taiwan University, Taipei 106, Taiwan. Editor: Thore Graepel. Abstract: This paper describes a Bayesian approximation method to obtain online ranking algorithms for games with multiple teams and multiple players. [sent-7, score-0.376]
2 We consider game models in which a k-team game is treated as several two-team games. [sent-9, score-0.338]
3 These online algorithms are especially useful when the number of teams to be ranked and the number of games are very large. [sent-15, score-0.362]
4 To begin, prior to a rating period, a player’s skill (θ) is assumed to follow a Gaussian distribution which can be characterized by two numbers: the average skill of the player (µ) and the degree of uncertainty in the player’s skill (σ). [sent-20, score-0.538]
5 Then, Glicko models the game outcomes by the Bradley-Terry model (Bradley and Terry, 1952) and updates players’ skills after a rating period. [sent-21, score-0.308]
6 Glickman (1999) also reported that the Glicko system performs best when the number of games per player is around 5-10 in a rating period. [sent-22, score-0.224]
7 In video games a game often involves more than two players or teams. [sent-24, score-0.362]
8 First, it is designed for multi-team/multi-player games, and it updates skills after each game rather than a rating period. [sent-30, score-0.28]
9 However, for games with multiple teams and multiple players, the update rules are not possible to write down as they require an iterative procedure. [sent-35, score-0.406]
10 The present paper concerns the ranking of players from outcomes of multiple players or games. [sent-36, score-0.288]
11 We consider a k-team game as a single match and discuss the possibility of obtaining efficient update algorithms. [sent-37, score-0.228]
12 We introduce a Bayesian approximation method to derive simple analytic rules for updating team strength in multi-team games. [sent-38, score-0.394]
13 Strengths of players in a team are then updated by assuming that a team’s skill is the sum of the skills of its members. [sent-40, score-0.565]
14 1 Modeling Ranked Data Given the game outcome of k teams, we define r(i) as the rank of team i. [sent-56, score-0.497]
15 If teams i1 , . . . , id are tied together, we have r(i1 ) = · · · = r(id ), and the team q ranked next has r(q) = r(i1 ) + d. [sent-60, score-0.328]
16 For example, if four teams participate in a game, their ranks may be r(1) = 2, r(2) = 2, r(3) = 4, r(4) = 1  (1), where teams 1 and 2 are both ranked second. [sent-61, score-0.504]
17 Then team 3, which is ranked next, has r(3) = 4. [sent-62, score-0.314]
18 Suppose that each team is associated with a continuous but unobserved random variable Xi , representing the actual performance. [sent-79, score-0.27]
19 The observed ordering that team r̄(1) comes in first, team r̄(2) comes in second, and so on is then determined by the Xi ’s: X_{r̄(1)} > X_{r̄(2)} > · · · > X_{r̄(k)}. [sent-80, score-0.54]
20 In particular, if k = 2 and Xi follows N(θi , βi²), where θi is the strength of team i and βi² is the uncertainty of the actual performance Xi , then P(Xi > Xq ) = Φ( (θi − θq ) / (βi² + βq²)^{1/2} )  (5), where Φ denotes the cumulative distribution function of a standard normal density. [sent-86, score-0.36]
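To make (5) concrete, here is a minimal Python sketch; the function name and the illustrative numbers are ours, not part of the paper.

```python
from math import erf, sqrt

def win_prob_gaussian(theta_i, theta_q, beta_i_sq, beta_q_sq):
    """P(X_i > X_q) for X_i ~ N(theta_i, beta_i_sq), X_q ~ N(theta_q, beta_q_sq), as in (5)."""
    # X_i - X_q ~ N(theta_i - theta_q, beta_i_sq + beta_q_sq), so the win
    # probability is the standard normal CDF of the standardized difference.
    z = (theta_i - theta_q) / sqrt(beta_i_sq + beta_q_sq)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # Phi(z)

# A one-point edge with unit performance noise on each side: P(win) ~ 0.76.
print(win_prob_gaussian(26.0, 25.0, 1.0, 1.0))
```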
21 Recent attempts to extend this off-line approach to multiple players and multiple teams include Huang et al. [sent-91, score-0.338]
22 The Elo system is an online rating scheme which models the probability of the game outcome as (5) with βi = βq and, after each game, updates the strength θi by θi ← θi + K(s − P(i wins))  (6), where K is some constant, and s = 1 if i wins and 0 otherwise. [sent-94, score-0.332]
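A small sketch of the Elo update (6), using the base-10 logistic approximation p*_iq that the text quotes later for the USCF system; the function name, the default K = 32, and the example ratings are only illustrative.

```python
def elo_update(theta_i, theta_q, s, K=32.0):
    """One Elo update for player i after a game against player q, following (6)."""
    # Approximate P(i beats q) with the base-10 logistic curve used by Elo.
    p_star = 10 ** (theta_i / 400) / (10 ** (theta_i / 400) + 10 ** (theta_q / 400))
    # s = 1 for a win, 0.5 for a draw, 0 for a loss.
    return theta_i + K * (s - p_star)

# A 1500-rated player who beats a 1600-rated player gains roughly 20 points.
print(elo_update(1500.0, 1600.0, s=1.0))  # ~1520.5
```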
23 The Bradley-Terry model for paired comparisons has the form P(Xi > Xq ) = vi / (vi + vq )  (7), where vi > 0 is the strength of team i. [sent-98, score-0.37]
24 about the same skill levels, the logistic model gives a weak team i a higher winning chance than the Gaussian model does. [sent-118, score-0.482]
25 Though this approach can handle multiple players per team, it aims to handle only two teams per game. [sent-121, score-0.338]
26 For comparisons involving k ≥ 3 teams per game, the Bradley-Terry model has been generalized in various ways. [sent-122, score-0.236]
27 In particular, if h(z) = zi , then by (14) it follows that Uh(z) = ei (a function from R^k to R^k ); and if h(z) = zi zj with i < j, then Uh(z) = zi ej , where {e1 , · · · , ek } denotes the standard basis of R^k . [sent-230, score-0.534]
28 Method In this section, we first present our proposed method for updating team and individual skills. [sent-240, score-0.286]
29 1 Approximating the Expectations Let θi be the strength parameter of team i whose ability is to be estimated. [sent-243, score-0.314]
30 To start, suppose that team i has a strength parameter θi and assume that the prior distribution of θi is N(µi , σi²). [sent-247, score-0.314]
31 The posterior density of Z given the game outcome D is P(z|D) = Cφk (z) f (z), where f (z) is the probability of game outcome P(D|z). [sent-259, score-0.466]
32 Next, we shall update the skill as the posterior mean and variance of θ. [sent-262, score-0.256]
33 Hence, f is increasing in θ1 , and the adjustment to team 1’s skill is the average of the relative rate of change of team 1’s winning probability with respect to its strength θ1 . [sent-268, score-0.763]
34 On the other hand, a larger θ2 is less likely to result in this outcome; hence, f is decreasing in θ2 and the adjustment to team 2’s skill will be negative. [sent-269, score-0.418]
35 We propose approximating the expectations in (25) and (26) to obtain the update rules µi ← µi + Ωi  (27) and σi² ← σi² max(1 − Δi , κ)  (28), where Ωi = σi (∂ f (z)/∂zi ) / f (z) |_{z=0}  (29) and Δi = ( (∂ f (z)/∂zi / f (z))² − ∂² f (z)/∂zi² / f (z) ) |_{z=0} = −∂/∂zi ( ∂ f (z)/∂zi / f (z) ) |_{z=0}  (30). [sent-271, score-0.263]
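The sketch below shows how the generic rules (27)-(30) could be applied when f is available as a black-box function of the standardized strengths z. The paper derives the derivatives at z = 0 analytically for each ranking model; this illustration simply approximates them numerically, and all names and default constants are ours.

```python
import numpy as np

def skill_update(f, mu, sigma, kappa=1e-4, h=1e-5):
    """Update all teams by (27)-(28) with Omega_i and Delta_i from (29)-(30).

    f(z): probability of the observed game outcome, with z the standardized
    strengths (theta_i = mu_i + sigma_i * z_i).  Derivatives at z = 0 are
    taken numerically here instead of in closed form.
    """
    mu, sigma = np.asarray(mu, float).copy(), np.asarray(sigma, float).copy()
    k, z0 = len(mu), np.zeros(len(mu))
    f0 = f(z0)
    for i in range(k):
        e = np.zeros(k); e[i] = h
        d1 = (f(z0 + e) - f(z0 - e)) / (2 * h)          # df/dz_i at z = 0
        d2 = (f(z0 + e) - 2 * f0 + f(z0 - e)) / h ** 2  # d2f/dz_i^2 at z = 0
        omega = sigma[i] * d1 / f0                      # (29)
        delta = (d1 / f0) ** 2 - d2 / f0                # (30)
        mu[i] += omega                                  # (27)
        sigma[i] = np.sqrt(sigma[i] ** 2 * max(1.0 - delta, kappa))  # (28)
    return mu, sigma

# Two-team Bradley-Terry example: team 0 just beat team 1.
mu0, sig0, beta = np.array([25.0, 25.0]), np.array([25.0 / 3, 25.0 / 3]), 25.0 / 6
c = np.sqrt(sig0[0] ** 2 + sig0[1] ** 2 + 2 * beta ** 2)
f = lambda z: np.exp((mu0[0] + sig0[0] * z[0]) / c) / (
    np.exp((mu0[0] + sig0[0] * z[0]) / c) + np.exp((mu0[1] + sig0[1] * z[1]) / c))
print(skill_update(f, mu0, sig0))  # the winner's mu rises, both sigmas shrink
```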
36 The expectation on the right side of (22) involves the following calculation: ∂ f /∂zi / f = ∂ log ∏_{q=1}^{m} fq (z) / ∂zi = Σ_{q=1}^{m} ∂ log fq (z)/∂zi = Σ_{q=1}^{m} (∂ fq /∂zi ) / fq . [sent-303, score-0.309]
37 One may address this problem by modeling a k-team game outcome as (k − 1) two-team games (between all teams on neighboring ranks); that is, f (z) = ∏_{i=1}^{k−1} P(outcome between the teams ranked ith and (i + 1)st)  (39). [sent-308, score-0.777]
38 Alternatively, we may consider the game result of k teams as k(k − 1)/2 two-team games. [sent-309, score-0.391]
39 Then f (z) = ∏_{i=1}^{k} ∏_{q=i+1}^{k} P(outcome between team i and team q)  (40). [sent-310, score-0.54]
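As an illustration of the two decompositions (39) and (40), the helper below lists the team pairs that would be modeled as two-team games; the function names and the tie-breaking order for equal ranks are our own choices.

```python
def partial_pairs(ranks):
    """(39): one two-team game between the teams at neighboring positions in the ranking."""
    order = sorted(range(len(ranks)), key=lambda t: ranks[t])  # ties kept in index order
    return [(order[a], order[a + 1]) for a in range(len(order) - 1)]

def full_pairs(ranks):
    """(40): every pair of teams is treated as a two-team game, k(k-1)/2 pairs in total."""
    k = len(ranks)
    return [(i, q) for i in range(k) for q in range(i + 1, k)]

# Ranks from example (1): teams 1 and 2 tied for second, team 4 first, team 3 last
# (0-based team indices here).
ranks = [2, 2, 4, 1]
print(partial_pairs(ranks))  # [(3, 0), (0, 1), (1, 2)] -- k - 1 = 3 games
print(full_pairs(ranks))     # 6 games
```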
40 Suppose that the ith team has ni players, the jth player in the ith team has strength θij , and the prior distribution of θij is N(µij , σij²). [sent-317, score-0.686]
41 Equations (43) and (44) say that Ωi , the mean skill change of team i, is partitioned into ni parts with magnitude proportional to σij². [sent-327, score-0.441]
42 As in (27), we could update µij by µij ← µij + σij (∂ f̄ (z̄)/∂zij ) / f̄ (z̄) |_{z̄=0}  (46), where f̄ (z̄) is the probability of the game outcomes and z̄ = [z11 , . [sent-330, score-0.242]
43 Following the definition of Ωi in (29) we obtain the update rule (43), which says that within team i the adjustment to µij is proportional to σij². [sent-345, score-0.342]
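A sketch of how the team-level adjustment could be propagated to individual players, assuming (as the text states) that the team variance is the sum of the player variances and that both the mean and the variance corrections are split in proportion to σij²; the names and numbers are illustrative.

```python
def update_players(mu_ij, var_ij, omega_i, delta_i, kappa=1e-4):
    """Split a team-level update (Omega_i, Delta_i) among the team's players.

    mu_ij, var_ij: per-player prior means and variances sigma_ij^2.
    Each player's share of the adjustment is var_ij / sum(var_ij).
    """
    var_team = sum(var_ij)  # sigma_i^2 as the sum of the player variances
    new_mu, new_var = [], []
    for m, v in zip(mu_ij, var_ij):
        share = v / var_team
        new_mu.append(m + share * omega_i)                     # mean update, cf. (43)
        new_var.append(v * max(1.0 - share * delta_i, kappa))  # variance update, cf. (44)
    return new_mu, new_var

# Two players on one team; the more uncertain player absorbs more of the change.
print(update_players([25.0, 25.0], [4.0, 16.0], omega_i=2.0, delta_i=0.1))
```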
44 In the Bradley-Terry model (7)-(8), the difference Xi − Xq between two teams follows a logistic distribution. [sent-350, score-0.24]
45 The resulting model is P(team i beats team q) ≡ fiq (z) = e^{θi /ciq } / ( e^{θi /ciq } + e^{θq /ciq } )  (48), where ciq² = βi² + βq² and θi = µi + σi zi . [sent-353, score-0.466]
46 That is, P(i draws with q) = ( P(i beats q) P(q beats i) )^{1/2} = ( fiq (z) fqi (z) )^{1/2}  (51). [sent-360, score-0.526]
47 With (38) and (51), ∂ f /∂zi / f = Σ_{q: r(q)<r(i)} (∂ fqi /∂zi ) / fqi + Σ_{q: r(q)>r(i)} (∂ fiq /∂zi ) / fiq + (1/2) Σ_{q: r(q)=r(i), q≠i} ( (∂ fiq /∂zi ) / fiq + (∂ fqi /∂zi ) / fqi )  (53). [sent-367, score-1.576]
48 For q = 1, . . . , k, q ≠ i, let p̂iq = e^{µi /ciq } / ( e^{µi /ciq } + e^{µq /ciq } ) and ηq = γq (σi /ciq )² p̂iq p̂qi , where s = 1 if r(q) > r(i), s = 1/2 if r(q) = r(i), and s = 0 if r(q) < r(i). [sent-386, score-0.777]
49 Here ciq = (σi² + σq² + 2β²)^{1/2} and δq = (σi²/ciq )(s − p̂iq ). [sent-387, score-1.436]
50 Using (24) and (48), it is easy to calculate that ∂ fqi /∂zi = ( −e^{θi /ciq } e^{θq /ciq } / ( e^{θi /ciq } + e^{θq /ciq } )² ) · (σi /ciq ) = −(σi /ciq ) fiq fqi  (54) and ∂ fiq /∂zi = ( ( e^{θi /ciq } + e^{θq /ciq } ) e^{θi /ciq } − e^{θi /ciq } e^{θi /ciq } ) / ( e^{θi /ciq } + e^{θq /ciq } )² · (σi /ciq ) = (σi /ciq ) fiq fqi  (55). [sent-397, score-2.579]
51 Since p̂iq + p̂qi = 1, (56) can be rewritten as Ωi = Σ_{q: q≠i} (σi²/ciq )(s − p̂iq )  (57), where s = 1 if r(q) > r(i), s = 1/2 if r(q) = r(i), and s = 0 if r(q) < r(i). [sent-399, score-0.841]
52 With p̂iq as defined in (58), to apply (26) and (30) for updating σi we use (53) to obtain ∂/∂zi ( ∂ f /∂zi / f ) = Σ_{q: r(q)<r(i)} ∂/∂zi ( (∂ fqi /∂zi ) / fqi ) + Σ_{q: r(q)>r(i)} ∂/∂zi ( (∂ fiq /∂zi ) / fiq ) + (1/2) Σ_{q: r(q)=r(i), q≠i} ∂/∂zi ( (∂ fiq /∂zi ) / fiq + (∂ fqi /∂zi ) / fqi )  (59). [sent-400, score-1.656]
53 From (54), ∂/∂zi ( (∂ fqi /∂zi ) / fqi ) = ∂( −σi fiq /ciq )/∂zi = −(σi²/ciq²) fiq fqi , and similarly ∂/∂zi ( (∂ fiq /∂zi ) / fiq ) = −(σi²/ciq²) fiq fqi  (60). [sent-401, score-2.122]
54 Hence we introduce an additional parameter γq so that the update rule is σi² ← σi² max( 1 − Σ_{q: q≠i} γq ξq , κ ), where ξq = (σi²/ciq²) p̂iq p̂qi is from (60) and γq ≤ 1 is decided by users; further discussions on the choice of γq are in Section 6. [sent-406, score-0.339]
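Putting (57) and the σ rule together gives the full Bradley-Terry full-pair update for one game. The sketch below is our reading of those formulas; the default β, the simplification γq = 1, and the starting values in the example are illustrative choices, not prescriptions from the paper.

```python
import math

def bt_full_pair_update(mu, sigma, ranks, beta=25.0 / 6, kappa=1e-4, gamma=1.0):
    """One k-team update under the Bradley-Terry full-pair model.

    mu, sigma: prior means and standard deviations; ranks: observed ranks
    (smaller is better, equal ranks mean a tie); gamma: the damping factor
    gamma_q <= 1 of the variance update, used as a single constant here.
    """
    k = len(mu)
    new_mu, new_sigma = list(mu), list(sigma)
    for i in range(k):
        omega, delta = 0.0, 0.0
        for q in range(k):
            if q == i:
                continue
            c = math.sqrt(sigma[i] ** 2 + sigma[q] ** 2 + 2 * beta ** 2)
            p_iq = math.exp(mu[i] / c) / (math.exp(mu[i] / c) + math.exp(mu[q] / c))
            # s encodes the outcome of the (i, q) pair: win / tie / loss for team i.
            s = 1.0 if ranks[q] > ranks[i] else (0.5 if ranks[q] == ranks[i] else 0.0)
            omega += (sigma[i] ** 2 / c) * (s - p_iq)                 # delta_q in (57)
            delta += gamma * (sigma[i] / c) ** 2 * p_iq * (1 - p_iq)  # gamma_q * xi_q
        new_mu[i] = mu[i] + omega
        new_sigma[i] = math.sqrt(sigma[i] ** 2 * max(1.0 - delta, kappa))
    return new_mu, new_sigma

# Three teams finishing 1st, 2nd, 3rd with identical priors:
# the winner's mean rises, the loser's falls, and every sigma shrinks.
print(bt_full_pair_update([25.0, 25.0, 25.0], [25.0 / 3] * 3, ranks=[1, 2, 3]))
```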
55 The Elo system treats θi as nonrandom and its update rule is in (6): θi ← θi + K(s − p*iq ), where K is a constant (e.g., K = 32 in the USCF system for amateur players). [sent-409, score-0.248]
56 Here p*iq = 10^{θi /400} / ( 10^{θi /400} + 10^{θq /400} ) is the approximate probability that i beats q; see Equations. [sent-411, score-0.261]
57 As for Glicko, it is a Bayesian system but designed for paired comparisons over a rating period. [sent-414, score-0.293]
58 Let r̄(1), . . . , r̄(k) be indices of the teams ranked from the first to the last; for a = 1, . [sent-420, score-0.266]
59 Update Rules Using Other Ranking Models If we assume different distributions of the team performance Xi or model the game results in ways other than the Bradley-Terry model, the same framework in Sections 3. [sent-438, score-0.453]
60 With the definition of r̄ in (2), the function f (z) can be written as f (z) = ∏_{a=1}^{k−1} f̄_{r̄(a)r̄(a+1)} (z)  (63), where, with i ≡ r̄(a) and q ≡ r̄(a + 1), we define f̄iq = fiq if r(i) < r(q) and f̄iq = ( fiq fqi )^{1/2} if r(i) = r(q)  (64). [sent-444, score-0.61]
61 Note that fiq and fqi are defined in (48) of Section 3. [sent-445, score-0.394]
62 The reason is that ∂ f (z)/∂zi is only related to game outcomes between r̄(a) and the teams of adjacent ranks, r̄(a − 1) and r̄(a + 1). [sent-451, score-0.405]
63 2 Thurstone-Mosteller Model (Full-pair and Partial-pair) In this section, we consider the Thurstone-Mosteller model by assuming that the actual performance of team i is Xi ∼ N(θi , βi²), where βi² = σi² + β² as in Section 3. [sent-456, score-0.284]
64 If one considers partial pairs P(team i beats team q) = P(Xi > Xq ) = Φ( (θi − θq )/ciq ) and uses (51) to obtain P(i draws with q), then a derivation similar to that for the Bradley-Terry model leads to certain update rules. [sent-459, score-1.481]
65 Following Herbrich et al. (2007), we let ε be the draw margin that depends on the game mode and assume that the probabilities that i beats q and that a draw occurs are, respectively, P(team i beats team q) = P(Xi > Xq + ε) = Φ( (θi − θq − ε)/ciq ) and P(team i draws with q) = P(|Xi − Xq | < ε) = Φ( (ε − (θi − θq ))/ciq ) − Φ( (−ε − (θi − θq ))/ciq )  (65). [sent-461, score-1.257]
66 We can then obtain f (z) using the full-pair setting (40). [sent-462, score-1.372]
67 For k = 2 (i.e., two teams), the update rules (if i beats q) in Algorithm 3 reduce to µi ← µi + (σi²/ciq ) V( (µi − µq )/ciq , ε/ciq ) and µq ← µq − (σq²/ciq ) V( (µi − µq )/ciq , ε/ciq ), where the function V is defined in (67). [sent-468, score-4.281]
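For the Thurstone-Mosteller rules, the corrections are expressed through the functions V and W. The text only references their definitions, so the sketch below uses the standard forms V(x, t) = φ(x − t)/Φ(x − t) and W(x, t) = V(x, t)(V(x, t) + x − t) familiar from TrueSkill-style derivations (W enters the variance update quoted later in this extract), together with the two-team update quoted above; the ε and β defaults are illustrative.

```python
import math

def phi(x):  # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):  # standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def V(x, t):
    return phi(x - t) / max(Phi(x - t), 1e-12)  # guard against underflow in the tail

def W(x, t):
    v = V(x, t)
    return v * (v + x - t)

def tm_two_team_update(mu_i, sigma_i, mu_q, sigma_q, beta=25.0 / 6, eps=0.1):
    """Two-team Thurstone-Mosteller mean update when team i beats team q."""
    c = math.sqrt(sigma_i ** 2 + sigma_q ** 2 + 2 * beta ** 2)
    x, t = (mu_i - mu_q) / c, eps / c
    return mu_i + (sigma_i ** 2 / c) * V(x, t), mu_q - (sigma_q ** 2 / c) * V(x, t)

# An upset: the lower-rated team wins, so both means move by a comparatively large amount.
print(tm_two_team_update(20.0, 25.0 / 3, 30.0, 25.0 / 3))
```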
68 If ties are not allowed, an extension of the Plackett-Luce model (9) incorporating variance parameters is f (z) = ∏_{q=1}^{k} fq (z) = ∏_{q=1}^{k} e^{θq /c} / Σ_{s∈Cq } e^{θs /c}  (70), where zi = (θi − µi )/σi , c = ( Σ_{i=1}^{k} (σi² + βi²) )^{1/2} and Cq = {i : r(i) ≥ r(q)}. [sent-486, score-0.33]
69 Note that fq (z) corresponds to the probability that team q is the winner among teams in Cq . [sent-489, score-0.595]
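A small sketch of the Plackett-Luce factorization (70) without ties: fq is the probability that team q wins among the teams that did not finish ahead of it, and the product over q is the probability of the whole ordering. The constant c and the strengths in the example are illustrative.

```python
import math

def plackett_luce_prob(theta, ranks, c=1.0):
    """Probability of an observed ordering (no ties) under the Plackett-Luce model (70)."""
    prob = 1.0
    for q in range(len(theta)):
        # C_q = {i : r(i) >= r(q)} -- team q and everyone it finished ahead of.
        C_q = [i for i in range(len(theta)) if ranks[i] >= ranks[q]]
        f_q = math.exp(theta[q] / c) / sum(math.exp(theta[s] / c) for s in C_q)
        prob *= f_q
    return prob

# With strengths 3 > 2 > 1, the "in order" finish is far more likely than the reversed one.
print(plackett_luce_prob([3.0, 2.0, 1.0], ranks=[1, 2, 3]))  # ~0.49
print(plackett_luce_prob([3.0, 2.0, 1.0], ranks=[3, 2, 1]))  # ~0.02
```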
70 Assume that, during the rating period, the player plays n j games against opponent j, where j = 1, . [sent-517, score-0.235]
71 Let sjk be the outcome of the kth game against opponent j, with sjk = 1 if the player wins, [sent-521, score-0.381]
72 sjk = 0.5 if the game results in a tie, and sjk = 0 if the player loses. [sent-522, score-0.261]
73 In fact, the Glicko system treats a collection of games within a “rating period” as occurring simultaneously, and it works best when the number of games per player in a rating period is moderate, say 5-10 on average. [sent-652, score-0.463]
74 3 The set contains data from four different game types: • Free for All: up to 8 players in a game. [sent-658, score-0.285]
75 Herbrich et al. (2007) indicate that for “Small Teams,” each team has no more than 4 players, and for “Large Teams,” each has no more than 8. [sent-675, score-0.27]
76 We check only team pairs whose ranks are different. [sent-720, score-0.286]
77 For example, if there are three teams A, B, and C and the rank of one game is (1, 1, 2), then only the two pairs (A,C) and (B,C) count. [sent-721, score-0.406]
78 Further, if before the game we have µA = µC and the game output shows rank(A) < rank(C), it is considered a wrong prediction. [sent-722, score-0.338]
79 The prediction error rate is the fraction of total team pairs (from the second to the last game) that are wrongly predicted. [sent-726, score-0.27]
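The evaluation described above can be written in a few lines of Python; this sketch counts, for one game, the team pairs that are checked and those predicted wrongly (the team with the larger pre-game µ is predicted to finish ahead, and a pair with equal µ values that ends up decided counts as wrong). The names and numbers are ours.

```python
def game_pair_errors(mu, ranks):
    """Return (wrong, total) team-pair predictions for a single game."""
    k, total, wrong = len(mu), 0, 0
    for a in range(k):
        for b in range(a + 1, k):
            if ranks[a] == ranks[b]:
                continue                      # pairs with equal ranks are not evaluated
            total += 1
            winner, loser = (a, b) if ranks[a] < ranks[b] else (b, a)
            if not (mu[winner] > mu[loser]):  # equal mu counts as a wrong prediction
                wrong += 1
    return wrong, total

# Example from the text: three teams A, B, C with ranks (1, 1, 2);
# only the pairs (A, C) and (B, C) are checked.
print(game_pair_errors([25.0, 24.0, 26.0], ranks=[1, 1, 2]))  # (2, 2)
```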
80 The second column indicates the number of total team pairs used for the evaluation. [sent-763, score-0.27]
81 We conduct a further comparison using only team pairs which are more difficult for prediction. [sent-790, score-0.27]
82 For “Free for All,” the team pairs whose ranks in a game are closer can be viewed as difficult cases for prediction. [sent-791, score-0.455]
83 After a team (or player) has played many games, the obtained ability becomes more accurate. [sent-794, score-0.29]
84 To check the performance when teams have played only a few games, we select games where the average number of players’ past appearances is small. [sent-795, score-0.319]
85 Clearly, if the players in a game have played only a few games, the prediction is more difficult. [sent-797, score-0.305]
86 Note that this model presumes that a team’s actual performance depends on the team it competes with. [sent-839, score-0.284]
87 Then, ∂ f /∂zi / f = Σ_{q: q∈Q, r(q)<r(i)} (∂ fqi /∂zi ) / fqi + Σ_{q: q∈Q, r(q)>r(i)} (∂ fiq /∂zi ) / fiq . [sent-849, score-0.356]
88 Next, ∂/∂zi ( ∂ f /∂zi / f ) = Σ_{q: q∈Q, r(q)<r(i)} ∂/∂zi ( (∂ fqi /∂zi ) / fqi ) + Σ_{q: q∈Q, r(q)>r(i)} ∂/∂zi ( (∂ fiq /∂zi ) / fiq ). [sent-850, score-0.788]
89 Derivations of Update Rules for the Thurstone-Mosteller Model Define fiq (z) ≡ P(team i beats team q) = Φ( (θi − θq − ε)/ciq ) and f̄iq (z) ≡ P(team i draws with team q) = Φ( (ε − (θi − θq ))/ciq ) − Φ( (−ε − (θi − θq ))/ciq ), where θi = σi zi + µi . [sent-857, score-3.066]
90 Then P(outcome of team i and q) = fiq (z) if r(i) < r(q), fqi (z) if r(i) > r(q), and f̄iq (z) if r(i) = r(q). [sent-858, score-0.88]
91 Similar to the derivation for the Bradley-Terry model in (52) and (53), ∂ f /∂zi / f = Σ_{q: r(q)<r(i)} (∂ fqi /∂zi ) / fqi + Σ_{q: r(q)>r(i)} (∂ fiq /∂zi ) / fiq + Σ_{q: r(q)=r(i), q≠i} (∂ f̄iq /∂zi ) / f̄iq . [sent-859, score-0.408]
92 Using the relation between φ and Φ in (66), ∂Φ( (θi − θq − ε)/ciq )/∂θi = φ( (θi − θq − ε)/ciq ) · (1/ciq )  (92). [sent-860, score-4.942]
93 Therefore, (∂ fiq /∂zi ) / fiq = ( φ( (θi − θq − ε)/ciq ) / Φ( (θi − θq − ε)/ciq ) ) · (σi /ciq ) = (σi /ciq ) V( (θi − θq )/ciq , ε/ciq )  (93), where the function V is defined in (67). [sent-861, score-2.058]
94 Similarly, (∂ fqi /∂zi ) / fqi = −(σi /ciq ) V( (θq − θi )/ciq , ε/ciq ). [sent-862, score-2.236]
95 Then the update rule is µi ← µi + σi (∂ f (z)/∂zi ) / f |_{z=0} = µi + σi² ( Σ_{q: r(q)>r(i)} (1/ciq ) V( (µi − µq )/ciq , ε/ciq ) − Σ_{q: r(q)<r(i)} (1/ciq ) V( (µq − µi )/ciq , ε/ciq ) + Σ_{q: r(q)=r(i), q≠i} (1/ciq ) Ṽ( (µi − µq )/ciq , ε/ciq ) ). [sent-864, score-2.306]
96 To update σ, similar to (59), we have ∂/∂zi ( ∂ f /∂zi / f ) = Σ_{q: r(q)<r(i)} ∂/∂zi ( (∂ fqi /∂zi ) / fqi ) + Σ_{q: r(q)>r(i)} ∂/∂zi ( (∂ fiq /∂zi ) / fiq ) + Σ_{q: r(q)=r(i), q≠i} ∂/∂zi ( (∂ f̄iq /∂zi ) / f̄iq ). [sent-865, score-2.219]
97 Similarly, ∂/∂zi ( (∂ fqi /∂zi ) / fqi ) = −(σi²/ciq²) W( (θq − θi )/ciq , ε/ciq ). [sent-867, score-2.414]
98 Combining (95), (96), and (97), the update rule for σi² is σi² ← σi² ( 1 − ( Σ_{q: r(q)<r(i)} (σi²/ciq²) W( (µq − µi )/ciq , ε/ciq ) + Σ_{q: r(q)>r(i)} (σi²/ciq²) W( (µi − µq )/ciq , ε/ciq ) + Σ_{q: r(q)=r(i), q≠i} (σi²/ciq²) W̃( (µi − µq )/ciq , ε/ciq ) ) )  (98). [sent-870, score-5.05]
99 From (38), the update rule is µi ← µi + Ωi , where Ωi = σi Σ_{q=1}^{k} (∂ fq (z)/∂zi ) / fq (z) |_{z=0} = (σi²/c) ( (1/Ai )( 1 − e^{µi /c} / Σ_{s∈Ci } e^{µs /c} ) + Σ_{q: q≠i, r(q)≤r(i)} (1/Aq )( −e^{µi /c} / Σ_{s∈Cq } e^{µs /c} ) ). [sent-873, score-0.265]
100 To update σ, similar to (59), we must calculate ∂/∂zi ( (∂ fq /∂zi ) / fq ) for all q. [sent-874, score-0.29]
wordName wordTfidf (topN-words)
[('ciq', 0.686), ('team', 0.27), ('teams', 0.222), ('fiq', 0.216), ('iq', 0.189), ('zi', 0.178), ('fqi', 0.178), ('game', 0.169), ('trueskill', 0.165), ('skill', 0.135), ('glicko', 0.133), ('players', 0.116), ('cq', 0.112), ('fq', 0.103), ('eng', 0.083), ('games', 0.077), ('ethod', 0.073), ('nline', 0.073), ('pproximation', 0.073), ('anking', 0.067), ('rating', 0.067), ('player', 0.066), ('piq', 0.064), ('xq', 0.062), ('uh', 0.059), ('update', 0.059), ('beats', 0.058), ('glickman', 0.057), ('elo', 0.054), ('xr', 0.053), ('head', 0.049), ('rules', 0.048), ('herbrich', 0.047), ('woodroofe', 0.044), ('strength', 0.044), ('skills', 0.044), ('ranked', 0.044), ('outcome', 0.043), ('ranking', 0.042), ('posterior', 0.042), ('pl', 0.039), ('rk', 0.039), ('ni', 0.036), ('stein', 0.033), ('weng', 0.032), ('ruby', 0.032), ('aq', 0.031), ('winning', 0.031), ('bayesian', 0.028), ('pqi', 0.027), ('jk', 0.026), ('opponent', 0.025), ('opponents', 0.025), ('cumulative', 0.025), ('calculate', 0.025), ('free', 0.024), ('dw', 0.024), ('gi', 0.024), ('paired', 0.023), ('normal', 0.021), ('nj', 0.021), ('variance', 0.02), ('played', 0.02), ('online', 0.019), ('wins', 0.019), ('likelihood', 0.019), ('caq', 0.019), ('kuh', 0.019), ('thurstone', 0.019), ('uscf', 0.019), ('vq', 0.019), ('logistic', 0.018), ('period', 0.018), ('dz', 0.018), ('marginal', 0.018), ('taiwan', 0.016), ('thore', 0.016), ('draws', 0.016), ('integrals', 0.016), ('ranks', 0.016), ('approximation', 0.016), ('updating', 0.016), ('propagation', 0.015), ('integration', 0.015), ('ties', 0.015), ('messages', 0.015), ('rank', 0.015), ('chess', 0.015), ('calculation', 0.014), ('id', 0.014), ('minka', 0.014), ('model', 0.014), ('outcomes', 0.014), ('system', 0.014), ('integral', 0.014), ('dy', 0.013), ('adjustment', 0.013), ('bradley', 0.013), ('approximating', 0.013), ('expectations', 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 2 jmlr-2011-A Bayesian Approximation Method for Online Ranking
Author: Ruby C. Weng, Chih-Jen Lin
Abstract: This paper describes a Bayesian approximation method to obtain online ranking algorithms for games with multiple teams and multiple players. Recently for Internet games large online ranking systems are much needed. We consider game models in which a k-team game is treated as several two-team games. By approximating the expectation of teams’ (or players’) performances, we derive simple analytic update rules. These update rules, without numerical integrations, are very easy to interpret and implement. Experiments on game data show that the accuracy of our approach is competitive with state of the art systems such as TrueSkill, but the running time as well as the code is much shorter. Keywords: Bayesian inference, rating system, Bradley-Terry model, Thurstone-Mosteller model, Plackett-Luce model
Author: Jim C. Huang, Brendan J. Frey
Abstract: We present a class of graphical models for directly representing the joint cumulative distribution function (CDF) of many random variables, called cumulative distribution networks (CDNs). Unlike graphs for probability density and mass functions, for CDFs the marginal probabilities for any subset of variables are obtained by computing limits of functions in the model, and conditional probabilities correspond to computing mixed derivatives. We will show that the conditional independence properties in a CDN are distinct from the conditional independence properties of directed, undirected and factor graphs, but include the conditional independence properties of bi-directed graphs. In order to perform inference in such models, we describe the ‘derivative-sum-product’ (DSP) message-passing algorithm in which messages correspond to derivatives of the joint CDF. We will then apply CDNs to the problem of learning to rank players in multiplayer team-based games and suggest several future directions for research. Keywords: graphical models, cumulative distribution function, message-passing algorithm, inference
3 0.047997911 11 jmlr-2011-Approximate Marginals in Latent Gaussian Models
Author: Botond Cseke, Tom Heskes
Abstract: We consider the problem of improving the Gaussian approximate posterior marginals computed by expectation propagation and the Laplace method in latent Gaussian models and propose methods that are similar in spirit to the Laplace approximation of Tierney and Kadane (1986). We show that in the case of sparse Gaussian models, the computational complexity of expectation propagation can be made comparable to that of the Laplace method by using a parallel updating scheme. In some cases, expectation propagation gives excellent estimates where the Laplace approximation fails. Inspired by bounds on the correct marginals, we arrive at factorized approximations, which can be applied on top of both expectation propagation and the Laplace method. The factorized approximations can give nearly indistinguishable results from the non-factorized approximations and their computational complexity scales linearly with the number of variables. We experienced that the expectation propagation based marginal approximations we introduce are typically more accurate than the methods of similar complexity proposed by Rue et al. (2009). Keywords: approximate marginals, Gaussian Markov random fields, Laplace approximation, variational inference, expectation propagation
4 0.036317065 73 jmlr-2011-Online Learning in Case of Unbounded Losses Using Follow the Perturbed Leader Algorithm
Author: Vladimir V. V'yugin
Abstract: In this paper the sequential prediction problem with expert advice is considered for the case where losses of experts suffered at each step cannot be bounded in advance. We present some modification of Kalai and Vempala algorithm of following the perturbed leader where weights depend on past losses of the experts. New notions of a volume and a scaled fluctuation of a game are introduced. We present a probabilistic algorithm protected from unrestrictedly large one-step losses. This algorithm has the optimal performance in the case when the scaled fluctuations of one-step losses of experts of the pool tend to zero. Keywords: prediction with expert advice, follow the perturbed leader, unbounded losses, adaptive learning rate, expected bounds, Hannan consistency, online sequential prediction
5 0.034030113 30 jmlr-2011-Efficient Structure Learning of Bayesian Networks using Constraints
Author: Cassio P. de Campos, Qiang Ji
Abstract: This paper addresses the problem of learning Bayesian network structures from data based on score functions that are decomposable. It describes properties that strongly reduce the time and memory costs of many known methods without losing global optimality guarantees. These properties are derived for different score criteria such as Minimum Description Length (or Bayesian Information Criterion), Akaike Information Criterion and Bayesian Dirichlet Criterion. Then a branch-andbound algorithm is presented that integrates structural constraints with data in a way to guarantee global optimality. As an example, structural constraints are used to map the problem of structure learning in Dynamic Bayesian networks into a corresponding augmented Bayesian network. Finally, we show empirically the benefits of using the properties with state-of-the-art methods and with the new algorithm, which is able to handle larger data sets than before. Keywords: Bayesian networks, structure learning, properties of decomposable scores, structural constraints, branch-and-bound technique
6 0.030837383 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
7 0.02844801 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood
8 0.025552606 56 jmlr-2011-Learning Transformation Models for Ranking and Survival Analysis
9 0.025178201 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models
10 0.02239274 101 jmlr-2011-Variable Sparsity Kernel Learning
11 0.021777062 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
12 0.021703467 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
13 0.021689061 25 jmlr-2011-Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood
14 0.021656074 71 jmlr-2011-On Equivalence Relationships Between Classification and Ranking Algorithms
15 0.021476736 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms
16 0.020803561 57 jmlr-2011-Learning a Robust Relevance Model for Search Using Kernel Methods
17 0.019887242 45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms
18 0.01924221 46 jmlr-2011-Introduction to the Special Topic on Grammar Induction, Representation of Language and Language Learning
19 0.016106466 28 jmlr-2011-Double Updating Online Learning
20 0.015950384 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
topicId topicWeight
[(0, 0.099), (1, -0.02), (2, -0.071), (3, -0.002), (4, 0.041), (5, -0.007), (6, 0.001), (7, 0.051), (8, 0.058), (9, 0.044), (10, -0.022), (11, -0.075), (12, 0.129), (13, 0.024), (14, 0.112), (15, 0.219), (16, -0.068), (17, 0.221), (18, 0.361), (19, -0.213), (20, 0.167), (21, 0.095), (22, 0.115), (23, 0.099), (24, -0.148), (25, -0.23), (26, 0.012), (27, -0.103), (28, 0.042), (29, 0.037), (30, 0.086), (31, -0.045), (32, 0.038), (33, -0.02), (34, 0.047), (35, 0.067), (36, -0.091), (37, 0.119), (38, -0.021), (39, -0.001), (40, -0.108), (41, 0.023), (42, -0.019), (43, 0.016), (44, 0.033), (45, 0.086), (46, 0.101), (47, 0.03), (48, 0.054), (49, -0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.96140194 2 jmlr-2011-A Bayesian Approximation Method for Online Ranking
Author: Ruby C. Weng, Chih-Jen Lin
Abstract: This paper describes a Bayesian approximation method to obtain online ranking algorithms for games with multiple teams and multiple players. Recently for Internet games large online ranking systems are much needed. We consider game models in which a k-team game is treated as several two-team games. By approximating the expectation of teams’ (or players’) performances, we derive simple analytic update rules. These update rules, without numerical integrations, are very easy to interpret and implement. Experiments on game data show that the accuracy of our approach is competitive with state of the art systems such as TrueSkill, but the running time as well as the code is much shorter. Keywords: Bayesian inference, rating system, Bradley-Terry model, Thurstone-Mosteller model, Plackett-Luce model
Author: Jim C. Huang, Brendan J. Frey
Abstract: We present a class of graphical models for directly representing the joint cumulative distribution function (CDF) of many random variables, called cumulative distribution networks (CDNs). Unlike graphs for probability density and mass functions, for CDFs the marginal probabilities for any subset of variables are obtained by computing limits of functions in the model, and conditional probabilities correspond to computing mixed derivatives. We will show that the conditional independence properties in a CDN are distinct from the conditional independence properties of directed, undirected and factor graphs, but include the conditional independence properties of bi-directed graphs. In order to perform inference in such models, we describe the ‘derivative-sum-product’ (DSP) message-passing algorithm in which messages correspond to derivatives of the joint CDF. We will then apply CDNs to the problem of learning to rank players in multiplayer team-based games and suggest several future directions for research. Keywords: graphical models, cumulative distribution function, message-passing algorithm, inference
3 0.23397605 73 jmlr-2011-Online Learning in Case of Unbounded Losses Using Follow the Perturbed Leader Algorithm
Author: Vladimir V. V'yugin
Abstract: In this paper the sequential prediction problem with expert advice is considered for the case where losses of experts suffered at each step cannot be bounded in advance. We present some modification of Kalai and Vempala algorithm of following the perturbed leader where weights depend on past losses of the experts. New notions of a volume and a scaled fluctuation of a game are introduced. We present a probabilistic algorithm protected from unrestrictedly large one-step losses. This algorithm has the optimal performance in the case when the scaled fluctuations of one-step losses of experts of the pool tend to zero. Keywords: prediction with expert advice, follow the perturbed leader, unbounded losses, adaptive learning rate, expected bounds, Hannan consistency, online sequential prediction
4 0.16835706 11 jmlr-2011-Approximate Marginals in Latent Gaussian Models
Author: Botond Cseke, Tom Heskes
Abstract: We consider the problem of improving the Gaussian approximate posterior marginals computed by expectation propagation and the Laplace method in latent Gaussian models and propose methods that are similar in spirit to the Laplace approximation of Tierney and Kadane (1986). We show that in the case of sparse Gaussian models, the computational complexity of expectation propagation can be made comparable to that of the Laplace method by using a parallel updating scheme. In some cases, expectation propagation gives excellent estimates where the Laplace approximation fails. Inspired by bounds on the correct marginals, we arrive at factorized approximations, which can be applied on top of both expectation propagation and the Laplace method. The factorized approximations can give nearly indistinguishable results from the non-factorized approximations and their computational complexity scales linearly with the number of variables. We experienced that the expectation propagation based marginal approximations we introduce are typically more accurate than the methods of similar complexity proposed by Rue et al. (2009). Keywords: approximate marginals, Gaussian Markov random fields, Laplace approximation, variational inference, expectation propagation
5 0.16017988 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
Author: Krishnakumar Balasubramanian, Pinar Donmez, Guy Lebanon
Abstract: Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing a margin-based risk function. Traditionally, these risk functions are computed based on a labeled data set. We develop a novel technique for estimating such risks using only unlabeled data and the marginal label distribution. We prove that the proposed risk estimator is consistent on high-dimensional data sets and demonstrate it on synthetic and real-world data. In particular, we show how the estimate is used for evaluating classifiers in transfer learning, and for training classifiers with no labeled data whatsoever. Keywords: classification, large margin, maximum likelihood
6 0.14469238 93 jmlr-2011-The arules R-Package Ecosystem: Analyzing Interesting Patterns from Large Transaction Data Sets
7 0.14291362 56 jmlr-2011-Learning Transformation Models for Ranking and Survival Analysis
8 0.12862939 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models
9 0.12737373 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood
10 0.12397212 28 jmlr-2011-Double Updating Online Learning
11 0.12125953 25 jmlr-2011-Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood
12 0.11349671 38 jmlr-2011-Hierarchical Knowledge Gradient for Sequential Sampling
13 0.10776185 71 jmlr-2011-On Equivalence Relationships Between Classification and Ranking Algorithms
14 0.10722657 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
15 0.10591274 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
16 0.10142631 61 jmlr-2011-Logistic Stick-Breaking Process
17 0.096183382 30 jmlr-2011-Efficient Structure Learning of Bayesian Networks using Constraints
18 0.092232205 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
19 0.088087641 49 jmlr-2011-Kernel Regression in the Presence of Correlated Errors
20 0.086497329 46 jmlr-2011-Introduction to the Special Topic on Grammar Induction, Representation of Language and Language Learning
topicId topicWeight
[(0, 0.49), (4, 0.029), (9, 0.014), (10, 0.028), (24, 0.042), (31, 0.075), (32, 0.021), (41, 0.032), (71, 0.013), (73, 0.045), (78, 0.048), (88, 0.014), (90, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.74321336 2 jmlr-2011-A Bayesian Approximation Method for Online Ranking
Author: Ruby C. Weng, Chih-Jen Lin
Abstract: This paper describes a Bayesian approximation method to obtain online ranking algorithms for games with multiple teams and multiple players. Recently for Internet games large online ranking systems are much needed. We consider game models in which a k-team game is treated as several two-team games. By approximating the expectation of teams’ (or players’) performances, we derive simple analytic update rules. These update rules, without numerical integrations, are very easy to interpret and implement. Experiments on game data show that the accuracy of our approach is competitive with state of the art systems such as TrueSkill, but the running time as well as the code is much shorter. Keywords: Bayesian inference, rating system, Bradley-Terry model, Thurstone-Mosteller model, Plackett-Luce model
Author: Jim C. Huang, Brendan J. Frey
Abstract: We present a class of graphical models for directly representing the joint cumulative distribution function (CDF) of many random variables, called cumulative distribution networks (CDNs). Unlike graphs for probability density and mass functions, for CDFs the marginal probabilities for any subset of variables are obtained by computing limits of functions in the model, and conditional probabilities correspond to computing mixed derivatives. We will show that the conditional independence properties in a CDN are distinct from the conditional independence properties of directed, undirected and factor graphs, but include the conditional independence properties of bi-directed graphs. In order to perform inference in such models, we describe the ‘derivative-sum-product’ (DSP) message-passing algorithm in which messages correspond to derivatives of the joint CDF. We will then apply CDNs to the problem of learning to rank players in multiplayer team-based games and suggest several future directions for research. Keywords: graphical models, cumulative distribution function, message-passing algorithm, inference
3 0.22194646 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
Author: Ricardo Henao, Ole Winther
Abstract: In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully Bayesian hierarchy for sparse models using slab and spike priors (two-component δ-function and continuous mixtures), non-Gaussian latent factors and a stochastic search over the ordering of the variables. The framework, which we call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and bench-marked on artificial and real biological data sets. SLIM is closest in spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in inference, Bayesian network structure learning and model comparison. Experimentally, SLIM performs equally well or better than LiNGAM with comparable computational complexity. We attribute this mainly to the stochastic search strategy used, and to parsimony (sparsity and identifiability), which is an explicit part of the model. We propose two extensions to the basic i.i.d. linear framework: non-linear dependence on observed variables, called SNIM (Sparse Non-linear Identifiable Multivariate modeling) and allowing for correlations between latent variables, called CSLIM (Correlated SLIM), for the temporal and/or spatial data. The source code and scripts are available from http://cogsys.imm.dtu.dk/slim/. Keywords: parsimony, sparsity, identifiability, factor models, linear Bayesian networks
4 0.22186358 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms
Author: Jinfeng Zhuang, Ivor W. Tsang, Steven C.H. Hoi
Abstract: Previous studies of Non-Parametric Kernel Learning (NPKL) usually formulate the learning task as a Semi-Definite Programming (SDP) problem that is often solved by some general purpose SDP solvers. However, for N data examples, the time complexity of NPKL using a standard interiorpoint SDP solver could be as high as O(N 6.5 ), which prohibits NPKL methods applicable to real applications, even for data sets of moderate size. In this paper, we present a family of efficient NPKL algorithms, termed “SimpleNPKL”, which can learn non-parametric kernels from a large set of pairwise constraints efficiently. In particular, we propose two efficient SimpleNPKL algorithms. One is SimpleNPKL algorithm with linear loss, which enjoys a closed-form solution that can be efficiently computed by the Lanczos sparse eigen decomposition technique. Another one is SimpleNPKL algorithm with other loss functions (including square hinge loss, hinge loss, square loss) that can be re-formulated as a saddle-point optimization problem, which can be further resolved by a fast iterative algorithm. In contrast to the previous NPKL approaches, our empirical results show that the proposed new technique, maintaining the same accuracy, is significantly more efficient and scalable. Finally, we also demonstrate that the proposed new technique is also applicable to speed up many kernel learning tasks, including colored maximum variance unfolding, minimum volume embedding, and structure preserving embedding. Keywords: non-parametric kernel learning, semi-definite programming, semi-supervised learning, side information, pairwise constraints
5 0.22014447 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
Author: Jennifer Gillenwater, Kuzman Ganchev, João Graça, Fernando Pereira, Ben Taskar
Abstract: A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed attachment accuracy over the standard expectation maximization (EM) baseline, with an average accuracy improvement of 6.5%, outperforming EM by at least 1% for 9 out of 12 languages. Furthermore, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors with an average improvement of 5% and positive gains of at least 1% for 9 out of 12 languages. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.
6 0.21965903 17 jmlr-2011-Computationally Efficient Convolved Multiple Output Gaussian Processes
7 0.21951874 12 jmlr-2011-Bayesian Co-Training
8 0.21805404 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets
9 0.21671766 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series
10 0.21575619 48 jmlr-2011-Kernel Analysis of Deep Networks
11 0.21517178 16 jmlr-2011-Clustering Algorithms for Chains
12 0.21415968 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
13 0.21353282 91 jmlr-2011-The Sample Complexity of Dictionary Learning
14 0.21317865 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
15 0.21235114 53 jmlr-2011-Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates
16 0.21176788 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
17 0.21139725 66 jmlr-2011-Multiple Kernel Learning Algorithms
18 0.21110214 9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models
19 0.21059407 29 jmlr-2011-Efficient Learning with Partially Observed Attributes
20 0.2086553 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation