jmlr jmlr2008 jmlr2008-80 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tzu-Kuo Huang, Chih-Jen Lin, Ruby C. Weng
Abstract: This paper proposes new approaches to rank individuals from their group comparison results. Many real-world problems are of this type. For example, ranking players from team comparisons is important in some sports. In machine learning, a closely related application is classification using coding matrices. Group comparison results are usually in two types: binary indicator outcomes (wins/losses) or measured outcomes (scores). For each type of results, we propose new models for estimating individuals’ abilities, and hence a ranking of individuals. The estimation is carried out by solving convex minimization problems, for which we develop easy and efficient solution procedures. Experiments on real bridge records and multi-class classification demonstrate the viability of the proposed models. Keywords: ranking, group comparison, binary/scored outcomes, Bradley-Terry model, multiclass classification
Reference: text
sentIndex sentText sentNum sentScore
1 Department of Statistics, National Chengchi University, Taipei 116, Taiwan. Editor: Greg Ridgeway. Abstract: This paper proposes new approaches to rank individuals from their group comparison results. [sent-10, score-0.263]
2 For example, ranking players from team comparisons is important in some sports. [sent-12, score-0.296]
3 Group comparison results are usually in two types: binary indicator outcomes (wins/losses) or measured outcomes (scores). [sent-14, score-0.222]
4 Introduction We address an interesting problem of estimating individuals’ abilities from their group comparison results. [sent-19, score-0.208]
5 In a bridge match two partnerships form a team to compete with another two. [sent-23, score-0.469]
6 The match record fairly reflects which two partnerships are better, but a partnership’s raw score, depending on different boards, does not indicate its ability. [sent-24, score-0.313]
7 Finding reasonable individual rankings using all group comparison records is thus a challenging task. [sent-25, score-0.226]
8 This line of research stems from the study of paired comparisons (David, 1988), in which one group/team consists of only one individual, and individuals’ abilities are estimated from paired comparison results. [sent-30, score-0.313]
9 Bradley-Terry model (Bradley and Terry, 1952): suppose there are k individuals whose abilities are indicated by a non-negative vector $p = [p_1\; p_2\; \cdots\; p_k]^T$, and the model assumes $P(i \text{ beats } j) = \frac{p_i}{p_i + p_j}$. (1) [sent-33, score-0.297]
10 If comparisons are independent, then the maximum likelihood estimate of p is obtained by solving $\min_p\; -\sum_{i \ne j} n_{ij} \log \frac{p_i}{p_i + p_j}$ (2) subject to $\sum_{j=1}^{k} p_j = 1$, $p_j \ge 0$, $j = 1, \dots, k$, [sent-38, score-0.207]
11 where $n_{ij}$ is the number of times individual i beats individual j. [sent-41, score-0.160]
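To make the estimation in (2) concrete, the sketch below implements the classic fixed-point (minorization) iteration for the Bradley-Terry likelihood. This is standard Bradley-Terry machinery rather than code from the paper; the function name, iteration cap, and tolerance are our own choices, and it assumes every individual has at least one win so the update stays positive.

```python
import numpy as np

def bradley_terry_mle(n, iters=1000, tol=1e-10):
    """Solve (2) by the classic MM iteration.

    n[i, j] = number of times individual i beats individual j.
    Returns abilities p on the simplex (sum(p) = 1, p >= 0).
    """
    k = n.shape[0]
    p = np.full(k, 1.0 / k)
    wins = n.sum(axis=1)          # total wins of each individual
    games = n + n.T               # n_ij + n_ji: comparisons of each pair
    for _ in range(iters):
        # MM update: p_i <- wins_i / sum_j (n_ij + n_ji) / (p_i + p_j)
        denom = (games / (p[:, None] + p[None, :])).sum(axis=1)
        p_new = wins / denom
        p_new /= p_new.sum()      # renormalize onto the simplex
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p
```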
12 Going from paired to group comparisons, we consider k individuals $\{1, \dots, k\}$. [sent-45, score-0.249]
13 Before seeking sophisticated models, an intuitive way to estimate the s-th individual's ability is the number of its winning comparisons normalized by the total number of comparisons it is involved in: $\left(\sum_{i:s\in I_i^+} n_i^+ + \sum_{i:s\in I_i^-} n_i^-\right) \Big/ \sum_{i:s\in I_i} n_i$. (3) [sent-51, score-0.138]
14 In the case of paired comparisons, several authors (David, 1988; Hastie and Tibshirani, 1998) have shown that if $n_{si} > 0$, $n_{is} > 0$, and $n_{si} + n_{is} = \text{constant}\; \forall s, i$, (4) then the ranking by (3) is identical to that by the solution of (2). [sent-52, score-0.238]
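The estimate (3) needs nothing more than win counting. A minimal sketch, assuming the comparisons are stored as (I_plus, I_minus, n_plus, n_minus) tuples (a container of our own choosing, not the paper's notation):

```python
import numpy as np

def normalized_wins(comparisons, k):
    """Estimate (3): an individual's wins over all comparisons it joined,
    divided by the total number of comparisons it was involved in."""
    won = np.zeros(k)
    total = np.zeros(k)
    for I_plus, I_minus, n_plus, n_minus in comparisons:
        for s in I_plus:
            won[s] += n_plus              # s shares its team's wins
            total[s] += n_plus + n_minus
        for s in I_minus:
            won[s] += n_minus
            total[s] += n_plus + n_minus
    return won / total                    # assumes every individual appears at least once
```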
15 Under the assumption that comparisons are independent, individuals' abilities can be estimated by minimizing the negative log-likelihood of (5): $\min_p\; -\sum_{i=1}^{m}\left[ n_i^+ \log \frac{\sum_{j\in I_i^+} p_j}{\sum_{j\in I_i} p_j} + n_i^- \log \frac{\sum_{j\in I_i^-} p_j}{\sum_{j\in I_i} p_j} \right]$ (6) subject to $\sum_{j=1}^{k} p_j = 1$, $p_j \ge 0$, $j = 1, \dots, k$. [sent-60, score-0.293]
16 Some works use these "measured" outcomes for paired comparisons; an example is Glickman (1993): instead of modeling the probability that one individual beats another, he considers the difference in two individuals' abilities as a random variable, whose realization is the difference in the two scores. [sent-74, score-0.359]
17 Individuals’ abilities are then estimated via maximizing the likelihood. [sent-75, score-0.149]
18 In this paper we focus on the batch setting, under which individuals’ abilities are not estimated until all comparisons are finished. [sent-76, score-0.229]
19 This setting is suitable for annual sports events, such as the Bermuda Bowl for bridge considered in Section 4, where the goal is to rank participants according to their performances in the event. [sent-77, score-0.139]
20 An example is online gaming, where players form teams to compete against one another whenever they wish and expect a real-time update of their ranking right after a game is over. [sent-79, score-0.203]
21 Individuals’ abilities are estimated by training the ANN with the delta rule, a typical online or incremental learning technique. [sent-84, score-0.149]
22 An advantage over Huang et al. (2006b) is that one can estimate individuals' abilities by minimizing unconstrained convex formulations. [sent-88, score-0.168]
23 This section may be the first study on finding individuals’ abilities from scored group comparisons. [sent-92, score-0.249]
24 Section 4 ranks partnerships in real bridge matches with the proposed approaches. [sent-93, score-0.394]
25 Comparisons with Binary Outcomes We denote individuals' abilities as a vector $v \in \mathbb{R}^k$, $-\infty < v_s < \infty$, $s = 1, \dots, k$. [sent-100, score-0.359]
26 A team's ability is then defined as the sum of its members': for $I_i^+$ and $I_i^-$, their abilities are respectively $T_i^+ \equiv \sum_{s:s\in I_i^+} v_s$ and $T_i^- \equiv \sum_{s:s\in I_i^-} v_s$. (7) [sent-105, score-0.588]
27 (Ext-B.RLS) Recall that $n_i^+$ and $n_i^-$ are respectively the numbers of comparisons that teams $I_i^+$ and $I_i^-$ win. [sent-130, score-0.174]
28 The comparison settings are collected in the matrix G of (12), whose (i, j) entry is 1 if $j \in I_i^+$, $-1$ if $j \in I_i^-$, and 0 otherwise; take bridge in teams of four as an example. [sent-136, score-0.186]
29 An individual stands for a partnership, so G’s jth column records the jth partnership’s team memberships in all m matches. [sent-137, score-0.173]
30 Since a match is played by four partnerships from two teams, each row of G has two 1’s, two −1’s and k−4 0’s. [sent-138, score-0.328]
31 read as “The first match: the 1st, 2nd partnerships versus the 3rd, 4th; the second match: the 1st, 2nd versus the 5th, 6th; . [sent-163, score-0.224]
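Building G from a list of match lineups is mechanical; the sketch below uses zero-based indices in place of the paper's one-based ones, and the example match list is hypothetical:

```python
import numpy as np

def comparison_setting_matrix(matches, k):
    """Build G as in (12): row i has 1's at members of I_i^+,
    -1's at members of I_i^-, and 0's elsewhere."""
    G = np.zeros((len(matches), k))
    for i, (I_plus, I_minus) in enumerate(matches):
        G[i, list(I_plus)] = 1.0
        G[i, list(I_minus)] = -1.0
    return G

# The first two rows of (13) in zero-based form:
G = comparison_setting_matrix([((0, 1), (2, 3)), ((0, 1), (4, 5))], k=8)
```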
32 This result shows that teams’ members should change frequently across comparisons (as indicated by rank(G) = k) so that individuals’ abilities are uniquely determined. [sent-171, score-0.256]
33 To see how multiple rankings occur, consider an extreme case where several players always belong to the same team. [sent-172, score-0.182]
34 After solving (14), their respective abilities can take any values but still remain optimal as long as the total ability is equal to the virtual player’s. [sent-174, score-0.168]
35 This leads to the regularized problem (16); the rationale of the regularization is that individuals are assumed to have equal abilities before any comparisons are made. [sent-177, score-0.313]
36 (Ext-B.ML) Under the assumption that comparisons are independent, the negative log-likelihood function is $l(v) \equiv -\sum_{i=1}^{m}\left[ n_i^+ \log \frac{e^{T_i^+}}{e^{T_i^+} + e^{T_i^-}} + n_i^- \log \frac{e^{T_i^-}}{e^{T_i^+} + e^{T_i^-}} \right]$, (17) and we may estimate v by $\arg\min_v l(v)$. [sent-182, score-0.144]
37 Here we consider a special one: $\mu \sum_{s=1}^{k} \left(e^{v_s} + e^{-v_s}\right)$, (18) which is strictly convex and has its unique minimum at $v_s = 0$, $s = 1, \dots, k$. [sent-189, score-0.229]
38 Since µ is small, $\sum_{i:s\in I_i^+} n_i^+ + \sum_{i:s\in I_i^-} n_i^- \approx \sum_{i:s\in I_i^+} \frac{n_i e^{T_i^+}}{e^{T_i^+} + e^{T_i^-}} + \sum_{i:s\in I_i^-} \frac{n_i e^{T_i^-}}{e^{T_i^+} + e^{T_i^-}}$, (21) which is a reasonable condition: the total number of observed wins of individual s is nearly the expected number under the assumed model. [sent-200, score-0.229]
39 Meanwhile, the last term in $\partial l(v)/\partial v_s$ keeps $v_s$ from taking extreme values, thereby bringing some robustness against huge $n_i^+$ or $n_i^-$. [sent-201, score-0.210]
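To show how (17) and (18) fit together, here is a plain gradient-descent sketch of the regularized problem. The paper develops its own solution procedure, so this solver, its step size, and its iteration count are assumptions made purely for illustration:

```python
import numpy as np

def ext_b_ml(G, n_plus, n_minus, mu=1e-3, lr=0.05, iters=5000):
    """Minimize l(v) of (17) plus the regularizer (18) by gradient descent.
    Uses T_i^+ - T_i^- = (Gv)_i, so P(I_i^+ beats I_i^-) is a sigmoid."""
    v = np.zeros(G.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(G @ v)))      # e^{T+}/(e^{T+} + e^{T-})
        # dl/dv: expected wins of I^+ minus observed wins, pushed through G
        grad = G.T @ ((n_plus + n_minus) * p - n_plus)
        grad += mu * (np.exp(v) - np.exp(-v))   # gradient of (18)
        v -= lr * grad
    return v
```

The regularizer's gradient term $\mu(e^{v_s} - e^{-v_s})$ is exactly the "last term" discussed above: it grows quickly when $v_s$ drifts toward extreme values.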
40 Comparisons with Scored Outcomes This section proposes estimating individuals’ abilities based on measured outcomes, such as points in basketball or soccer games. [sent-253, score-0.169]
41 We still use random variables Yi+ and Yi− for team performances, but give n+ and n− different meanings: they now denote scores of Ii+ and Ii− . [sent-254, score-0.142]
42 Individuals’ abilities are estimated by maximizing the likelihood of score differences. [sent-259, score-0.149]
43 (the ML approach) As mentioned in Section 2, using normal distributions for comparisons with binary outcomes is computationally more difficult due to the complicated form of $P(I_i^+ \text{ beats } I_i^-)$. [sent-263, score-0.245]
44 We assume $Y_s \sim N(v_s, \sigma^2)$, $s = 1, \dots, k$, and view the score difference of individuals i and j as a realization of $Y_i - Y_j$. [sent-268, score-0.148]
45 By assuming $Y_i$ and $Y_j$ are independent for all individuals, $Y_i - Y_j \sim N(v_i - v_j, 2\sigma^2)$, (25) and individuals' abilities are estimated by maximizing the likelihood. [sent-269, score-0.149]
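One concrete way to turn the normal model into an estimator is ridge regression on score differences, since $n_i^+ - n_i^-$ is then a noisy observation of $T_i^+ - T_i^- = (Gv)_i$. This yields the closed form $v_s = \sum_i A_{si}(n_i^+ - n_i^-)$ with $A = (G^\top G + \mu I)^{-1} G^\top$, matching the expression quoted in Section 5; reading the estimator this way is our assumption, and the paper's exact weighting may differ.

```python
import numpy as np

def scored_rls(G, n_plus, n_minus, mu=1e-3):
    """Ridge fit of abilities to score differences:
    v = (G'G + mu*I)^{-1} G' b with b = n^+ - n^-."""
    k = G.shape[1]
    b = n_plus - n_minus                    # observed score differences
    return np.linalg.solve(G.T @ G + mu * np.eye(k), G.T @ b)
```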
46 Figure 1: A typical bridge match setting. [sent-307, score-0.138]
47 Ranking Partnerships from Real Bridge Records This section presents a real application: ranking partnerships from match records of Bermuda Bowl 2005, which is the most prestigious bridge event. [sent-322, score-0.470]
48 In a match two partnerships (four players) from a team compete with two from another team. [sent-323, score-0.393]
49 The rules require mutual understanding within a partnership, so partnerships are typically fixed while a team can send different partnerships for different matches. [sent-324, score-0.555]
50 To rank partnerships using our model, an individual stands for a partnership, and every Ti+ (or Ti− ) consists of two individuals. [sent-325, score-0.278]
51 Earlier we refer to each Ti+ as a team and in bridge the two partnerships (or four players) of Ti+ are really called a team. [sent-327, score-0.423]
52 For example, in the second board IN’s NS partnership won at Table I and got 100 points while PT’s NS got 650 at Table II. [sent-348, score-0.15]
53 An important feature is that a board’s four hands are at identical positions of two tables, but a team’s two partnerships sit at complementary positions. [sent-352, score-0.257]
54 For example, Table 1 shows records of the first ten boards of the match between two Indian partnerships and two Portuguese partnerships. [sent-357, score-0.387]
55 A quick look at Table 1 may motivate the following straightforward approach: a partnership's score in a match is the sum of raw scores over all boards, and its ability is the average over the matches it plays. [sent-360, score-0.179]
56 Moreover, since boards are different across rounds and partnerships play in different rounds, the sum of raw scores can be more unfair. [sent-363, score-0.34]
57 We consider qualifying games: 22 teams from all over the world had a round robin tournament, which consisted of $\binom{22}{2} = 231$ matches, and each team played 21. [sent-366, score-0.283]
58 Most teams had six players in three fixed partnerships, and there were 69 partnerships in total. [sent-367, score-0.394]
59 To use our model, the comparison setting matrix G defined in (12) is of size 231 × 69; as shown in (13) each row records a match setting and has exactly two 1’s (two partnerships from one team), two −1’s (two partnerships from another team) and 65 0’s (the remaining partnerships). [sent-377, score-0.557]
60 On the one hand, they summarize the relative performances of players or teams based on outcomes in the event, so that people may easily distinguish outstanding ones from poor ones. [sent-384, score-0.273]
61 On the other hand, rankings in past events may indicate the outcomes of future events, and can therefore become a basis for designing future event schedules. [sent-385, score-0.204]
62 For the first purpose, a good ranking must be consistent with available outcomes of the event, which relates to minimizing errors on training data, while for the second, a good ranking must predict well on the outcomes of future events, which is about minimizing errors on unseen data. [sent-387, score-0.328]
63 We thus adopt these two principles to evaluate the proposed approaches, and in the context of bridge matches, they translate into the following evaluation criteria: • Empirical Performance: How well do the estimated abilities and rankings fit the available match records? [sent-388, score-0.388]
64 • Generalization Performance: How well do the estimated abilities and rankings predict the outcomes of unseen matches? [sent-389, score-0.353]
65 Here we distinguish individuals’ abilities from their ranking: Abilities give a ranking, but not vice versa. [sent-390, score-0.149]
66 When we only have a ranking of individuals, groups’ strengths are not directly available since the relation of individuals’ ranks to those of groups is unclear. [sent-391, score-0.142]
67 In contrast, if individuals’ abilities are available, a group’s ability can be the sum of its members’. [sent-392, score-0.168]
68 We thus propose different error measures for abilities and rankings. [sent-393, score-0.149]
69 Let $\{(I_1^+, I_1^-, n_1^+, n_1^-), \dots, (I_m^+, I_m^-, n_m^+, n_m^-)\}$ be the group comparisons of interest and their outcomes. [sent-397, score-0.139]
70 For a vector $v \in \mathbb{R}^k$ of individuals' abilities, we define the • Group Comparison Error: $\mathrm{GCE}(v) \equiv \frac{1}{m}\sum_{i=1}^{m} I\{(n_i^+ - n_i^-)(T_i^+ - T_i^-) \le 0\}$, where $I\{\cdot\}$ is the indicator function; $T_i^+$ and $T_i^-$ are predicted group abilities of $I_i^+$ and $I_i^-$, as defined in (7). [sent-398, score-0.208]
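GCE is a one-liner once G and v are in hand; a minimal sketch, assuming the same G and v as above:

```python
import numpy as np

def gce(v, G, n_plus, n_minus):
    """Fraction of comparisons whose predicted ability difference (Gv)_i
    disagrees in sign with (or ties) the outcome difference n_i^+ - n_i^-."""
    return np.mean((n_plus - n_minus) * (G @ v) <= 0)
```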
71 We selected the regularization parameter µ from candidate values $[\dots, 2^{-8}, 2^{-9}]$ by Leave-One-Out (LOO) validation on the training set, estimated partnerships' abilities with the best µ, and then computed GCE and GRE on the testing set. [sent-420, score-0.165]
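The LOO selection of µ can be sketched as follows; the candidate grid and the estimator/error callbacks are placeholders for whichever of the methods above is being tuned:

```python
import numpy as np

def loo_select_mu(G, n_plus, n_minus, mus, fit, err):
    """Pick mu minimizing leave-one-out error: drop one comparison,
    fit on the rest, score the held-out comparison, and average."""
    m = G.shape[0]
    best_mu, best_err = None, np.inf
    for mu in mus:
        total = 0.0
        for i in range(m):
            keep = np.arange(m) != i
            v = fit(G[keep], n_plus[keep], n_minus[keep], mu)
            total += err(v, G[i:i + 1], n_plus[i:i + 1], n_minus[i:i + 1])
        if total / m < best_err:
            best_mu, best_err = mu, total / m
    return best_mu

# e.g. loo_select_mu(G, n_plus, n_minus, [2.0**-e for e in range(10)], scored_rls, gce)
```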
72 These results show that the proposed approaches are effective in fitting the available bridge match records. [sent-428, score-0.159]
73 We then use the same summation assumption to obtain groups’ abilities for computing GCEs. [sent-432, score-0.149]
74 Moreover, we are using records in the qualifying stage, a round-robin tournament in which every two teams (countries) played exactly one match. [sent-511, score-0.187]
75 Consequently, when a match is removed from the training set, the four competing partnerships of that match have no chance to meet directly during the training stage. [sent-512, score-0.364]
76 Indirect comparisons may only be marginally useful in predicting those partnerships’ competition outcomes due to the lack of transitivity. [sent-513, score-0.183]
77 In conclusion, the outcome of a match in this bridge data set may not be well indicated by outcomes of the other matches, and therefore all of the approaches failed to generalize well. [sent-514, score-0.262]
78 Since GRE only looks at the subset of matches in which group members’ ranks clearly decide groups’ relative strengths, the size of this subset, that is, the denominator in GRE, may also be a performance indicator of each approach. [sent-516, score-0.153]
79 It is clear that the ranking by AVG is able to determine the outcomes of more matches, but at the same time causes more errors. [sent-518, score-0.164]
80 Finally, we list the top ten partnerships ranked by Ext-B. [sent-535, score-0.246]
81 Properties of Different Approaches Although we distinguish binary comparisons from scored ones, they are similar in some situations. [sent-539, score-0.137]
82 Table 3 lists partnership rankings obtained by applying the six approaches to the entire set of match records. [sent-543, score-0.341]
83 We computed Kendall’s tau for every pair of the six rankings and present them in Table 4, which indicates roughly three groups: Ext-B. [sent-545, score-0.152]
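Pairwise Kendall's tau between rank vectors, as in Table 4, can be computed directly with SciPy; the dictionary-of-rankings container below is hypothetical:

```python
from scipy.stats import kendalltau

def pairwise_tau(rankings):
    """Kendall's tau for every pair of rankings.
    rankings: dict mapping an approach name to its rank vector,
    all over the same, identically ordered set of partnerships."""
    names = sorted(rankings)
    out = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            tau, _ = kendalltau(rankings[a], rankings[b])
            out[(a, b)] = tau
    return out
```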
84 We then measure the distance between two groups of rankings g1 and g2: for each partnership, $d(\text{ranks by } g_1, \text{ranks by } g_2) \equiv \begin{cases} \min(\text{ranks by } g_2) - \max(\text{ranks by } g_1) & \text{if ranks by } g_1 \text{ are all smaller,} \\ \min(\text{ranks by } g_1) - \max(\text{ranks by } g_2) & \text{if ranks by } g_2 \text{ are all smaller,} \\ 0 & \text{otherwise.} \end{cases}$ (39) [sent-551, score-0.298]
85 In Table 3 we respectively underline and boldface partnerships satisfying (38) and (39). [sent-571, score-0.224]
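The per-partnership distance d transcribes directly from its case definition; a small sketch:

```python
def rank_group_distance(ranks_g1, ranks_g2):
    """d(ranks by g1, ranks by g2): positive when one group's ranks all
    sit strictly above the other's, 0 when the two groups interleave."""
    if max(ranks_g1) < min(ranks_g2):      # ranks by g1 are all smaller
        return min(ranks_g2) - max(ranks_g1)
    if max(ranks_g2) < min(ranks_g1):      # ranks by g2 are all smaller
        return min(ranks_g1) - max(ranks_g2)
    return 0
```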
86 The eleven ranks satisfying (39) show that AVG's ranking is closer to the team ranking: partnerships satisfying (39) have higher ranks than those given by the other approaches when the team ranks are high, but lower ranks when the team ranks are low. [sent-572, score-0.614]
87 This observation indicates that AVG may fail to identify weak (strong) individuals from strong (weak) groups. [sent-573, score-0.148]
88 For example, Italy’s second partnership is ranked 18th, 14th, 12th, 18th, 4th and 4th by ExtB. [sent-591, score-0.151]
89 • For the two scored-outcome approaches, extreme outcomes have a greater impact on NMS. [sent-657, score-0.136]
90 Interestingly, five of these eight partnerships played matches where weak teams beat strong teams by an extreme amount, such as Netherlands beating U. [sent-686, score-0.507]
91 RLS is vulnerable to even a few extreme outcomes, which can be enough to change the overall ranking. [sent-694, score-0.136]
92 The solution of the RLS approach is $v_s = \sum_{i=1}^{m} A_{si}\left(\log n_i^+ - \log n_i^-\right) = \sum_{i=1}^{m} A_{si}\left(\log n_i^+ - \log(n - n_i^+)\right)$. [sent-699, score-0.306]
93 To check the sensitivity of $v_s$ with respect to a change of $n_i^+$, we compute [sent-713, score-0.210]
94 $\frac{\partial v_s}{\partial n_i^+} = A_{si}\left(\frac{1}{n_i^+} + \frac{1}{n - n_i^+}\right) = \frac{n A_{si}}{n_i^+ (n - n_i^+)}$. Clearly, the estimate $v_s$ is more sensitive to extreme values of $n_i^+$, that is, $n_i^+ \approx 0$ or $n_i^+ \approx n$. [sent-714, score-0.528]
95 For the ML approach with scored outcomes we have $v_s = \sum_{i=1}^{m} A_{si}\left(n_i^+ - n_i^-\right) = \sum_{i=1}^{m} A_{si}\left(2n_i^+ - n\right)$ (using $n_i^+ + n_i^- = n$), and $\frac{\partial v_s}{\partial n_i^+} = 2A_{si}$. [sent-716, score-0.21]
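A quick numeric look at the two derivative formulas above, with a hypothetical n = 20 boards and A_si = 1. The absolute scales are not comparable across methods (A differs between them), so only the variation within each method is meaningful:

```python
import numpy as np

n = 20
n_plus = np.arange(1, n)                    # observed wins, 1 .. n-1
rls_sens = n / (n_plus * (n - n_plus))      # n*A_si / (n_i^+ (n - n_i^+)), A_si = 1
ml_sens = np.full_like(rls_sens, 2.0)       # 2*A_si, the same for every n_i^+

# rls_sens is largest near n_i^+ = 0 or n (and unbounded as counts approach 0),
# while ml_sens does not depend on n_i^+ at all.
print(rls_sens.max() / rls_sens.min())      # ~5.26 for n = 20
print(ml_sens.max() / ml_sens.min())        # 1.0
```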
96 ML has four, among which the partnerships satisfying (38) participate in three. [sent-755, score-0.224]
97 Our approaches thus consider classes as individuals and two-class problems as group comparisons; the 1's and −1's in the i-th row of G correspond to $I_i^+$ and $I_i^-$, respectively. [sent-810, score-0.207]
98 Conclusions We propose new and useful methods to rank individuals from group comparisons. [sent-929, score-0.242]
99 Experiments show that the proposed approaches give reasonable partnership rankings from bridge records and perform effectively in multi-class classification. [sent-932, score-0.374]
100 Therefore, $\lim_{t\in N,\, t\to\infty} v_s^{t+1} = \lim_{t\in N,\, t\to\infty} \left[ v_s^t + \log \frac{B_s + \sqrt{B_s^2 + 4\mu A_s^t e^{-v_s^t}}}{2 A_s^t} \right] = v_s^* + \log \frac{B_s + \sqrt{B_s^2 + 4\mu A_s^* e^{-v_s^*}}}{2 A_s^*} = v_s^*$, (49) and $\lim_{t\in N,\, t\to\infty} v_s^{t+1} = \lim_{t\in N,\, t\to\infty} v_s^t = v_s^*$. [sent-985, score-0.240]
wordName wordTfidf (topN-words)
[('eti', 0.705), ('ti', 0.24), ('partnerships', 0.224), ('vs', 0.21), ('ext', 0.19), ('abilities', 0.149), ('individuals', 0.148), ('partnership', 0.129), ('ii', 0.118), ('team', 0.107), ('outcomes', 0.103), ('uang', 0.102), ('rankings', 0.101), ('avg', 0.099), ('ndividuals', 0.095), ('ni', 0.095), ('teams', 0.094), ('evs', 0.088), ('vt', 0.088), ('roup', 0.081), ('anking', 0.081), ('comparisons', 0.08), ('bridge', 0.076), ('omparisons', 0.072), ('exploss', 0.068), ('gre', 0.068), ('eng', 0.066), ('match', 0.062), ('ranking', 0.061), ('huang', 0.06), ('gt', 0.059), ('group', 0.059), ('ranks', 0.058), ('boards', 0.054), ('players', 0.048), ('records', 0.047), ('gv', 0.046), ('beats', 0.046), ('bs', 0.042), ('paired', 0.042), ('yi', 0.041), ('scored', 0.041), ('cosh', 0.041), ('gce', 0.041), ('nm', 0.037), ('matches', 0.036), ('scores', 0.035), ('rank', 0.035), ('asi', 0.034), ('extreme', 0.033), ('log', 0.032), ('loo', 0.031), ('performances', 0.028), ('coding', 0.028), ('six', 0.028), ('gres', 0.027), ('imps', 0.027), ('vps', 0.027), ('raw', 0.027), ('members', 0.027), ('played', 0.026), ('ruby', 0.026), ('ns', 0.026), ('opponents', 0.026), ('dense', 0.025), ('tau', 0.023), ('allwein', 0.023), ('groups', 0.023), ('lim', 0.022), ('ranked', 0.022), ('approaches', 0.021), ('board', 0.021), ('della', 0.021), ('pietra', 0.021), ('india', 0.021), ('basketball', 0.02), ('bermuda', 0.02), ('bowl', 0.02), ('nsi', 0.02), ('portugal', 0.02), ('qualifying', 0.02), ('rival', 0.02), ('sth', 0.02), ('wins', 0.02), ('kendall', 0.019), ('taipei', 0.019), ('winning', 0.019), ('convex', 0.019), ('individual', 0.019), ('ability', 0.019), ('pt', 0.018), ('tw', 0.018), ('histograms', 0.018), ('sit', 0.017), ('bakiri', 0.017), ('ui', 0.016), ('four', 0.016), ('binary', 0.016), ('testing', 0.016), ('regularization', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 80 jmlr-2008-Ranking Individuals by Group Comparisons
Author: Tzu-Kuo Huang, Chih-Jen Lin, Ruby C. Weng
Abstract: This paper proposes new approaches to rank individuals from their group comparison results. Many real-world problems are of this type. For example, ranking players from team comparisons is important in some sports. In machine learning, a closely related application is classification using coding matrices. Group comparison results are usually in two types: binary indicator outcomes (wins/losses) or measured outcomes (scores). For each type of results, we propose new models for estimating individuals’ abilities, and hence a ranking of individuals. The estimation is carried out by solving convex minimization problems, for which we develop easy and efficient solution procedures. Experiments on real bridge records and multi-class classification demonstrate the viability of the proposed models. Keywords: ranking, group comparison, binary/scored outcomes, Bradley-Terry model, multiclass classification
2 0.071102753 69 jmlr-2008-Non-Parametric Modeling of Partially Ranked Data
Author: Guy Lebanon, Yi Mao
Abstract: Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational consideration. We explore the use of non-parametric models for partially ranked data and derive computationally efficient procedures for their use for large n. The derivations are largely possible through combinatorial and algebraic manipulations based on the lattice of partial rankings. A bias-variance analysis and an experimental study demonstrate the applicability of the proposed method. Keywords: ranked data, partially ordered sets, kernel smoothing
Author: Salvador García, Francisco Herrera
Abstract: In a recently published paper in JMLR, Demšar (2006) recommends a set of non-parametric statistical tests and procedures which can be safely used for comparing the performance of classifiers over multiple data sets. After studying the paper, we realize that the paper correctly introduces the basic procedures and some of the most advanced ones when comparing a control method. However, it does not deal with some advanced topics in depth. Regarding these topics, we focus on more powerful proposals of statistical procedures for comparing n × n classifiers. Moreover, we illustrate an easy way of obtaining adjusted and comparable p-values in multiple comparison procedures. Keywords: statistical methods, non-parametric test, multiple comparisons tests, adjusted p-values, logically related hypotheses
4 0.050205626 95 jmlr-2008-Value Function Based Reinforcement Learning in Changing Markovian Environments
Author: Balázs Csanád Csáji, László Monostori
Abstract: The paper investigates the possibility of applying value function based reinforcement learning (RL) methods in cases when the environment may change over time. First, theorems are presented which show that the optimal value function of a discounted Markov decision process (MDP) Lipschitz continuously depends on the immediate-cost function and the transition-probability function. Dependence on the discount factor is also analyzed and shown to be non-Lipschitz. Afterwards, the concept of (ε, δ)-MDPs is introduced, which is a generalization of MDPs and ε-MDPs. In this model the environment may change over time, more precisely, the transition function and the cost function may vary from time to time, but the changes must be bounded in the limit. Then, learning algorithms in changing environments are analyzed. A general relaxed convergence theorem for stochastic iterative algorithms is presented. We also demonstrate the results through three classical RL methods: asynchronous value iteration, Q-learning and temporal difference learning. Finally, some numerical experiments concerning changing environments are presented. Keywords: Markov decision processes, reinforcement learning, changing environments, (ε, δ)MDPs, value function bounds, stochastic iterative algorithms
5 0.047912236 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
Author: Matthias W. Seeger
Abstract: The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007). Keywords: sparse linear model, Laplace prior, expectation propagation, approximate inference, optimal design, Bayesian statistics, gene network recovery, image coding, compressive sensing
6 0.045433797 52 jmlr-2008-Learning from Multiple Sources
7 0.033818584 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs
8 0.029578194 91 jmlr-2008-Trust Region Newton Method for Logistic Regression
9 0.029140729 65 jmlr-2008-Multi-Agent Reinforcement Learning in Common Interest and Fixed Sum Stochastic Games: An Experimental Study
10 0.027895709 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure
11 0.027753057 58 jmlr-2008-Max-margin Classification of Data with Absent Features
12 0.026090087 56 jmlr-2008-Magic Moments for Structured Output Prediction
13 0.025721829 29 jmlr-2008-Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods
14 0.025536401 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces
15 0.025355175 23 jmlr-2008-Comments on the Complete Characterization of a Family of Solutions to a GeneralizedFisherCriterion
16 0.023244597 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
17 0.022379123 32 jmlr-2008-Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers
18 0.022140736 62 jmlr-2008-Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data
19 0.022021864 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
20 0.021872766 5 jmlr-2008-A New Algorithm for Estimating the Effective Dimension-Reduction Subspace
topicId topicWeight
[(0, 0.122), (1, -0.021), (2, -0.032), (3, 0.022), (4, -0.049), (5, -0.047), (6, -0.021), (7, -0.03), (8, 0.002), (9, 0.027), (10, 0.023), (11, -0.013), (12, -0.081), (13, 0.038), (14, 0.024), (15, -0.062), (16, 0.02), (17, 0.028), (18, 0.044), (19, 0.11), (20, -0.011), (21, -0.25), (22, -0.339), (23, -0.179), (24, -0.125), (25, -0.032), (26, -0.104), (27, 0.139), (28, -0.122), (29, 0.392), (30, -0.124), (31, -0.043), (32, -0.038), (33, -0.126), (34, -0.09), (35, -0.025), (36, 0.094), (37, 0.04), (38, -0.151), (39, 0.138), (40, -0.005), (41, 0.079), (42, 0.012), (43, -0.209), (44, -0.013), (45, 0.001), (46, -0.141), (47, -0.335), (48, -0.132), (49, 0.046)]
simIndex simValue paperId paperTitle
same-paper 1 0.9683817 80 jmlr-2008-Ranking Individuals by Group Comparisons
Author: Tzu-Kuo Huang, Chih-Jen Lin, Ruby C. Weng
Abstract: This paper proposes new approaches to rank individuals from their group comparison results. Many real-world problems are of this type. For example, ranking players from team comparisons is important in some sports. In machine learning, a closely related application is classification using coding matrices. Group comparison results are usually in two types: binary indicator outcomes (wins/losses) or measured outcomes (scores). For each type of results, we propose new models for estimating individuals’ abilities, and hence a ranking of individuals. The estimation is carried out by solving convex minimization problems, for which we develop easy and efficient solution procedures. Experiments on real bridge records and multi-class classification demonstrate the viability of the proposed models. Keywords: ranking, group comparison, binary/scored outcomes, Bradley-Terry model, multiclass classification
2 0.35175404 69 jmlr-2008-Non-Parametric Modeling of Partially Ranked Data
Author: Guy Lebanon, Yi Mao
Abstract: Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational consideration. We explore the use of non-parametric models for partially ranked data and derive computationally efficient procedures for their use for large n. The derivations are largely possible through combinatorial and algebraic manipulations based on the lattice of partial rankings. A bias-variance analysis and an experimental study demonstrate the applicability of the proposed method. Keywords: ranked data, partially ordered sets, kernel smoothing
Author: Salvador García, Francisco Herrera
Abstract: In a recently published paper in JMLR, Demšar (2006) recommends a set of non-parametric statistical tests and procedures which can be safely used for comparing the performance of classifiers over multiple data sets. After studying the paper, we realize that the paper correctly introduces the basic procedures and some of the most advanced ones when comparing a control method. However, it does not deal with some advanced topics in depth. Regarding these topics, we focus on more powerful proposals of statistical procedures for comparing n × n classifiers. Moreover, we illustrate an easy way of obtaining adjusted and comparable p-values in multiple comparison procedures. Keywords: statistical methods, non-parametric test, multiple comparisons tests, adjusted p-values, logically related hypotheses
4 0.1666925 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure
Author: Eric Perrier, Seiya Imoto, Satoru Miyano
Abstract: Classical approaches used to learn Bayesian network structure from data have disadvantages in terms of complexity and lower accuracy of their results. However, a recent empirical study has shown that a hybrid algorithm improves sensitively accuracy and speed: it learns a skeleton with an independency test (IT) approach and constrains on the directed acyclic graphs (DAG) considered during the search-and-score phase. Subsequently, we theorize the structural constraint by introducing the concept of super-structure S, which is an undirected graph that restricts the search to networks whose skeleton is a subgraph of S. We develop a super-structure constrained optimal search (COS): its time complexity is upper bounded by O(γm n ), where γm < 2 depends on the maximal degree m of S. Empirically, complexity depends on the average degree m and sparse structures ˜ allow larger graphs to be calculated. Our algorithm is faster than an optimal search by several orders and even finds more accurate results when given a sound super-structure. Practically, S can be approximated by IT approaches; significance level of the tests controls its sparseness, enabling to control the trade-off between speed and accuracy. For incomplete super-structures, a greedily post-processed version (COS+) still enables to significantly outperform other heuristic searches. Keywords: subset Bayesian networks, structure learning, optimal search, super-structure, connected
5 0.16138451 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
Author: Matthias W. Seeger
Abstract: The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007). Keywords: sparse linear model, Laplace prior, expectation propagation, approximate inference, optimal design, Bayesian statistics, gene network recovery, image coding, compressive sensing
6 0.1575304 95 jmlr-2008-Value Function Based Reinforcement Learning in Changing Markovian Environments
7 0.15309022 76 jmlr-2008-Optimization Techniques for Semi-Supervised Support Vector Machines
8 0.15070742 52 jmlr-2008-Learning from Multiple Sources
10 0.14632043 5 jmlr-2008-A New Algorithm for Estimating the Effective Dimension-Reduction Subspace
11 0.11441068 58 jmlr-2008-Max-margin Classification of Data with Absent Features
13 0.10376754 61 jmlr-2008-Mixed Membership Stochastic Blockmodels
14 0.092219442 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)
15 0.091694862 27 jmlr-2008-Consistency of the Group Lasso and Multiple Kernel Learning
16 0.089856662 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces
17 0.088954635 56 jmlr-2008-Magic Moments for Structured Output Prediction
18 0.087592006 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections
19 0.086859815 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs
20 0.085738912 29 jmlr-2008-Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods
topicId topicWeight
[(0, 0.044), (5, 0.018), (31, 0.021), (40, 0.03), (51, 0.013), (54, 0.042), (58, 0.039), (61, 0.011), (64, 0.017), (66, 0.042), (73, 0.418), (76, 0.033), (88, 0.045), (92, 0.03), (94, 0.066), (99, 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.77469492 80 jmlr-2008-Ranking Individuals by Group Comparisons
Author: Tzu-Kuo Huang, Chih-Jen Lin, Ruby C. Weng
Abstract: This paper proposes new approaches to rank individuals from their group comparison results. Many real-world problems are of this type. For example, ranking players from team comparisons is important in some sports. In machine learning, a closely related application is classification using coding matrices. Group comparison results are usually in two types: binary indicator outcomes (wins/losses) or measured outcomes (scores). For each type of results, we propose new models for estimating individuals’ abilities, and hence a ranking of individuals. The estimation is carried out by solving convex minimization problems, for which we develop easy and efficient solution procedures. Experiments on real bridge records and multi-class classification demonstrate the viability of the proposed models. Keywords: ranking, group comparison, binary/scored outcomes, Bradley-Terry model, multiclass classification
2 0.72779685 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
Author: Arthur D. Szlam, Mauro Maggioni, Ronald R. Coifman
Abstract: Harmonic analysis and diffusion on discrete data has been shown to lead to state-of-the-art algorithms for machine learning tasks, especially in the context of semi-supervised and transductive learning. The success of these algorithms rests on the assumption that the function(s) to be studied (learned, interpolated, etc.) are smooth with respect to the geometry of the data. In this paper we present a method for modifying the given geometry so the function(s) to be studied are smoother with respect to the modified geometry, and thus more amenable to treatment using harmonic analysis methods. Among the many possible applications, we consider the problems of image denoising and transductive classification. In both settings, our approach improves on standard diffusion based methods. Keywords: diffusion processes, diffusion geometry, spectral graph theory, image denoising, transductive learning, semi-supervised learning
3 0.26826093 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
Author: Michael Collins, Amir Globerson, Terry Koo, Xavier Carreras, Peter L. Bartlett
Abstract: Log-linear and maximum-margin models are two commonly-used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be efficiently applied to problems such as sequence learning or natural language parsing. We perform extensive evaluation of the algorithms, comparing them to L-BFGS and stochastic gradient descent for log-linear models, and to SVM-Struct for max-margin models. The algorithms are applied to a multi-class problem as well as to a more complex large-scale parsing task. In all these settings, the EG algorithms presented here outperform the other methods. Keywords: exponentiated gradient, log-linear models, maximum-margin models, structured prediction, conditional random fields ∗. These authors contributed equally.
4 0.26737756 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
Author: Matthias W. Seeger
Abstract: The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007). Keywords: sparse linear model, Laplace prior, expectation propagation, approximate inference, optimal design, Bayesian statistics, gene network recovery, image coding, compressive sensing
5 0.26433769 28 jmlr-2008-Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines
Author: Kai-Wei Chang, Cho-Jui Hsieh, Chih-Jen Lin
Abstract: Linear support vector machines (SVM) are useful for classifying large-scale sparse data. Problems with sparse features are common in applications such as document classification and natural language processing. In this paper, we propose a novel coordinate descent algorithm for training linear SVM with the L2-loss function. At each step, the proposed method minimizes a one-variable sub-problem while fixing other variables. The sub-problem is solved by Newton steps with the line search technique. The procedure globally converges at the linear rate. As each sub-problem involves only values of a corresponding feature, the proposed approach is suitable when accessing a feature is more convenient than accessing an instance. Experiments show that our method is more efficient and stable than state of the art methods such as Pegasos and TRON. Keywords: linear support vector machines, document classification, coordinate descent
6 0.26284006 86 jmlr-2008-SimpleMKL
7 0.25672427 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections
8 0.25622433 83 jmlr-2008-Robust Submodular Observation Selection
9 0.25586033 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
10 0.25570017 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
11 0.25503337 56 jmlr-2008-Magic Moments for Structured Output Prediction
12 0.25444692 16 jmlr-2008-Approximations for Binary Gaussian Process Classification
13 0.25395143 66 jmlr-2008-Multi-class Discriminant Kernel Learning via Convex Programming (Special Topic on Model Selection)
14 0.25308701 58 jmlr-2008-Max-margin Classification of Data with Absent Features
15 0.25224927 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning
16 0.2515353 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration
17 0.25144091 57 jmlr-2008-Manifold Learning: The Price of Normalization
18 0.2512747 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks
19 0.24675429 9 jmlr-2008-Active Learning by Spherical Subdivision
20 0.24657291 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection