nips2002-58 — Conditional Models on the Ranking Poset (knowledge-graph by maker-knowledge-mining)
Source: pdf
Author: Guy Lebanon, John D. Lafferty
Abstract: A distance-based conditional model on the ranking poset is presented for use in classification and ranking. The model is an extension of the Mallows model, and generalizes the classifier combination methods used by several ensemble learning algorithms, including error correcting output codes, discrete AdaBoost, logistic regression and cranking. The algebraic structure of the ranking poset leads to a simple Bayesian interpretation of the conditional model and its special cases. In addition to a unifying view, the framework suggests a probabilistic interpretation for error correcting output codes and an extension beyond the binary coding scheme.
In the classification problem, the task is to assign to each input x a label from a finite set Y of items. A generalization of this problem is conditional ranking, the task of assigning to x a full or partial ranking of the items in Y. This paper studies the algebraic structure of this problem, and proposes a combinatorial structure called the ranking poset for building probability models for conditional ranking.
In ensemble approaches to classification and ranking, several base models are combined to produce a single ranker or classifier. An important distinction between different ensemble methods is whether they use discrete inputs, ranked inputs, or confidence-rated predictions. In the case of discrete inputs, the base models provide a single item in Y, with no preference expressed for a second or third choice. In the case of ranked inputs, the base classifiers output a full or partial ranking over Y. Of course, discrete input is a special case of ranked input, where the partial ranking consists of the single topmost item. In the case of confidence-rated predictions, the base models again output full or partial rankings, but in addition provide a confidence score indicating how strongly one class should be preferred to another. While confidence-rated predictions are sometimes preferable as input to an ensemble method, such confidence scores are often not available (as is typically the case in metasearch), and even when they are available, the scores may not be well calibrated.
This paper investigates a unifying algebraic framework for ensemble methods for classification and conditional ranking, focusing on the cases of discrete and ranked inputs. Our approach is based on the ranking poset on n items, denoted W_n, which consists of the collection of all full and partial rankings equipped with the partial order given by refinement of rankings. The structure of the poset of partial rankings over Y gives rise to natural invariant distance functions that generalize Kendall's tau and the Hamming distance. Using these distance functions we define a conditional model of the form p(σ | π_1, …, π_k) ∝ exp(θ_1 d(π_1, σ) + ⋯ + θ_k d(π_k, σ)), where σ and the input rankings π_j are elements of the ranking poset. This conditional model generalizes several existing models for classification and ranking, and includes as a special case the Mallows model [11]. In addition, the model represents algebraically the way in which input classifiers are combined in certain ensemble methods, including error correcting output codes [4], several versions of AdaBoost [7, 1], and cranking [10].
In Section 2 we review some basic algebraic concepts, and in Section 3 we define the ranking poset. The new model and its Bayesian interpretation are described in Section 4; special cases of the model are derived in Section 5.
Identifying the items to be ranked with the numbers 1, …, n, if π denotes a permutation of {1, …, n}, then π(i) denotes the rank given to item i and π⁻¹(j) denotes the item assigned to rank j. The collection of all permutations of n items forms the non-abelian symmetric group of order n!, denoted S_n. A partial ranking that orders only the top k items and leaves the remaining n − k items unranked is obtained by identifying all permutations that agree on the top k positions; the set of all such partial rankings forms the quotient space S_n / S_{n−k}.
An ordered partition of n is a sequence γ = (γ_1, …, γ_r) of positive integers that sum to n. Such an ordered partition corresponds to a partial ranking of type γ, with γ_1 items in the first position, γ_2 items in the second position, and so on. The subgroup S_γ of S_n contains all permutations for which the corresponding set equality holds for each position; that is, all permutations that only permute items within each position. A partial ranking of type γ is equivalent to a coset πS_γ, and the set of such partial rankings forms the quotient space S_n / S_γ.
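To make this concrete, the following sketch (illustrative only; the representation of a partial ranking as a tuple of frozensets is a choice made here, not notation from the paper) maps a permutation and an ordered partition to the corresponding partial ranking and enumerates the coset of permutations that induce it.

```python
from itertools import permutations

def partial_ranking(pi, gamma):
    """Map a permutation pi (a tuple of items, best first) and an ordered
    partition gamma = (g1, ..., gr) to a partial ranking of type gamma,
    represented as a tuple of frozensets: the first g1 items share the
    first position, the next g2 share the second position, and so on."""
    assert sum(gamma) == len(pi)
    blocks, start = [], 0
    for g in gamma:
        blocks.append(frozenset(pi[start:start + g]))
        start += g
    return tuple(blocks)

# A full ranking of 4 items and its type-(1, 1, 2) partial ranking:
pi = (3, 1, 4, 2)                         # item 3 ranked first, then 1, then 4, then 2
print(partial_ranking(pi, (1, 1, 2)))     # ({3}, {1}, {2, 4}), i.e. "3|1|2 4"

# The coset view: all permutations inducing the same partial ranking.
coset = [p for p in permutations(range(1, 5))
         if partial_ranking(p, (1, 1, 2)) == partial_ranking(pi, (1, 1, 2))]
print(len(coset))                         # 1! * 1! * 2! = 2 permutations
```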
In the following, we list items separated by vertical lines, indicating that the items on the left side of a line are preferred to (ranked higher than) the items on the right side of the line. For example, a partial ranking of five items whose top 3 items are 3, 1, and 4, in that order, is denoted by 3|1|4|2 5, and a partial ranking with items 2 and 4 tied in the first position is denoted by 2 4|1 3 5.
In addition, since the indexing of the items is arbitrary, it is appropriate to require invariance to relabeling of the items. Formally, this amounts to right invariance: d(π, σ) = d(πτ, στ) for all π, σ, τ. A popular right invariant distance on S_n is Kendall's tau,

T(π, σ) = Σ_{i<j} 1[(π(i) - π(j))(σ(i) - σ(j)) < 0],   (3)

where 1[A] is 1 if A holds and 0 otherwise [8]. Kendall's tau can be interpreted as the number of discordant pairs of items between π and σ, or as the minimum number of adjacent transpositions needed to bring π to σ. An adjacent transposition flips a pair of items that have adjacent ranks. Critchlow [2] derives extensions of Kendall's tau and other distances on S_n to distances on partial rankings.
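As a concrete illustration (a minimal sketch, not code from the paper), Kendall's tau between two full rankings can be computed directly from the discordant-pair definition in (3); for permutations this count also equals the number of adjacent transpositions separating them.

```python
def kendall_tau(pi, sigma):
    """Kendall's tau distance between two permutations given as rank functions:
    pi[i] is the rank assigned to item i. Counts discordant pairs, i.e. pairs
    of items that the two rankings order differently."""
    items = list(pi.keys())
    d = 0
    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            i, j = items[a], items[b]
            if (pi[i] - pi[j]) * (sigma[i] - sigma[j]) < 0:
                d += 1
    return d

# pi ranks items 1, 2, 3 in that order; sigma reverses them.
pi    = {1: 1, 2: 2, 3: 3}
sigma = {1: 3, 2: 2, 3: 1}
print(kendall_tau(pi, sigma))   # 3 discordant pairs = 3 adjacent transpositions
```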
3 The Ranking Poset
We first define partially ordered sets and then proceed to define the ranking poset. A partially ordered set, or poset, is a pair (Q, ⪯), where Q is a set and ⪯ is a binary relation that satisfies (1) x ⪯ x, (2) if x ⪯ y and y ⪯ x then x = y, and (3) if x ⪯ y and y ⪯ z then x ⪯ z, for all x, y, z in Q. We say that y covers x if x ≺ y and there is no z with x ≺ z ≺ y; a finite poset is completely described by the covering relation. The planar Hasse diagram of Q is the graph for which the elements of Q are the nodes and the edges are given by the covering relation. The ranking poset W_n consists of all full and partial rankings of the n items. Its partial order is defined by refinement; that is, one ranking precedes another if we can get from the first to the second by adding vertical lines. Note that W_n is different from the poset of all set partitions of {1, …, n} ordered by partition refinement, since in W_n the order of the partition elements matters. Figure 1 shows the Hasse diagram of the ranking poset for a small number of items, together with a portion of the Hasse diagram for a larger one.
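The refinement order can be checked directly on a concrete representation; the following sketch (illustrative, assuming the tuple-of-frozensets representation used in the earlier snippet) tests whether one partial ranking is obtained from another by adding vertical lines.

```python
def refines(fine, coarse):
    """Return True if the partial ranking `fine` can be obtained from `coarse`
    by adding vertical lines, i.e. by splitting the positions of `coarse`
    while preserving their order. Rankings are tuples of frozensets, best
    position first."""
    it = iter(fine)
    for block in coarse:
        remaining = set(block)
        while remaining:
            piece = next(it, None)
            if piece is None or not piece <= remaining:
                return False
            remaining -= piece
    return next(it, None) is None

coarse = (frozenset({2, 4}), frozenset({1, 3, 5}))                 # "2 4|1 3 5"
fine   = (frozenset({4}), frozenset({2}), frozenset({1, 3, 5}))    # "4|2|1 3 5"
print(refines(fine, coarse))   # True: fine is a refinement of coarse
print(refines(coarse, fine))   # False
```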
A subposet of Q is a subset R of Q in which x ⪯ y holds if and only if it holds in Q. A chain is a poset in which every two elements are comparable. A saturated chain of length m is a sequence of elements x_0 ≺ x_1 ≺ ⋯ ≺ x_m in which each element covers the one before it. A chain of Q is a maximal chain if there is no other saturated chain of Q that contains it. A graded poset of rank m is a poset in which every maximal chain has length m. In a graded poset, there is a rank or grade function r such that r(x) = 0 if x is a minimal element and r(y) = r(x) + 1 if y covers x. In particular, the elements in the i-th grade, all of which are incomparable, are denoted by Q_i.
We now turn to distance functions on the ranking poset. To begin, suppose that d is a right invariant function; that is, d(π, σ) = d(πτ, στ) for all π, σ and all permutations τ. Here right invariance is defined with respect to the natural action of S_n on W_n, given by relabeling the items of a partial ranking according to a permutation. We will examine several distances that are based on the covering relation of W_n. Down and up moves on the Hasse diagram will be denoted by ↓ and ↑, respectively.
We are now ready to give the general form of a conditional model on W_n. The model takes as input rankings π_1, …, π_k in some subset of the ranking poset, and assigns to an output ranking σ the probability

p(σ | π_1, …, π_k) ∝ exp(θ_1 d(π_1, σ) + ⋯ + θ_k d(π_k, σ)),   (6)

where d is an invariant distance or dissimilarity function on the poset and θ_1, …, θ_k are real parameters. The parameters will typically be selected by maximizing the conditional likelihood; as for other exponential models, the log-likelihood is convex and has a unique global maximum.
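A minimal computational sketch of a model of this form (illustrative only: restricting to full rankings of a small number of items, using Kendall's tau as the distance, and normalizing by brute force are simplifying assumptions, not the paper's general construction):

```python
import math
from itertools import permutations

def kendall_tau_seq(p, q):
    """Kendall's tau for permutations given as sequences (first element ranked best)."""
    rp = {item: r for r, item in enumerate(p)}
    rq = {item: r for r, item in enumerate(q)}
    items = list(p)
    return sum(1 for a in range(len(items)) for b in range(a + 1, len(items))
               if (rp[items[a]] - rp[items[b]]) * (rq[items[a]] - rq[items[b]]) < 0)

def conditional_model(input_rankings, thetas, n, dist=kendall_tau_seq):
    """p(sigma | pi_1, ..., pi_k) proportional to exp(sum_j theta_j * d(pi_j, sigma)),
    normalized by brute force over all full rankings of n items."""
    support = list(permutations(range(1, n + 1)))
    scores = [math.exp(sum(t * dist(pi, sigma) for pi, t in zip(input_rankings, thetas)))
              for sigma in support]
    z = sum(scores)
    return {sigma: s / z for sigma, s in zip(support, scores)}

# Two base rankers over 3 classes; negative weights favor outputs close to the inputs.
p = conditional_model([(1, 2, 3), (2, 1, 3)], thetas=[-1.0, -0.5], n=3)
print(max(p, key=p.get))   # the modal full ranking under the model
```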
4.1 A Bayesian interpretation
We now derive a Bayesian interpretation for the model given by (6). Our result parallels the interpretation of multistage ranking models given by Fligner and Verducci [6]. The key fact is that, under appropriate assumptions, the normalizing term does not depend on the partial ranking in the one-dimensional case. Proposition 4.1 makes this precise: if d is right invariant and the relevant subset of the ranking poset is invariant under the action of S_n, then the normalizing term of a single Mallows-type factor is the same for every conditioning ranking. To see this, first note that since the subset is invariant under the action of S_n, the sum defining the normalizing term can be re-indexed for each conditioning ranking; a short chain of equalities (equations (9)-(12)), using right invariance of d, invariance of the subset, and the fact that S_n acts transitively on it, then shows that the normalizing term does not depend on the conditioning ranking. The underlying generative model is given as follows. If d is right invariant, the subset is invariant under the action of S_n, and S_n acts transitively on it, then the model defined in equation (6) is the posterior under independent sampling of generalized Mallows models p(π_j | σ) ∝ exp(θ_j d(π_j, σ)), with prior p(σ). The conditions of this proposition are satisfied, for example, in the special cases considered in the next section.
5 Special Cases
This section derives several special cases of model (6), corresponding to existing ensemble methods. Following [9], the unnormalized versions of all the models may be easily derived, corresponding to the exponential loss used in boosting.
5.1 Cranking and Mallows model
Let the input and output rankings be full rankings, and let d(π, σ) be the minimum number of down-up (↓↑) moves on the Hasse diagram of W_n needed to bring π to σ. Since adjacent transpositions of permutations may be identified with a down move followed by an up move over the Hasse diagram, d is equal to Kendall's tau T. In this case model (6) becomes the cranking model [10],

p(σ | π_1, …, π_k) ∝ exp(θ_1 T(π_1, σ) + ⋯ + θ_k T(π_k, σ)),   (16)

and with a single input ranking it reduces to the Mallows model. Other special cases that fall into this category are the models of Feigin [5] and Critchlow and Verducci [3].
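For illustration, the brute-force sketch given after model (6) already computes a cranking-type distribution when the distance is Kendall's tau; with a single input ranking it gives a Mallows-type distribution (this snippet is illustrative only and reuses conditional_model and kendall_tau_seq from that sketch).

```python
# Reuses conditional_model and kendall_tau_seq from the sketch following model (6).
center = (1, 2, 3)
mallows = conditional_model([center], thetas=[-0.7], n=3)
for sigma, prob in sorted(mallows.items(), key=lambda kv: -kv[1]):
    print(sigma, round(prob, 3))   # probability decays with Kendall distance from the center
```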
5.2 Logistic models
Now let the inputs be discrete predictions, each identifying a single item, and let d be an indicator based on down-up moves in the Hasse diagram: d(π_j, σ) records whether the item output by the j-th base classifier agrees with the top of σ (equation (17)). In this case model (6) becomes equivalent to the multiclass generalization of logistic regression. With the appropriate restriction, the same form gives the classifier combination used by AdaBoost.M2; that is, the base classifier becomes the (discrete) multiclass weak learner in the usual boosting notation. See [9] for details on the correspondence between exponential models and the unnormalized models that correspond to AdaBoost.
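A sketch of this special case (illustrative assumptions: each base classifier outputs a single class label and the distance reduces to a 0/1 agreement indicator), showing that the combination rule is a multiclass logistic form over indicator features:

```python
import math

def logistic_combination(predictions, thetas, classes):
    """p(y | x) proportional to exp(sum_j theta_j * [prediction_j == y]),
    i.e. a multiclass logistic model over agreement indicators."""
    scores = {y: math.exp(sum(t for pred, t in zip(predictions, thetas) if pred == y))
              for y in classes}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

# Three base classifiers predict 'a', 'a', 'b'; the weights reflect their reliability.
print(logistic_combination(['a', 'a', 'b'], [1.2, 0.4, 0.9], classes=['a', 'b', 'c']))
```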
5.3 Error correcting output codes
A more interesting special case of the algebraic structure described in Sections 3 and 4 is where the ensemble method is error correcting output coding (ECOC) [4]. Here we take the output to identify a single class, constrain the parameters θ_j to share a common value (equation (18)), and let d(π, σ) be the minimal number of up-down (↑↓) moves in the Hasse diagram needed to bring π to σ. On input x, the base rankers output two-block partial rankings, each of which corresponds to one of the binary classifiers in ECOC for the appropriate column of the binary coding matrix. For example, consider a binary classifier trained on one column of the coding matrix. On an input x, the classifier outputs 0 or 1, corresponding to the partial ranking that prefers the classes coded 0 in that column or the partial ranking that prefers the classes coded 1, respectively. Counting the up-down moves between these two-block rankings and a candidate label shows that the exponent of the model is proportional to the number of disagreements between the classifier outputs and the code for that label (equation (22)). At test time, the model thus selects the label corresponding to the partial ranking with maximal conditional probability; since the common parameter is strictly negative, equivalence with the ECOC decision rule follows from the fact that the summed distance is the Hamming distance between the appropriate row of the coding matrix and the concatenation of the bits returned from the binary classifiers. Thus, with the appropriate definitions of the inputs and the distance, the conditional model on the ranking poset is a probabilistic formulation of ECOC that yields the same classification decisions.
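A minimal sketch of this decision-rule equivalence (illustrative only; the particular coding matrix, the shared negative weight, and the explicit normalization are assumptions made for the example):

```python
import math

# Rows index classes, columns index binary classifiers (an illustrative coding matrix).
CODE = {
    'a': (0, 0, 1, 1),
    'b': (0, 1, 0, 1),
    'c': (1, 0, 0, 1),
}

def ecoc_posterior(bits, theta=-1.0):
    """p(y | bits) proportional to exp(theta * Hamming(code row of y, bits)).
    With theta < 0 the most probable class is the one whose code row is closest
    in Hamming distance, i.e. the standard ECOC decision rule."""
    hamming = {y: sum(r != b for r, b in zip(row, bits)) for y, row in CODE.items()}
    scores = {y: math.exp(theta * h) for y, h in hamming.items()}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

bits = (0, 1, 0, 1)                      # outputs of the four binary classifiers
post = ecoc_posterior(bits)
print(max(post, key=post.get), post)     # 'b': its code row matches the bits exactly
```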
First, relaxing the constraint that the θ_j be equal results in a more general model that corresponds to ECOC with a weighted Hamming distance, or index-sensitive "channel," where the learned weights may adapt to the precision of the various base classifiers. A further generalization is achieved by considering that, for a given coding matrix, the trained classifier for a given column outputs one of two complementary partial rankings depending on the input x. Allowing the output of the classifier instead to belong to other grades of the ranking poset results in a model that corresponds to error correcting output codes with nonbinary codes. While this is somewhat antithetic to the original spirit of ECOC (reducing multiclass to binary), the base classifiers in ECOC are often multiclass classifiers, such as the decision trees used in [4]. For such classifiers, the task instead can be viewed as reducing multiclass to partial ranking. Moreover, there need not be an explicit coding matrix. Instead, the input rankers may output different partial rankings for different inputs, which are then combined according to model (6). In this way, a different coding matrix is built for each example in a dynamic manner. Such a scheme may be attractive in bypassing the problem of designing the coding matrix.
6 Summary
An algebraic framework has been presented for classification and ranking, leading to conditional models on the ranking poset that are defined in terms of an invariant distance or dissimilarity function. Using the invariance properties of the distances, we derived a generative interpretation of the probabilistic model, which may prove to be useful in model selection and validation. Through different choices of the distance and the input rankings, the family of models was shown to include as special cases the Mallows model, and the classifier combination methods used by logistic models, boosting, cranking, and error correcting output codes. In the case of ECOC, the poset framework shows how probabilities may be assigned to partial rankings in a way that is consistent with the usual definitions of ECOC, and suggests several natural extensions.
[10] G. Lebanon and J. Lafferty. Cranking: Combining rankings using conditional probability models on permutations. In Proceedings of the Nineteenth International Conference on Machine Learning (ICML), 2002.