nips nips2002 nips2002-59 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sariel Har-Peled, Dan Roth, Dav Zimak
Abstract: The constraint classification framework captures many flavors of multiclass classification including winner-take-all multiclass classification, multilabel classification and ranking. We present a meta-algorithm for learning in this framework that learns via a single linear classifier in high dimension. We discuss distribution independent as well as margin-based generalization bounds and present empirical and theoretical evidence showing that constraint classification benefits over existing methods of multiclass classification.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract The constraint classification framework captures many flavors of multiclass classification including winner-take-all multiclass classification, multilabel classification and ranking. [sent-2, score-1.244]
2 We discuss distribution independent as well as margin-based generalization bounds and present empirical and theoretical evidence showing that constraint classification benefits over existing methods of multiclass classification. [sent-4, score-0.831]
3 While binary classification is well understood, relatively little is known about multiclass classification. [sent-7, score-0.525]
4 Indeed, the most common approach to multiclass classification, the one-versus-all (OvA) approach, makes direct use of standard binary classifiers to encode and train the output labels. [sent-8, score-0.563]
5 The OvA scheme assumes that for each class there exists a single (simple) separator between that class and all the other classes. [sent-9, score-0.106]
6 Another common approach, all-versus-all (AvA) [HT98], is a more expressive alternative which assumes the existence of a separator between any two classes. [sent-10, score-0.093]
7 OvA classifiers are usually implemented using a winner-take-all (WTA) strategy that associates a real-valued function with each class in order to determine class membership. [sent-11, score-0.068]
8 While it is known that WTA is an expressive classifier [Maa00], it has limited expressivity when trained using the OvA assumption since OvA assumes that each class can be easily separated from the rest. [sent-15, score-0.142]
9 This work is motivated by several successful practical approaches, such as multiclass support vector machines (SVMs) and the sparse network of winnows (SNoW) architecture that rely on the WTA strategy over linear functions. [sent-17, score-0.556]
10 An alternative interpretation of WTA is that every example provides an ordering of the classes (sorted in descending order by the assigned values), where the “winner” is the first class in this ordering. [sent-19, score-0.118]
11 It is thus natural to specify the ordering of the classes for an example directly, instead of implicitly through WTA. [sent-20, score-0.079]
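To make this ordering interpretation of WTA concrete, here is a minimal sketch, not taken from the paper, of a WTA classifier over linear functions viewed as sorting the classes by their assigned scores; the function names class_ordering and wta_predict are hypothetical.

```python
# Illustrative sketch: a winner-take-all classifier over linear functions,
# viewed as inducing an ordering of the classes by descending score.
import numpy as np

def class_ordering(W, x):
    """Return class indices sorted by descending score w_i . x."""
    scores = W @ x                    # one real-valued score per class
    return np.argsort(-scores)        # the 'winner' is the first entry

def wta_predict(W, x):
    return int(class_ordering(W, x)[0])

W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # 3 classes, 2 features
x = np.array([0.2, 0.9])
print(class_ordering(W, x), wta_predict(W, x))        # [1 0 2] 1
```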
12 In Section 2, we introduce constraint classification, where each example is labeled with a set of constraints relating multiple classes. [sent-21, score-0.253]
13 Each such constraint specifies the relative order of two classes for this example. [sent-22, score-0.252]
14 Learning is made possible by a simple transformation mapping each example into a set of examples (one for each constraint) and the application of any binary classifier on the mapped examples. [sent-24, score-0.165]
15 In Section 3, we present a new algorithm for constraint classification that takes on the properties of the binary classification algorithm used. [sent-25, score-0.352]
16 Therefore, using the Perceptron algorithm it is able to learn a consistent classifier if one exists; using the Winnow algorithm it can learn attribute-efficiently; and using the SVM, it provides a simple implementation of multiclass SVM. [sent-26, score-0.643]
17 The algorithm can be implemented with a subtle change to the standard (via OvA) approach to training a network of linear threshold gates. [sent-27, score-0.13]
18 In Section 4, we discuss both VC-dimension and margin-based generalization bounds presented in a companion paper [HPRZ02]. [sent-28, score-0.117]
19 Our generalization bounds apply to WTA classifiers over linear functions, for which VC-style bounds were not known. [sent-29, score-0.226]
20 In addition to multiclass classification, constraint classification generalizes multilabel classification, ranking on labels, and of course, binary classification. [sent-30, score-0.919]
21 For example, in Section , we show that the commonly used OvA assumption can cause learning to fail, even when a consistent classifier exists. [sent-32, score-0.112]
22 Section 5 provides empirical evidence that constraint classification outperforms the OvA approach. [sent-33, score-0.281]
23 1 (Learning) Given examples from , a hypothesis class , and an error function , a learning algorithm attempts to output a function , where , that minimizes the expected error on a randomly drawn example. [sent-37, score-0.176]
24 Similarly, denotes the set of all partial orders consisting of all permutations of over . [sent-41, score-0.104]
25 A partial order, , defines a binary relation, and can be represented by the set of pairs for which the relation holds. [sent-42, score-0.107]
26 Given two partial orders order produced by the transitive closure of with respect to , is consistent with (denoted ) if for every , holds whenever . [sent-44, score-0.136]
27 If is a full order, then it can be represented by a list of integers where if precedes in the list. [sent-45, score-0.097]
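As a small illustration of the consistency notion in the definitions above, the sketch below treats a partial order as a set of pairwise constraints (i, j), meaning class i precedes class j, and checks whether a full order (a permutation given as a list of integers) respects every constraint; the function name is_consistent is invented for this sketch, not from the paper.

```python
# Hypothetical helper: consistency of a full order with a partial order
# given as a set of pairwise constraints (i, j), read as "i precedes j".
def is_consistent(full_order, constraints):
    position = {c: rank for rank, c in enumerate(full_order)}
    return all(position[i] < position[j] for (i, j) in constraints)

print(is_consistent([2, 0, 1], {(2, 0), (2, 1)}))  # True: 2 precedes both 0 and 1
print(is_consistent([0, 2, 1], {(2, 0)}))          # False: 0 precedes 2
```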
28 3 (Constraint Classification) Constraint classification is a learning problem where each example is labeled according to a partial order . [sent-48, score-0.1]
29 A constraint classifier, , is consistent with example if is consistent with ( ). [sent-49, score-0.367]
30 4 (Error Indicator Function) For any , and hypothesis , the indicator function indicates an error on example , if , and otherwise. [sent-59, score-0.072]
31 For example, if and example , , and , then is correct since precedes and precedes in the full order whereas is incorrect since precedes in . [sent-60, score-0.319]
32 Constraint classification can model many well-studied learning problems including multiclass classification, ranking and multilabel classification. [sent-74, score-0.692]
33 Table 1 shows a few interesting classification problems expressible as constraint classification. [sent-75, score-0.258]
34 7 (Problem mappings) All of the learning problems in Table 1 can be expressed as constraint classification problems. [sent-77, score-0.286]
35 If we find a constraint classifier that correctly labels according to the given constraints where , , and , then . [sent-80, score-0.251]
36 If instead we are given a ranking example , it can be transformed into . [sent-81, score-0.127]
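A hedged sketch of these problem mappings: a multiclass label y over k classes becomes the constraint set {(y, j) : j != y}, and a ranking (a full order over the classes) becomes the pairwise constraints between consecutive classes in that order, the remaining pairs following by transitivity. The helper names are illustrative.

```python
# Illustrative mappings into constraint sets (names are not from the paper).
def multiclass_to_constraints(y, k):
    """A multiclass label y over k classes: y must precede every other class."""
    return {(y, j) for j in range(k) if j != y}

def ranking_to_constraints(order):
    """A full ranking of the classes: each class precedes its successor."""
    return {(order[t], order[t + 1]) for t in range(len(order) - 1)}

print(multiclass_to_constraints(1, 4))    # {(1, 0), (1, 2), (1, 3)}
print(ranking_to_constraints([2, 0, 1]))  # {(2, 0), (0, 1)}
```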
37 3 Learning In this section, -class constraint classification is transformed into binary classification in higher dimension. [sent-84, score-0.313]
38 Each example becomes a set of examples in a higher-dimensional space, with each constraint contributing a single 'positive' and a single 'negative' example. [sent-85, score-0.3]
39 Then, a separating hyperplane for the expanded example set (in ) can be viewed as a linear sorting function over linear functions, each in dimensional space. [sent-86, score-0.437]
40 1 Kesler's Construction Kesler's construction for multiclass classification was first introduced by Nilsson in 1965 [Nil65, 75–77] and can also be found more recently [DH73]. [sent-88, score-0.496]
41 This subsection extends the Kesler construction for constraint classification. [sent-89, score-0.259]
42 2 (Expansion) Let denote the embedding obtained by writing the coordinates of in the -th chunk of a vector in . [sent-93, score-0.095]
43 Finally, , is the embedding of in the -th chunk and in the -th chunk of a vector in . [sent-96, score-0.19]
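The following is a minimal sketch of the expansion just described, under the assumption that an example x in d dimensions with constraint (i, j) is embedded into a kd-dimensional vector with x in the i-th chunk and -x in the j-th chunk, each constraint contributing one 'positive' and one 'negative' expanded point; the function names are hypothetical.

```python
# Assumed form of the Kesler-style expansion: x with constraint (i, j) is
# written into the i-th chunk and -x into the j-th chunk of a k*d vector.
import numpy as np

def expand(x, i, j, k):
    d = len(x)
    v = np.zeros(k * d)
    v[i * d:(i + 1) * d] = x
    v[j * d:(j + 1) * d] = -x
    return v

def expand_example(x, constraints, k):
    """Each constraint contributes one 'positive' and one 'negative' point."""
    points, labels = [], []
    for (i, j) in constraints:
        v = expand(np.asarray(x, dtype=float), i, j, k)
        points.extend([v, -v])
        labels.extend([+1, -1])
    return np.array(points), np.array(labels)
```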
44 2 Algorithm Figure 1 (a) shows a meta-learning algorithm for constraint classification that finds a linear sorting function by using any algorithm for learning a binary classifier. [sent-103, score-0.621]
45 Given a set of examples , the algorithm simply finds a separating hyperplane for . [sent-104, score-0.175]
46 Suppose correctly classifies , , and the constraint on (dictating that then ) is consistent with . [sent-105, score-0.308]
47 Therefore, if correctly classifies all , then is a consistent linear sorting function. [sent-106, score-0.324]
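Putting the pieces together, here is an assumption-laden sketch of the meta-algorithm: expand every (example, constraint) pair via the Kesler construction, learn a single separating hyperplane with any binary learner (scikit-learn's Perceptron is used below purely as an example; any linear separator would do), and read the k chunks of the learned weight vector as the per-class weights of a linear sorting function. All names are illustrative.

```python
# Sketch of the meta-algorithm; the binary learner is interchangeable.
import numpy as np
from sklearn.linear_model import Perceptron

def _expand(x, i, j, k):
    d = len(x)
    v = np.zeros(k * d)
    v[i * d:(i + 1) * d] = x
    v[j * d:(j + 1) * d] = -x
    return v

def train_constraint_classifier(data, k, d):
    """data: iterable of (x, constraints), x in R^d, constraints a set of (i, j)."""
    X, y = [], []
    for x, constraints in data:
        for (i, j) in constraints:
            v = _expand(np.asarray(x, dtype=float), i, j, k)
            X.extend([v, -v])
            y.extend([+1, -1])
    clf = Perceptron(fit_intercept=False).fit(np.array(X), np.array(y))
    return clf.coef_.reshape(k, d)     # row t = weight vector of class t

def predict_order(W, x):
    """Linear sorting function: classes ordered by descending w_t . x."""
    return np.argsort(-(W @ x))        # predicted multiclass label = first entry
```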
48 First, the hypothesis learned above is more expressive than when the OvA assumption is used. [sent-108, score-0.126]
49 Finally, the multiclass support vector machine can be implemented by learning a hyperplane that separates the expanded example set with maximal margin. [sent-111, score-0.538]
50 is any binary learning algorithm returning a separating hyperplane. [sent-115, score-0.171]
51 (b) Online meta-algorithm for constraint classification with linear sorting functions (see Definition 2. [sent-116, score-0.466]
52 The particular online algorithm used determines how is initialized and the promotion and demotion strategies. [sent-118, score-0.112]
53 Learning proceeds by learning independent binary classifiers, one corresponding to each class, where example is considered positive for classifier and negative for all others. [sent-120, score-0.119]
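For comparison, a minimal sketch of this OvA training scheme: k independent binary Perceptrons, each treating its own class as positive and all others as negative, with winner-take-all prediction over their scores. The choice of scikit-learn's Perceptron and the function names are assumptions for illustration.

```python
# Minimal OvA sketch: k independent binary problems, WTA prediction.
import numpy as np
from sklearn.linear_model import Perceptron

def train_ova(X, y, k):
    """X: (n, d) array; y: length-n array of class labels in {0, ..., k-1}."""
    classifiers = []
    for c in range(k):
        target = np.where(np.asarray(y) == c, 1, -1)  # positive only for class c
        classifiers.append(Perceptron(fit_intercept=False).fit(X, target))
    return classifiers

def predict_ova(classifiers, x):
    scores = [clf.decision_function(np.asarray(x).reshape(1, -1))[0]
              for clf in classifiers]
    return int(np.argmax(scores))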
54 It is easy to construct an example where the OvA assumption causes learning to fail even when there exists a consistent linear sorting function. [sent-121, score-0.381]
55 (see Figure 2) Notice that, since the existence of a consistent linear sorting function (w.r.t. the original example set) [sent-122, score-0.298]
56 implies the existence of a separating hyperplane (w.r.t. the expanded example set), any learning algorithm guaranteed to separate two separable sets of points [sent-125, score-0.156]
57 (e.g. the Perceptron algorithm) is guaranteed to find a consistent linear sorting function. [sent-130, score-0.298]
58 In Section 5, we use the perceptron algorithm to find a consistent classifier for an extension of the example in Figure 2, for which OvA fails. [sent-131, score-0.25]
59 4 Comparison to Networks of Linear Threshold Gates (Perceptron) It is possible to implement the algorithm in Section using a network of linear classifiers such as multi-output Perceptron [AB99], SNoW [CCRR99, Rot98], and multiclass SVM [CS00, WW99]. [sent-133, score-0.564]
60 Such a network has as input and outputs, each represented by a weight vector, where the -th output computes (see Figure 1(b)). [sent-134, score-0.077]
61 Typically, a label is mapped, via a fixed transformation, into a -dimensional output vector, and each output is trained separately, as in the OvA case. [sent-135, score-0.076]
62 Alternatively, if the online perceptron algorithm is plugged into the meta-algorithm in Section , then updates are performed according to a dynamic transformation. [sent-136, score-0.21]
63 Specifically, given , for every constraint , if , is ‘promoted’ and is ‘demoted’. [sent-137, score-0.225]
64 Using a network in this way results in an ultraconservative online algorithm for multiclass classification [CS01]. [sent-138, score-0.635]
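A sketch of this online variant, assuming simple additive promotion and demotion: one weight vector per class is maintained and, for every violated constraint (i, j) on example x, w_i is promoted by adding x and w_j is demoted by subtracting x. The exact initialization and update rules of the ultraconservative algorithms in [CS01] differ in detail, so treat this as an illustration rather than their method.

```python
# Simplified additive promotion/demotion update for constraint examples.
import numpy as np

def online_constraint_perceptron(stream, k, d, epochs=1):
    W = np.zeros((k, d))                    # one weight vector per class output
    for _ in range(epochs):
        for x, constraints in stream:
            x = np.asarray(x, dtype=float)
            scores = W @ x                  # scores computed once per example
            for (i, j) in constraints:
                if scores[i] <= scores[j]:  # constraint "i before j" violated
                    W[i] += x               # promote the i-th output
                    W[j] -= x               # demote the j-th output
    return W
```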
65 This subtle change enables the commonly used network of linear threshold gates to learn every hypothesis it is capable of representing. [sent-139, score-0.171]
66 Figure 2: A 3-class classification example showing that one-versus-all (OvA) does not converge to a consistent hypothesis. [sent-140, score-0.112]
67 The OvA assumption will attempt to separate the circles from squares and triangles with a single separating hyperplane, as well as the other 2 combinations. [sent-143, score-0.11]
68 Table 2: Summary of problems from the UCI repository. [sent-147, score-0.197]
Dataset     Features  Classes  Training Examples  Testing Examples
glass            9        6          214                 –
vowel           10       11          528                462
soybean         35       19          307                376
audiology       69       24          200                 26
ISOLET         617       26         6238               1559
letter          16       26        16000               4000
Synthetic*     100        3        50000              50000
69 The synthetic data is sampled from a random linear sorting function (see Section 5). [sent-148, score-0.297]
70 4 Generalization Bounds A PAC-style analysis of multiclass functions that uses an extended notion of VC-dimension for the multiclass case [BCHL95] provides poor bounds on generalization for WTA, and the current best bounds rely on a generalized notion of margin [ASS00]. [sent-149, score-1.148]
71 In this section, we prove tighter bounds using the new framework. [sent-150, score-0.078]
72 We seek generalization bounds for learning with , the class of linear sorting functions (Definition 2. [sent-151, score-0.42]
73 Although both VC-dimension-based (based on the growth function) and margin-based bounds for the class of hyperplanes in are known [Vap98, AB99], they cannot be applied directly since produces points that are random, but not independently drawn. [sent-153, score-0.137]
74 It turns out that bounds can be derived indirectly by using known bounds for constraint classification. [sent-154, score-0.381]
75 Due to space considerations, see [HPRZ02], where natural extensions to the growth function and margin are used to develop generalization bounds. [sent-155, score-0.088]
76 5 Experiments As in previous multiclass classification work [DB95, ASS00], we tested our algorithm on a suite of problems from the Irvine Repository of machine learning [BM98] (see Table 2). [sent-156, score-0.555]
77 All of the results for the constraint classification algorithm are competitive with known results. [sent-160, score-0.257]
78 The synthetic data would converge to zero error using constraint classification but would not converge using the OvA approach. [sent-161, score-0.335]
79 A comparison is made between the OvA approach (Section ) and the constraint classification approach. [sent-163, score-0.225]
80 Both were implemented on the same multi-output Perceptron network with weights (with one threshold per class). [sent-164, score-0.078]
81 For each multiclass example , a constraint classification example was created, where . [sent-168, score-0.743]
82 The error (Definition 2.4) of corresponds to the traditional error for multiclass classification. [sent-170, score-0.462]
83 Figure 3 shows that constraint classification outperforms the multi-output Perceptron when using the OvA assumption. [sent-171, score-0.225]
84 6 Discussion We think constraint classification provides two significant contributions to multiclass classification. [sent-172, score-0.742]
85 Firstly, it provides a conceptual generalization that encompasses multiclass classification, multilabel classification, and label ranking problems in addition to problems with more complex relationships between labels. [sent-173, score-0.765]
86 Secondly, it reminds the community that the Kesler construction can be used to extend any learning algorithm for binary classification to the multiclass (or constraint) setting. [sent-174, score-0.619]
87 Section 5 showed that the constraint approach to learning is advantageous over the oneversus-all approach on both real-world and synthetic data sets. [sent-175, score-0.309]
88 However, preliminary experiments using various natural language data sets, such as part-of-speech tagging, do not yield any significant difference between the two approaches. [sent-176, score-0.064]
89 We used a common transformation [EZR01] to convert the raw data to approximately three million examples in a roughly one-hundred-thousand-dimensional boolean feature space. [sent-177, score-0.074]
90 Because the constraint approach is more expressive than the one-versus-all approach, and because both approaches use the same hypothesis space ( linear functions), we expected the constraint approach to achieve higher accuracy. [sent-179, score-0.58]
91 Again, we think this is not the case, since it seems that “most” random winner-take-all problems (as with the synthetic data) would cause the one-versus-all assumption to fail. [sent-183, score-0.142]
92 Rather, we conjecture that for some reason, natural language problems (along with the transformation) are suited to the one-versus-all approach and do not require a more complex hypothesis. [sent-184, score-0.097]
93 7 Conclusions The view of multiclass classification presented here simplifies the implementation, analysis, and understanding of many preexisting approaches. [sent-186, score-0.462]
94 Multiclass support vector machines, ultraconservative online algorithms, and traditional one-versus-all approaches can be cast in this framework. [sent-187, score-0.102]
95 It would be interesting to see if it could be combined with the error-correcting output coding method in [DB95], which provides another way to extend the OvA approach. [sent-188, score-0.067]
96 Furthermore, this view allows for a very natural extension of multiclass classification to constraint classification – capturing within it complex learning tasks such as multilabel classification and ranking. [sent-189, score-0.834]
97 Because constraint classification is a very intuitive approach and its implementation can be carried out by any discriminant technique, and not only by optimization techniques, we think it will have useful real-world applications. [sent-190, score-0.251]
98 Reducing multiclass to binary: A unifying approach for margin classifiers. [sent-208, score-0.462]
99 On the learnability and design of output codes for multiclass problems. [sent-245, score-0.529]
100 Learning to resolve natural language ambiguities: A unified approach. [sent-319, score-0.064]
wordName wordTfidf (topN-words)
[('multiclass', 0.462), ('ova', 0.418), ('classi', 0.402), ('constraint', 0.225), ('sorting', 0.21), ('cation', 0.167), ('wta', 0.139), ('perceptron', 0.133), ('abba', 0.114), ('hbbbapc', 0.109), ('kesler', 0.109), ('precedes', 0.097), ('multilabel', 0.095), ('chunk', 0.095), ('er', 0.095), ('jg', 0.087), ('bounds', 0.078), ('nition', 0.078), ('bgc', 0.076), ('ranking', 0.074), ('abab', 0.066), ('babgc', 0.066), ('binary', 0.063), ('snow', 0.057), ('kh', 0.057), ('ultraconservative', 0.057), ('consistent', 0.057), ('synthetic', 0.056), ('expressive', 0.055), ('pb', 0.048), ('separating', 0.048), ('hyperplane', 0.048), ('examples', 0.047), ('vv', 0.046), ('online', 0.045), ('partial', 0.044), ('hypothesis', 0.044), ('abbgc', 0.044), ('audiology', 0.044), ('gec', 0.044), ('ged', 0.044), ('lass', 0.044), ('lbd', 0.044), ('nput', 0.044), ('utput', 0.044), ('expanded', 0.041), ('language', 0.04), ('network', 0.039), ('generalization', 0.039), ('ers', 0.039), ('separator', 0.038), ('isolet', 0.038), ('soybean', 0.038), ('winnow', 0.038), ('wsu', 0.038), ('output', 0.038), ('orders', 0.035), ('triangles', 0.035), ('earn', 0.035), ('promotion', 0.035), ('tagging', 0.035), ('class', 0.034), ('construction', 0.034), ('problems', 0.033), ('vowel', 0.032), ('winner', 0.032), ('algorithm', 0.032), ('linear', 0.031), ('provides', 0.029), ('learnability', 0.029), ('gates', 0.029), ('learning', 0.028), ('example', 0.028), ('subtle', 0.028), ('crammer', 0.028), ('roth', 0.028), ('converge', 0.027), ('classes', 0.027), ('expansion', 0.027), ('transformation', 0.027), ('assumption', 0.027), ('de', 0.027), ('empirical', 0.027), ('pages', 0.027), ('correctly', 0.026), ('think', 0.026), ('separated', 0.026), ('permutation', 0.026), ('letter', 0.026), ('hc', 0.026), ('transformed', 0.025), ('hypotheses', 0.025), ('growth', 0.025), ('permutations', 0.025), ('attribute', 0.025), ('section', 0.024), ('machines', 0.024), ('natural', 0.024), ('glass', 0.024), ('verify', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking
Author: Sariel Har-Peled, Dan Roth, Dav Zimak
Abstract: The constraint classification framework captures many flavors of multiclass classification including winner-take-all multiclass classification, multilabel classification and ranking. We present a meta-algorithm for learning in this framework that learns via a single linear classifier in high dimension. We discuss distribution independent as well as margin-based generalization bounds and present empirical and theoretical evidence showing that constraint classification benefits over existing methods of multiclass classification.
2 0.2537753 19 nips-2002-Adapting Codes and Embeddings for Polychotomies
Author: Gunnar Rätsch, Sebastian Mika, Alex J. Smola
Abstract: In this paper we consider formulations of multi-class problems based on a generalized notion of a margin and using output coding. This includes, but is not restricted to, standard multi-class SVM formulations. Differently from many previous approaches we learn the code as well as the embedding function. We illustrate how this can lead to a formulation that allows for solving a wider range of problems with for instance many classes or even “missing classes”. To keep our optimization problems tractable we propose an algorithm capable of solving them using twoclass classifiers, similar in spirit to Boosting.
3 0.23606341 149 nips-2002-Multiclass Learning by Probabilistic Embeddings
Author: Ofer Dekel, Yoram Singer
Abstract: We describe a new algorithmic framework for learning multiclass categorization problems. In this framework a multiclass predictor is composed of a pair of embeddings that map both instances and labels into a common space. In this space each instance is assigned the label it is nearest to. We outline and analyze an algorithm, termed Bunching, for learning the pair of embeddings from labeled data. A key construction in the analysis of the algorithm is the notion of probabilistic output codes, a generalization of error correcting output codes (ECOC). Furthermore, the method of multiclass categorization using ECOC is shown to be an instance of Bunching. We demonstrate the advantage of Bunching over ECOC by comparing their performance on numerous categorization problems.
Author: Sepp Hochreiter, Klaus Obermayer
Abstract: We investigate the problem of learning a classification task for datasets which are described by matrices. Rows and columns of these matrices correspond to objects, where row and column objects may belong to different sets, and the entries in the matrix express the relationships between them. We interpret the matrix elements as being produced by an unknown kernel which operates on object pairs and we show that - under mild assumptions - these kernels correspond to dot products in some (unknown) feature space. Minimizing a bound for the generalization error of a linear classifier which has been obtained using covering numbers we derive an objective function for model selection according to the principle of structural risk minimization. The new objective function has the advantage that it allows the analysis of matrices which are not positive definite, and not even symmetric or square. We then consider the case that row objects are interpreted as features. We suggest an additional constraint, which imposes sparseness on the row objects and show, that the method can then be used for feature selection. Finally, we apply this method to data obtained from DNA microarrays, where “column” objects correspond to samples, “row” objects correspond to genes and matrix elements correspond to expression levels. Benchmarks are conducted using standard one-gene classification and support vector machines and K-nearest neighbors after standard feature selection. Our new method extracts a sparse set of genes and provides superior classification results. 1
5 0.20253606 67 nips-2002-Discriminative Binaural Sound Localization
Author: Ehud Ben-reuven, Yoram Singer
Abstract: Time difference of arrival (TDOA) is commonly used to estimate the azimuth of a source in a microphone array. The most common methods to estimate TDOA are based on finding extrema in generalized crosscorrelation waveforms. In this paper we apply microphone array techniques to a manikin head. By considering the entire cross-correlation waveform we achieve azimuth prediction accuracy that exceeds extrema locating methods. We do so by quantizing the azimuthal angle and treating the prediction problem as a multiclass categorization task. We demonstrate the merits of our approach by evaluating the various approaches on Sony’s AIBO robot.
6 0.16744295 45 nips-2002-Boosted Dyadic Kernel Discriminants
7 0.16336094 92 nips-2002-FloatBoost Learning for Classification
8 0.15393497 58 nips-2002-Conditional Models on the Ranking Poset
9 0.15370266 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
10 0.13651419 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
11 0.12761904 72 nips-2002-Dyadic Classification Trees via Structural Risk Minimization
12 0.1179772 108 nips-2002-Improving Transfer Rates in Brain Computer Interfacing: A Case Study
13 0.11760385 196 nips-2002-The RA Scanner: Prediction of Rheumatoid Joint Inflammation Based on Laser Imaging
14 0.11360613 55 nips-2002-Combining Features for BCI
15 0.11050989 51 nips-2002-Classifying Patterns of Visual Motion - a Neuromorphic Approach
16 0.10755922 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
17 0.10678634 120 nips-2002-Kernel Design Using Boosting
18 0.10571525 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine
19 0.10384692 109 nips-2002-Improving a Page Classifier with Anchor Extraction and Link Analysis
20 0.092116341 140 nips-2002-Margin Analysis of the LVQ Algorithm
topicId topicWeight
[(0, -0.238), (1, -0.158), (2, 0.123), (3, -0.057), (4, 0.361), (5, -0.022), (6, -0.055), (7, -0.196), (8, 0.036), (9, -0.025), (10, -0.113), (11, 0.101), (12, -0.03), (13, -0.068), (14, 0.153), (15, -0.002), (16, 0.046), (17, 0.126), (18, -0.082), (19, -0.09), (20, 0.018), (21, -0.053), (22, -0.11), (23, 0.119), (24, 0.053), (25, 0.022), (26, -0.063), (27, 0.111), (28, -0.055), (29, 0.054), (30, -0.011), (31, -0.022), (32, 0.107), (33, -0.052), (34, -0.006), (35, -0.011), (36, -0.03), (37, -0.005), (38, 0.021), (39, 0.019), (40, 0.039), (41, 0.022), (42, -0.035), (43, 0.013), (44, 0.032), (45, -0.008), (46, -0.056), (47, -0.068), (48, 0.026), (49, -0.044)]
simIndex simValue paperId paperTitle
same-paper 1 0.96345478 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking
Author: Sariel Har-Peled, Dan Roth, Dav Zimak
Abstract: The constraint classification framework captures many flavors of multiclass classification including winner-take-all multiclass classification, multilabel classification and ranking. We present a meta-algorithm for learning in this framework that learns via a single linear classifier in high dimension. We discuss distribution independent as well as margin-based generalization bounds and present empirical and theoretical evidence showing that constraint classification benefits over existing methods of multiclass classification.
2 0.8094784 149 nips-2002-Multiclass Learning by Probabilistic Embeddings
Author: Ofer Dekel, Yoram Singer
Abstract: We describe a new algorithmic framework for learning multiclass categorization problems. In this framework a multiclass predictor is composed of a pair of embeddings that map both instances and labels into a common space. In this space each instance is assigned the label it is nearest to. We outline and analyze an algorithm, termed Bunching, for learning the pair of embeddings from labeled data. A key construction in the analysis of the algorithm is the notion of probabilistic output codes, a generalization of error correcting output codes (ECOC). Furthermore, the method of multiclass categorization using ECOC is shown to be an instance of Bunching. We demonstrate the advantage of Bunching over ECOC by comparing their performance on numerous categorization problems.
3 0.75596422 58 nips-2002-Conditional Models on the Ranking Poset
Author: Guy Lebanon, John D. Lafferty
Abstract: A distance-based conditional model on the ranking poset is presented for use in classification and ranking. The model is an extension of the Mallows model, and generalizes the classifier combination methods used by several ensemble learning algorithms, including error correcting output codes, discrete AdaBoost, logistic regression and cranking. The algebraic structure of the ranking poset leads to a simple Bayesian interpretation of the conditional model and its special cases. In addition to a unifying view, the framework suggests a probabilistic interpretation for error correcting output codes and an extension beyond the binary coding scheme.
4 0.7317096 19 nips-2002-Adapting Codes and Embeddings for Polychotomies
Author: Gunnar Rätsch, Sebastian Mika, Alex J. Smola
Abstract: In this paper we consider formulations of multi-class problems based on a generalized notion of a margin and using output coding. This includes, but is not restricted to, standard multi-class SVM formulations. Differently from many previous approaches we learn the code as well as the embedding function. We illustrate how this can lead to a formulation that allows for solving a wider range of problems with for instance many classes or even “missing classes”. To keep our optimization problems tractable we propose an algorithm capable of solving them using twoclass classifiers, similar in spirit to Boosting.
5 0.70830566 45 nips-2002-Boosted Dyadic Kernel Discriminants
Author: Baback Moghaddam, Gregory Shakhnarovich
Abstract: We introduce a novel learning algorithm for binary classification with hyperplane discriminants based on pairs of training points from opposite classes (dyadic hypercuts). This algorithm is further extended to nonlinear discriminants using kernel functions satisfying Mercer’s conditions. An ensemble of simple dyadic hypercuts is learned incrementally by means of a confidence-rated version of AdaBoost, which provides a sound strategy for searching through the finite set of hypercut hypotheses. In experiments with real-world datasets from the UCI repository, the generalization performance of the hypercut classifiers was found to be comparable to that of SVMs and k-NN classifiers. Furthermore, the computational cost of classification (at run time) was found to be similar to, or better than, that of SVM. Similarly to SVMs, boosted dyadic kernel discriminants tend to maximize the margin (via AdaBoost). In contrast to SVMs, however, we offer an on-line and incremental learning machine for building kernel discriminants whose complexity (number of kernel evaluations) can be directly controlled (traded off for accuracy). 1
6 0.69348902 108 nips-2002-Improving Transfer Rates in Brain Computer Interfacing: A Case Study
7 0.68585384 67 nips-2002-Discriminative Binaural Sound Localization
8 0.65804541 92 nips-2002-FloatBoost Learning for Classification
9 0.64709926 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
10 0.64406002 72 nips-2002-Dyadic Classification Trees via Structural Risk Minimization
11 0.64157534 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers
12 0.62719238 196 nips-2002-The RA Scanner: Prediction of Rheumatoid Joint Inflammation Based on Laser Imaging
13 0.61496633 55 nips-2002-Combining Features for BCI
14 0.58154786 109 nips-2002-Improving a Page Classifier with Anchor Extraction and Link Analysis
15 0.43396181 62 nips-2002-Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems
16 0.42870453 173 nips-2002-Recovering Intrinsic Images from a Single Image
17 0.41300195 140 nips-2002-Margin Analysis of the LVQ Algorithm
18 0.40939057 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
19 0.40512574 51 nips-2002-Classifying Patterns of Visual Motion - a Neuromorphic Approach
20 0.39327416 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine
topicId topicWeight
[(1, 0.015), (11, 0.011), (41, 0.013), (42, 0.052), (52, 0.025), (54, 0.136), (55, 0.036), (67, 0.017), (68, 0.017), (74, 0.059), (87, 0.01), (92, 0.039), (98, 0.46)]
simIndex simValue paperId paperTitle
1 0.99482954 103 nips-2002-How Linear are Auditory Cortical Responses?
Author: Maneesh Sahani, Jennifer F. Linden
Abstract: By comparison to some other sensory cortices, the functional properties of cells in the primary auditory cortex are not yet well understood. Recent attempts to obtain a generalized description of auditory cortical responses have often relied upon characterization of the spectrotemporal receptive field (STRF), which amounts to a model of the stimulusresponse function (SRF) that is linear in the spectrogram of the stimulus. How well can such a model account for neural responses at the very first stages of auditory cortical processing? To answer this question, we develop a novel methodology for evaluating the fraction of stimulus-related response power in a population that can be captured by a given type of SRF model. We use this technique to show that, in the thalamo-recipient layers of primary auditory cortex, STRF models account for no more than 40% of the stimulus-related power in neural responses.
2 0.99177229 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine
Author: Ralf Herbrich, Neil D. Lawrence, Matthias Seeger
Abstract: We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on informationtheoretic principles, previously suggested for active learning. Our goal is not only to learn d–sparse predictors (which can be evaluated in O(d) rather than O(n), d n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n · d2 ), and in large real-world classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet can be significantly faster in training. In contrast to the SVM, our approximation produces estimates of predictive probabilities (‘error bars’), allows for Bayesian model selection and is less complex in implementation. 1
3 0.98921448 129 nips-2002-Learning in Spiking Neural Assemblies
Author: David Barber
Abstract: We consider a statistical framework for learning in a class of networks of spiking neurons. Our aim is to show how optimal local learning rules can be readily derived once the neural dynamics and desired functionality of the neural assembly have been specified, in contrast to other models which assume (sub-optimal) learning rules. Within this framework we derive local rules for learning temporal sequences in a model of spiking neurons and demonstrate its superior performance to correlation (Hebbian) based approaches. We further show how to include mechanisms such as synaptic depression and outline how the framework is readily extensible to learning in networks of highly complex spiking neurons. A stochastic quantal vesicle release mechanism is considered and implications on the complexity of learning discussed. 1
4 0.98844677 56 nips-2002-Concentration Inequalities for the Missing Mass and for Histogram Rule Error
Author: Luis E. Ortiz, David A. McAllester
Abstract: This paper gives distribution-free concentration inequalities for the missing mass and the error rate of histogram rules. Negative association methods can be used to reduce these concentration problems to concentration questions about independent sums. Although the sums are independent, they are highly heterogeneous. Such highly heterogeneous independent sums cannot be analyzed using standard concentration inequalities such as Hoeffding’s inequality, the Angluin-Valiant bound, Bernstein’s inequality, Bennett’s inequality, or McDiarmid’s theorem.
5 0.98587346 92 nips-2002-FloatBoost Learning for Classification
Author: Stan Z. Li, Zhenqiu Zhang, Heung-yeung Shum, Hongjiang Zhang
Abstract: AdaBoost [3] minimizes an upper error bound which is an exponential function of the margin on the training set [14]. However, the ultimate goal in applications of pattern classification is always minimum error rate. On the other hand, AdaBoost needs an effective procedure for learning weak classifiers, which by itself is difficult especially for high dimensional data. In this paper, we present a novel procedure, called FloatBoost, for learning a better boosted classifier. FloatBoost uses a backtrack mechanism after each iteration of AdaBoost to remove weak classifiers which cause higher error rates. The resulting float-boosted classifier consists of fewer weak classifiers yet achieves lower error rates than AdaBoost in both training and test. We also propose a statistical model for learning weak classifiers, based on a stagewise approximation of the posterior using an overcomplete set of scalar features. Experimental comparisons of FloatBoost and AdaBoost are provided through a difficult classification problem, face detection, where the goal is to learn from training examples a highly nonlinear classifier to differentiate between face and nonface patterns in a high dimensional space. The results clearly demonstrate the promises made by FloatBoost over AdaBoost.
same-paper 6 0.96507406 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking
7 0.92047858 79 nips-2002-Evidence Optimization Techniques for Estimating Stimulus-Response Functions
8 0.87644517 50 nips-2002-Circuit Model of Short-Term Synaptic Dynamics
9 0.87378919 184 nips-2002-Spectro-Temporal Receptive Fields of Subthreshold Responses in Auditory Cortex
10 0.86902505 110 nips-2002-Incremental Gaussian Processes
11 0.8515709 43 nips-2002-Binary Coding in Auditory Cortex
12 0.84948468 102 nips-2002-Hidden Markov Model of Cortical Synaptic Plasticity: Derivation of the Learning Rule
13 0.83985353 41 nips-2002-Bayesian Monte Carlo
14 0.83417273 12 nips-2002-A Neural Edge-Detection Model for Enhanced Auditory Sensitivity in Modulated Noise
15 0.83044499 116 nips-2002-Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior
16 0.81803358 199 nips-2002-Timing and Partial Observability in the Dopamine System
17 0.81765544 180 nips-2002-Selectivity and Metaplasticity in a Unified Calcium-Dependent Model
18 0.8164928 7 nips-2002-A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences
19 0.81522167 81 nips-2002-Expected and Unexpected Uncertainty: ACh and NE in the Neocortex
20 0.81121922 45 nips-2002-Boosted Dyadic Kernel Discriminants