nips nips2001 nips2001-22 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: André Elisseeff, Jason Weston
Abstract: This article presents a Support Vector Machine (SVM) like learning system to handle multi-label problems. Such problems are usually decomposed into many two-class problems but the expressive power of such a system can be weak [5, 7]. We explore a new direct approach. It is based on a large margin ranking system that shares a lot of common properties with SVMs. We tested it on a Yeast gene functional classification problem with positive results.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: This article presents a Support Vector Machine (SVM) like learning system to handle multi-label problems. [sent-2, score-0.104]
2 Such problems are usually decomposed into many two-class problems but the expressive power of such a system can be weak [5, 7]. [sent-3, score-0.159]
3 It is based on a large margin ranking system that shares a lot of common properties with SVMs. [sent-5, score-0.779]
4 We tested it on a Yeast gene functional classification problem with positive results. [sent-6, score-0.265]
5 Consider for instance the classification task of determining the subjects of a document, or of relating one protein to its many effects on a cell. [sent-9, score-0.094]
6 In either case, the learning task would be to output a set of labels whose size is not known in advance: one document can for instance be about food, meat and finance, although another one would concern only food and fat. [sent-10, score-0.365]
7 Two-class and multi-class classification or ordinal regression problems can all be cast into multi-label ones. [sent-11, score-0.077]
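To make this casting concrete, here is a minimal sketch (our own illustration, not a construction from the paper; in particular the cumulative encoding for ordinal regression is an assumption) of how such problems can be rewritten with label sets:

```python
# Sketch (assumptions, not the paper's construction): casting simpler problems
# into multi-label form, where every example carries a *set* of labels.

def two_class_to_multilabel(y):
    """y in {-1, +1}; each example gets a singleton label set {1} or {2}."""
    return [{1} if yi > 0 else {2} for yi in y]

def multi_class_to_multilabel(y):
    """y in {1, ..., Q}; each example gets the singleton set of its class."""
    return [{yi} for yi in y]

def ordinal_to_multilabel(y):
    """y in {1, ..., Q} ordered ratings; assumed cumulative encoding."""
    return [set(range(1, yi + 1)) for yi in y]

print(two_class_to_multilabel([+1, -1]))     # [{1}, {2}]
print(multi_class_to_multilabel([3, 1, 2]))  # [{3}, {1}, {2}]
print(ordinal_to_multilabel([2, 4]))         # [{1, 2}, {1, 2, 3, 4}]
```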
8 This makes the latter quite attractive, but at the same time it gives a warning: their generality hides the difficulty of solving them. [sent-12, score-0.086]
9 The number of publications is not going to contradict this statement: we are aware of only a few works about the subject [4, 5, 7] and they all concern text mining applications. [sent-13, score-0.228]
10 In Schapire and Singer’s work about Boostexter, one of the only general purpose multi-label ranking systems [7], they observe that overfitting occurs on learning sets of relatively small size. [sent-14, score-0.735]
11 They conclude that controlling the complexity of the overall learning system is an important research goal. [sent-15, score-0.22]
12 Defining a cost function (section 2) and margin for multi-label models, we focus our attention mainly on an approach based on a ranking method combined with a predictor of the size of the sets (sections 3 and 4). [sent-18, score-0.89]
13 Sections 5 and 6 present experiments on a toy problem and on a real dataset. [sent-19, score-0.106]
14 We consider as an output space the space formed by all the sets of integers between 1 and the total number of labels; these integers are identified here as the labels of the learning problem. [sent-21, score-0.152]
15 The learning problem we are interested in is to find, from a learning set drawn identically and independently from an unknown distribution, a function $f$ such that the following generalization error is as low as possible: $R(f) = \mathbb{E}\left[\,l(f(x), Y)\,\right]$ (1), where the expectation is taken over the unknown distribution. [sent-23, score-0.125]
16 The function $l$ is a real-valued loss and can take different forms depending on how $f(x)$ is computed. [sent-25, score-0.261]
17 With the binary approach: $f(x) = \operatorname{sign}(g(x))$, where the sign function applies component-wise. [sent-31, score-0.198]
18 The value of $\operatorname{sign}(g(x))$ is a binary vector from which the set of labels can be retrieved easily by stating that label $k$ is in the set iff $\operatorname{sign}(g_k(x)) = +1$. [sent-32, score-0.407]
19 For example, this can be achieved by using an SVM for each binary problem and applying the latter rule [4]. [sent-33, score-0.209]
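A minimal sketch of this binary approach, assuming scikit-learn is available, might look as follows; the polynomial kernel, its degree and the data handling are illustrative choices, and each label is assumed to occur both inside and outside some training label sets:

```python
# Sketch of the binary approach: one two-class SVM per label; a label is
# predicted iff the corresponding real output is positive (the sign rule).
# Assumes scikit-learn; kernel and hyper-parameters are illustrative only.
import numpy as np
from sklearn.svm import SVC

def fit_binary_approach(X, Y_sets, n_labels):
    """Train one SVM on the indicator 'is label k in the set?' for each k."""
    models = []
    for k in range(1, n_labels + 1):
        y_k = np.array([1 if k in Y else -1 for Y in Y_sets])
        models.append(SVC(kernel="poly", degree=2).fit(X, y_k))
    return models

def predict_binary_approach(models, X):
    """Label k is in the predicted set iff sign(g_k(x)) = +1."""
    scores = np.column_stack([m.decision_function(X) for m in models])
    return [set(int(k) + 1 for k in np.flatnonzero(row > 0)) for row in scores]
```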
20 With the ranking approach: assume that the size of the label set for the input $x$ is known. [sent-35, score-0.777]
21 We define real-valued functions $g_k$ and consider that a label $k$ is in the label set of $x$ iff $g_k(x)$ is among the largest elements of $g(x)$, keeping as many labels as the known set size. [sent-36, score-0.209]
22 The ranking approach is analyzed more precisely in section 3. [sent-38, score-0.672]
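As a small illustration (our own sketch with made-up scores), once the real values g_k(x) and the set size are available, the predicted label set is just the indices of the largest scores:

```python
import numpy as np

def labels_from_ranking(scores, set_size):
    """Ranking approach: keep the `set_size` labels with the largest scores.
    `scores` is the vector (g_1(x), ..., g_Q(x)); labels are numbered from 1."""
    order = np.argsort(scores)[::-1]        # label indices by decreasing score
    return set(int(k) + 1 for k in order[:set_size])

# Illustrative values only:
print(labels_from_ranking(np.array([0.3, -1.2, 0.8, 0.1]), set_size=2))  # {1, 3}
```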
23 We consider the same loss functions as in [7] for any multi-label system built from real-valued functions $g_k$. [sent-41, score-0.349]
24 When a multi-label system is in fact a multi-class one, the Hamming Loss is a constant multiple of the usual classification loss. [sent-44, score-0.303]
25 Other losses concern only ranking systems (a system that specifies a ranking but no set size predictor ). [sent-46, score-1.559]
26 For ranking systems, this loss is natural and is related to the precision, which is a common error measure in Information Retrieval. [sent-52, score-0.96]
27 A loss can be directly deduced from the precision. [sent-59, score-0.321]
28 All these loss functions have been discussed in [7]. [sent-60, score-0.232]
29 Good systems should have a high precision and a low Hamming or Ranking Loss. [sent-61, score-0.089]
30 We do not consider the one-error to be a good loss for multi-label systems but we retain it because it was measured in [7]. [sent-62, score-0.232]
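The sketch below implements these evaluation measures under the definitions that are standard in the multi-label literature; the exact normalizations used in the paper and in [7] may differ slightly. Label sets are Python sets of integers 1..Q and scores are per-label real values:

```python
# Sketch of the evaluation measures (standard multi-label definitions; the
# paper's exact normalizations may differ). Y_true is a list of label sets
# (integers 1..Q); scores is a sequence of per-label real values g_k(x).
import numpy as np

def hamming_loss(Y_true, Y_pred, n_labels):
    """Average fraction of labels on which predicted and true sets disagree."""
    return float(np.mean([len(Yt ^ Yp) / n_labels for Yt, Yp in zip(Y_true, Y_pred)]))

def ranking_loss(Y_true, scores):
    """Average fraction of (relevant, irrelevant) label pairs that are mis-ordered."""
    losses = []
    for Yt, s in zip(Y_true, scores):
        pos = [k for k in range(len(s)) if (k + 1) in Yt]
        neg = [k for k in range(len(s)) if (k + 1) not in Yt]
        if not pos or not neg:
            continue
        bad = sum(1 for p in pos for q in neg if s[p] <= s[q])
        losses.append(bad / (len(pos) * len(neg)))
    return float(np.mean(losses))

def one_error(Y_true, scores):
    """Fraction of examples whose top-ranked label is not a true label."""
    return float(np.mean([0.0 if (int(np.argmax(s)) + 1) in Yt else 1.0
                          for Yt, s in zip(Y_true, scores)]))

def avg_precision(Y_true, scores):
    """Average precision of the positions of the true labels in the ranking."""
    per_example = []
    for Yt, s in zip(Y_true, scores):
        if not Yt:
            continue
        rank = {int(k) + 1: r + 1 for r, k in enumerate(np.argsort(s)[::-1])}
        per_example.append(np.mean(
            [sum(1 for y2 in Yt if rank[y2] <= rank[y]) / rank[y] for y in Yt]))
    return float(np.mean(per_example))
```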
31 For multi-label linear models, we need to define a way of minimizing the empirical error measured by the appropriate loss and at the same time to control the complexity of the resulting model. [sent-63, score-0.269]
32 A direct method would be to use the binary approach and thus take advantage of good two-class systems. [sent-64, score-0.196]
33 However, as has been pointed out in [5, 7], the binary approach does not take into account the correlation between labels and therefore does not capture the structure of some learning problems. [sent-65, score-0.348]
34 We propose here to instead focus on the ranking approach. [sent-66, score-0.639]
35 This will be done by introducing notions of margin and regularization as has been done for the two-class case in the definition of SVMs. [sent-67, score-0.127]
36 For systems that rank the values of $g_k(x)$, the decision boundaries for $x$ are defined by the hyperplanes whose equations are $\langle w_k - w_l, x\rangle + b_k - b_l = 0$, where $k$ belongs to the label set of $x$ and $l$ does not. [sent-69, score-0.116]
37 Maximizing the margin on the whole learning set can then be done via the following problem: $\min_{w,b}\ \max_{(k,l)} \|w_k - w_l\|^2$ (3), subject to $\langle w_k - w_l, x_i\rangle + b_k - b_l \ge 1$ for all $(k,l) \in Y_i \times \bar{Y}_i$ (4). [sent-74, score-0.158]
38 In the case where the problem is not ill-conditioned (two labels are always co-occurring), the $\max$ in the objective function can be replaced by a sum over the pairs $(k,l)$. [sent-76, score-0.177]
39 The latter can be expressed quite directly by extending the constraints of the previous problems. [sent-80, score-0.086]
40 As for SVMs, we approximate the functions involved, and this gives the final quadratic optimization problem (7)-(9). In the case where the label sets all have a size of one, we find the same optimization problem as has been derived for multi-class Support Vector Machines [8]. [sent-84, score-0.261]
41 For this reason, we call the solution of this problem a ranking Support Vector Machine (Rank-SVM). [sent-85, score-0.667]
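The quadratic program itself is not reproduced in this summary. As a rough, hedged approximation of its effect, one can train linear scores g_k(x) = <w_k, x> + b_k by stochastic subgradient descent on a regularized pairwise hinge loss over (relevant, irrelevant) label pairs; this simplified, unconstrained reformulation and its hyper-parameters are our own assumptions, not the paper's exact primal or dual problem:

```python
# Simplified sketch, NOT the paper's exact QP or its dual: linear scores
# g_k(x) = <w_k, x> + b_k trained by stochastic subgradient descent on a
# regularized pairwise hinge loss over (relevant, irrelevant) label pairs.
# Hyper-parameters (lam, lr, epochs) are arbitrary illustrative values.
import numpy as np

def train_rank_svm_sketch(X, Y_sets, n_labels, lam=0.01, lr=0.01, epochs=50):
    n, d = X.shape
    W = np.zeros((n_labels, d))
    b = np.zeros(n_labels)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x, Y = X[i], Y_sets[i]
            pos = [k - 1 for k in Y]
            neg = [k for k in range(n_labels) if (k + 1) not in Y]
            if not pos or not neg:
                continue
            scale = 1.0 / (len(pos) * len(neg))
            for p in pos:
                for q in neg:
                    margin = (W[p] - W[q]) @ x + b[p] - b[q]
                    if margin < 1.0:                 # hinge constraint violated
                        W[p] += lr * scale * x
                        W[q] -= lr * scale * x
                        b[p] += lr * scale
                        b[q] -= lr * scale
            W -= lr * lam * W                        # weight decay (L2 regularization)
    return W, b

def rank_scores(W, b, X):
    """Real-valued ranking scores g_k(x), one column per label."""
    return X @ W.T + b
```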
42 We refer the reader to [3] for the dual formulation and to [2] and references therein for more information about kernels and SVMs. [sent-88, score-0.079]
43 Solving a constrained quadratic problem like those we just introduced requires an amount of memory that is quadratic in the learning set size, and it is generally solved in a number of computational steps that is cubic in that size, where the number of labels has been absorbed into the constants. [sent-89, score-0.2]
44 Such a complexity is too high to apply these methods to many real datasets. [sent-90, score-0.083]
45 4 Set size prediction: So far we have only developed ranking systems. [sent-96, score-0.69]
46 To obtain a complete multi-label system we need to design a set size predictor. [sent-97, score-0.19]
47 A natural way of doing this is to look for inspiration from the binary approach. [sent-98, score-0.134]
48 The latter can indeed be interpreted as a ranking system whose ranks are derived from the real values $g_k(x)$. [sent-99, score-0.831]
49 The predictor of the set size is then quite simple: it is the number of $g_k(x)$ that are greater than $0$. [sent-100, score-0.158]
50 The function is computed from a threshold value that differentiates labels in the target set from the others. [sent-101, score-0.188]
51 For the ranking system introduced in the previous section, we generalize this idea by designing a threshold function. [sent-102, score-0.71]
52 The remaining problem now is to choose this function, which is done by solving a learning problem. [sent-103, score-0.09]
53 The training data are given by the ranking system, and the target values are defined from the true label sets. [sent-104, score-0.639]
54 We refer to this method of predicting the set size as the threshold based method. [sent-113, score-0.093]
55 In the following, we have used linear least squares, and we applied it not only to Rank-SVM but also to Boostexter in order to transform these algorithms from ranking methods to multi-label ones. [sent-114, score-0.639]
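A hedged sketch of this threshold based method follows. The exact targets used in the paper are not reproduced in this summary, so the target threshold below (the midpoint between the lowest score of a true label and the highest score of a non-label) and the plain least-squares fit from the score vector to that target are assumptions:

```python
# Sketch of the threshold based set size predictor. The target threshold and
# the least-squares parameterization below are assumptions, not necessarily
# the exact choices made in the paper. `scores` is an (n_examples, n_labels)
# array of g_k(x) values; `Y_sets` is the list of true label sets.
import numpy as np

def threshold_targets(scores, Y_sets):
    """One target threshold per example: midpoint between the lowest score of a
    true label and the highest score of a label outside the set."""
    t = []
    for s, Y in zip(scores, Y_sets):
        pos = [s[k - 1] for k in Y]
        neg = [s[k] for k in range(len(s)) if (k + 1) not in Y]
        if pos and neg:
            t.append((min(pos) + max(neg)) / 2.0)
        elif pos:                      # every label is relevant
            t.append(min(pos) - 1.0)
        else:                          # no label is relevant
            t.append(max(neg) + 1.0)
    return np.array(t)

def fit_threshold(scores, Y_sets):
    """Linear least squares from the score vector g(x) to the target threshold."""
    A = np.hstack([scores, np.ones((len(scores), 1))])      # add an intercept
    coef, *_ = np.linalg.lstsq(A, threshold_targets(scores, Y_sets), rcond=None)
    return coef

def predict_set_size(coef, scores):
    """s(x) = number of labels whose score exceeds the predicted threshold t(x)."""
    A = np.hstack([scores, np.ones((len(scores), 1))])
    t = A @ coef
    return (scores > t[:, None]).sum(axis=1)
```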
56 A naive method would be to consider the set size prediction as a regression problem on the original training data, with the true set sizes as targets, and to use any regression learning system. [sent-116, score-0.178]
57 This however does not provide a satisfactory solution mainly because it does not take into account how the ranking is performed. [sent-117, score-0.668]
58 In particular, when there are some errors in the ranking, it does not learn how to compensate for these errors, whereas the threshold based approach tries to learn the best threshold with respect to these errors. [sent-118, score-0.117]
59 5 Toy problem: As previously noticed, the binary approach is not appropriate for problems where correlations between labels exist. [sent-121, score-0.358]
60 The binary approach leads to a system that will fail to separate, for instance, points carrying one particular label from points whose label sets do not contain it. [sent-125, score-0.499]
61 We see then that the expressive power of a binary system can be quite low when simple configurations occur. [sent-126, score-0.244]
62 If we consider the ranking approach, one can imagine the following solution: one of the functions is the hyperplane separating class 2 from class 3, and the others are set accordingly. [sent-127, score-0.639]
63 By choosing the number of labels at each point appropriately, we have a simple multi-label system that separates all the regions exactly. [sent-128, score-0.19]
64 Figure 2: Three labels and three regions in the input space. [sent-131, score-0.119]
65 The bottom right region is partitioned into two sub-regions with labels {1,2} or {1,3}. [sent-133, score-0.119]
66 To make this point more concrete, we sampled points uniformly on the input space and solved all the optimization problems. [sent-134, score-0.077]
67 On the learning set the Hamming Loss for the binary approach was strictly positive, although for the direct approach it was zero, as expected. [sent-135, score-0.233]
68 6 Experiments on real data: the Yeast Saccharomyces cerevisiae dataset. [sent-136, score-0.083]
69 Figure 3: First level of the hierarchy of the gene functional classes (Metabolism, Transcription, Energy, Protein Synthesis, Protein Destination, Cell Growth / Cell Division / DNA Synthesis, Ionic Homeostasis, Cellular Communication / Signal Transduction, Cellular Biogenesis, Transposable Elements / Viral and Plasmid Proteins, Transport Facilitation). [sent-139, score-0.237]
70 One gene, for instance the gene YAL041W, can belong to different groups (shaded in grey on the figure). [sent-141, score-0.197]
71 The Yeast dataset is formed by micro-array expression data and phylogenetic profiles with 1500 genes in the learning set and 917 in the test set. [sent-142, score-0.203]
72 Each gene is associated with a set of functional classes whose maximum size can potentially be large. [sent-144, score-0.288]
73 The whole set of classes is indeed structured in a tree whose leaves are the functional categories (see http://mips. [sent-147, score-0.157]
74 Given a gene, knowing which edge to take from one level to another leads directly to a leaf and thus to a functional class. [sent-150, score-0.101]
75 Since one gene can have many functional classes, this is a multi-label problem: one gene is associated with different edges. [sent-152, score-0.402]
76 We then have a fixed number of classes, and each gene in the learning set carries several labels on average. [sent-153, score-0.226]
77 We first evaluated each method as a ranking system, with the Ranking Loss and the precision. [sent-155, score-0.71]
78 In that case, for the binary approach, the real outputs of the two-class SVMs were used as ranking values. [sent-156, score-0.819]
79 We computed the Hamming Loss for the binary approach used in conjunction with SVMs, for the Rank-SVM and for Boostexter. [sent-158, score-0.244]
80 To measure the Hamming Loss with Boostexter we used a threshold based function in combination with the ranking given by the algorithm. [sent-159, score-0.681]
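Putting the earlier sketches together (the names test_scores, Y_test, Q and coef are hypothetical placeholders, and the functions are the ones defined in the sketches above), any ranking method can be turned into a multi-label predictor and scored with the Hamming Loss roughly as follows:

```python
# Usage sketch only: test_scores (an array of g_k(x) values on test data),
# Y_test (the true label sets), Q (the number of labels) and coef (from
# fit_threshold above) are hypothetical placeholders; the helper functions
# come from the earlier sketches in this summary.
pred_sizes = predict_set_size(coef, test_scores)
pred_sets = [labels_from_ranking(s, int(k)) for s, k in zip(test_scores, pred_sizes)]
print(hamming_loss(Y_test, pred_sets, n_labels=Q))
```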
81 Loss functions for the rank-SVM and the binary approach based on two-class SVMs. [sent-164, score-0.167]
82 £ ¥ ¥ For rank-SVMs and for two-class SVMs in the binary approach we choose polynomial kernels of degrees two to nine (experiments on two-class problems using the Yeast data in [6] already showed that polynomial kernels were appropriate for this task). [sent-167, score-0.313]
83 Figure 5: Polynomials of degree 6-9; columns report degree, Precision, Ranking Loss, Hamming Loss and one-error. [sent-176, score-0.074]
84 Loss functions for the rank-SVM and the binary approach based on two-class SVMs. [sent-177, score-0.167]
85 Note that these results are worse than with the binary approach or with rank-SVM. [sent-182, score-0.167]
86 Note that Boostexter performs quite poorly on this dataset compared to SVM-based approaches. [sent-183, score-0.072]
87 One of the main advantages of the SVM-based approaches is the ability to incorporate prior knowledge into the kernel and to control complexity via the kernel and regularization. [sent-185, score-0.105]
88 To compare the binary approach and the rank-SVM we put in bold the best results for each kernel. [sent-187, score-0.233]
89 For all kernels and for almost all losses, the ranking based SVM approach combined with the set size predictor is better than the binary one. [sent-188, score-0.857]
90 This is consistent with the fact that this system tends to minimize this particular loss function. [sent-190, score-0.303]
91 It is worth noticing that when the kernel becomes more and more complex the difference between rank-SVM and the binary method disappears. [sent-191, score-0.168]
92 7 Discussion and conclusion: In this paper we have defined a whole system to deal with multi-label problems. [sent-192, score-0.098]
93 The main contribution is the definition of a ranking based SVM that extends the use of SVMs to many problems in the areas of Bioinformatics and Text Mining. [sent-193, score-0.73]
94 We have seen on complex, real data that rank-SVMs lead to better performance than Boostexter and the binary approach. [sent-194, score-0.18]
95 However, we can also extend the rank-SVM system to perform feature selection on ranking problems [3] . [sent-196, score-0.754]
96 This application can be very useful in the field of bioinformatics as one is often interested in interpretability of a multilabel decision rule. [sent-197, score-0.192]
97 For example one could be interested in a small set of genes which is discriminative in a multi-condition physical disorder. [sent-198, score-0.105]
98 Text categorization with support vector machines: learning with many relevant features. [sent-221, score-0.072]
99 Multi-label text classification with a mixture model trained by EM. [sent-226, score-0.074]
100 Combining microarray expression data and phylogenetic profiles to learn functional categories using support vector machines. [sent-235, score-0.204]
wordName wordTfidf (topN-words)
[('ranking', 0.639), ('boostexter', 0.313), ('loss', 0.232), ('hamming', 0.186), ('gene', 0.165), ('wq', 0.149), ('binary', 0.134), ('labels', 0.119), ('yeast', 0.109), ('precision', 0.089), ('label', 0.087), ('classi', 0.087), ('transport', 0.081), ('weston', 0.081), ('genes', 0.074), ('wy', 0.074), ('text', 0.074), ('functional', 0.072), ('system', 0.071), ('margin', 0.069), ('bold', 0.069), ('bioinformatics', 0.069), ('cu', 0.069), ('eq', 0.069), ('wc', 0.069), ('predictor', 0.068), ('multilabel', 0.063), ('phylogenetic', 0.063), ('protein', 0.062), ('ranked', 0.057), ('svm', 0.055), ('andr', 0.054), ('jason', 0.054), ('svms', 0.053), ('cation', 0.052), ('kernels', 0.051), ('size', 0.051), ('de', 0.051), ('concern', 0.05), ('food', 0.05), ('latter', 0.047), ('real', 0.046), ('problems', 0.044), ('aware', 0.043), ('cellular', 0.043), ('elisseeff', 0.043), ('threshold', 0.042), ('cell', 0.042), ('losses', 0.041), ('polynomials', 0.041), ('biowulf', 0.039), ('quite', 0.039), ('support', 0.039), ('calculations', 0.038), ('complexity', 0.037), ('degree', 0.037), ('synthesis', 0.037), ('iff', 0.035), ('technologies', 0.035), ('kernel', 0.034), ('les', 0.034), ('ab', 0.034), ('nition', 0.034), ('learning', 0.033), ('approach', 0.033), ('regression', 0.033), ('dataset', 0.033), ('optimization', 0.033), ('toy', 0.032), ('follow', 0.032), ('sign', 0.032), ('instance', 0.032), ('schapire', 0.031), ('mining', 0.031), ('interested', 0.031), ('gure', 0.031), ('categories', 0.03), ('put', 0.03), ('document', 0.03), ('pro', 0.03), ('subject', 0.03), ('cantly', 0.03), ('cost', 0.03), ('conjunction', 0.03), ('reasoning', 0.03), ('take', 0.029), ('done', 0.029), ('quadratic', 0.029), ('decision', 0.029), ('dual', 0.028), ('problem', 0.028), ('indeed', 0.028), ('controlling', 0.028), ('whole', 0.027), ('ionic', 0.027), ('broadway', 0.027), ('cai', 0.027), ('chemnitz', 0.027), ('death', 0.027), ('differentiates', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 22 nips-2001-A kernel method for multi-labelled classification
Author: André Elisseeff, Jason Weston
Abstract: This article presents a Support Vector Machine (SVM) like learning system to handle multi-label problems. Such problems are usually decomposed into many two-class problems but the expressive power of such a system can be weak [5, 7]. We explore a new direct approach. It is based on a large margin ranking system that shares a lot of common properties with SVMs. We tested it on a Yeast gene functional classification problem with positive results.
2 0.31625378 147 nips-2001-Pranking with Ranking
Author: Koby Crammer, Yoram Singer
Abstract: We discuss the problem of ranking instances. In our framework each instance is associated with a rank or a rating, which is an integer from 1 to k. Our goal is to find a rank-prediction rule that assigns each instance a rank which is as close as possible to the instance's true rank. We describe a simple and efficient online algorithm, analyze its performance in the mistake bound model, and prove its correctness. We describe two sets of experiments, with synthetic data and with the EachMovie dataset for collaborative filtering. In the experiments we performed, our algorithm outperforms online algorithms for regression and classification applied to ranking. 1
3 0.15652265 139 nips-2001-Online Learning with Kernels
Author: Jyrki Kivinen, Alex J. Smola, Robert C. Williamson
Abstract: We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the -trick allows us to give a robust parameterization. Moreover, unlike in batch learning where the -trick only applies to the -insensitive loss function we are able to derive general trimmed-mean types of estimators such as for Huber’s robust loss. ¡
4 0.13492323 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces
Author: T. Zhang
Abstract: We investigate the generalization performance of some learning problems in Hilbert functional Spaces. We introduce a notion of convergence of the estimated functional predictor to the best underlying predictor, and obtain an estimate on the rate of the convergence. This estimate allows us to derive generalization bounds on some learning formulations.
5 0.12660585 8 nips-2001-A General Greedy Approximation Algorithm with Applications
Author: T. Zhang
Abstract: Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases.
6 0.11087739 56 nips-2001-Convolution Kernels for Natural Language
7 0.10708399 190 nips-2001-Thin Junction Trees
8 0.097780116 159 nips-2001-Reducing multiclass to binary by coupling probability estimates
9 0.097221881 137 nips-2001-On the Convergence of Leveraging
10 0.094039626 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
11 0.090714492 105 nips-2001-Kernel Machines and Boolean Functions
12 0.088488109 46 nips-2001-Categorization by Learning and Combining Object Parts
13 0.087476373 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
14 0.082188018 149 nips-2001-Probabilistic Abstraction Hierarchies
15 0.079295367 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms
16 0.078008369 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
17 0.077705048 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
18 0.076555856 170 nips-2001-Spectral Kernel Methods for Clustering
19 0.076428652 144 nips-2001-Partially labeled classification with Markov random walks
20 0.076313391 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing
topicId topicWeight
[(0, -0.235), (1, 0.131), (2, -0.025), (3, 0.172), (4, 0.062), (5, -0.044), (6, -0.021), (7, 0.047), (8, -0.101), (9, 0.001), (10, -0.017), (11, -0.047), (12, -0.048), (13, 0.076), (14, 0.06), (15, 0.032), (16, 0.05), (17, 0.232), (18, -0.005), (19, 0.073), (20, 0.097), (21, 0.123), (22, 0.174), (23, 0.029), (24, 0.034), (25, -0.167), (26, 0.06), (27, -0.05), (28, 0.095), (29, 0.092), (30, 0.035), (31, -0.143), (32, -0.087), (33, 0.008), (34, 0.114), (35, -0.145), (36, -0.225), (37, -0.091), (38, -0.163), (39, 0.27), (40, 0.033), (41, 0.032), (42, 0.172), (43, 0.156), (44, 0.078), (45, 0.003), (46, -0.16), (47, -0.008), (48, 0.054), (49, -0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.9478063 22 nips-2001-A kernel method for multi-labelled classification
Author: André Elisseeff, Jason Weston
Abstract: This article presents a Support Vector Machine (SVM) like learning system to handle multi-label problems. Such problems are usually decomposed into many two-class problems but the expressive power of such a system can be weak [5, 7]. We explore a new direct approach. It is based on a large margin ranking system that shares a lot of common properties with SVMs. We tested it on a Yeast gene functional classification problem with positive results.
2 0.81429535 147 nips-2001-Pranking with Ranking
Author: Koby Crammer, Yoram Singer
Abstract: We discuss the problem of ranking instances. In our framework each instance is associated with a rank or a rating, which is an integer from 1 to k. Our goal is to find a rank-prediction rule that assigns each instance a rank which is as close as possible to the instance's true rank. We describe a simple and efficient online algorithm, analyze its performance in the mistake bound model, and prove its correctness. We describe two sets of experiments, with synthetic data and with the EachMovie dataset for collaborative filtering. In the experiments we performed, our algorithm outperforms online algorithms for regression and classification applied to ranking. 1
3 0.45245853 139 nips-2001-Online Learning with Kernels
Author: Jyrki Kivinen, Alex J. Smola, Robert C. Williamson
Abstract: We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the -trick allows us to give a robust parameterization. Moreover, unlike in batch learning where the -trick only applies to the -insensitive loss function we are able to derive general trimmed-mean types of estimators such as for Huber’s robust loss. ¡
4 0.3771036 159 nips-2001-Reducing multiclass to binary by coupling probability estimates
Author: B. Zadrozny
Abstract: This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers. This is an extension for arbitrary code matrices of a method due to Hastie and Tibshirani for pairwise coupling of probability estimates. Experimental results with Boosted Naive Bayes show that our method produces calibrated class membership probability estimates, while having similar classification accuracy as loss-based decoding, a method for obtaining the most likely class that does not generate probability estimates.
5 0.35837829 149 nips-2001-Probabilistic Abstraction Hierarchies
Author: Eran Segal, Daphne Koller, Dirk Ormoneit
Abstract: Many domains are naturally organized in an abstraction hierarchy or taxonomy, where the instances in “nearby” classes in the taxonomy are similar. In this paper, we provide a general probabilistic framework for clustering data into a set of classes organized as a taxonomy, where each class is associated with a probabilistic model from which the data was generated. The clustering algorithm simultaneously optimizes three things: the assignment of data instances to clusters, the models associated with the clusters, and the structure of the abstraction hierarchy. A unique feature of our approach is that it utilizes global optimization algorithms for both of the last two steps, reducing the sensitivity to noise and the propensity to local maxima that are characteristic of algorithms such as hierarchical agglomerative clustering that only take local steps. We provide a theoretical analysis for our algorithm, showing that it converges to a local maximum of the joint likelihood of model and data. We present experimental results on synthetic data, and on real data in the domains of gene expression and text.
6 0.33846539 182 nips-2001-The Fidelity of Local Ordinal Encoding
7 0.33416194 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
8 0.32168314 190 nips-2001-Thin Junction Trees
9 0.31516144 101 nips-2001-K-Local Hyperplane and Convex Distance Nearest Neighbor Algorithms
10 0.30670181 104 nips-2001-Kernel Logistic Regression and the Import Vector Machine
11 0.30579299 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
12 0.3053081 137 nips-2001-On the Convergence of Leveraging
13 0.30467743 8 nips-2001-A General Greedy Approximation Algorithm with Applications
14 0.2963241 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces
15 0.28786704 67 nips-2001-Efficient Resources Allocation for Markov Decision Processes
16 0.27591577 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms
17 0.27565163 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines
19 0.27419785 56 nips-2001-Convolution Kernels for Natural Language
20 0.27081102 64 nips-2001-EM-DD: An Improved Multiple-Instance Learning Technique
topicId topicWeight
[(14, 0.078), (17, 0.038), (19, 0.025), (27, 0.096), (30, 0.107), (38, 0.015), (58, 0.24), (59, 0.027), (72, 0.087), (79, 0.029), (83, 0.028), (91, 0.139)]
simIndex simValue paperId paperTitle
same-paper 1 0.84683913 22 nips-2001-A kernel method for multi-labelled classification
Author: André Elisseeff, Jason Weston
Abstract: This article presents a Support Vector Machine (SVM) like learning system to handle multi-label problems. Such problems are usually decomposed into many two-class problems but the expressive power of such a system can be weak [5, 7]. We explore a new direct approach. It is based on a large margin ranking system that shares a lot of common properties with SVMs. We tested it on a Yeast gene functional classification problem with positive results.
2 0.8420223 54 nips-2001-Contextual Modulation of Target Saliency
Author: Antonio Torralba
Abstract: The most popular algorithms for object detection require the use of exhaustive spatial and scale search procedures. In such approaches, an object is defined by means of local features. fu this paper we show that including contextual information in object detection procedures provides an efficient way of cutting down the need for exhaustive search. We present results with real images showing that the proposed scheme is able to accurately predict likely object classes, locations and sizes. 1
3 0.8100822 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments
Author: Jonathan L. Shapiro, J. Wearden
Abstract: Animal data on delayed-reward conditioning experiments shows a striking property - the data for different time intervals collapses into a single curve when the data is scaled by the time interval. This is called the scalar property of interval timing. Here a simple model of a neural clock is presented and shown to give rise to the scalar property. The model is an accumulator consisting of noisy, linear spiking neurons. It is analytically tractable and contains only three parameters. When coupled with reinforcement learning it simulates peak procedure experiments, producing both the scalar property and the pattern of single trial covariances. 1
4 0.67901653 102 nips-2001-KLD-Sampling: Adaptive Particle Filters
Author: Dieter Fox
Abstract: Over the last years, particle filters have been applied with great success to a variety of state estimation problems. We present a statistical approach to increasing the efficiency of particle filters by adapting the size of sample sets on-the-fly. The key idea of the KLD-sampling method is to bound the approximation error introduced by the sample-based representation of the particle filter. The name KLD-sampling is due to the fact that we measure the approximation error by the Kullback-Leibler distance. Our adaptation approach chooses a small number of samples if the density is focused on a small part of the state space, and it chooses a large number of samples if the state uncertainty is high. Both the implementation and computation overhead of this approach are small. Extensive experiments using mobile robot localization as a test application show that our approach yields drastic improvements over particle filters with fixed sample set sizes and over a previously introduced adaptation technique.
5 0.6760785 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama
Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1
6 0.67226934 56 nips-2001-Convolution Kernels for Natural Language
7 0.67141283 13 nips-2001-A Natural Policy Gradient
8 0.6657117 149 nips-2001-Probabilistic Abstraction Hierarchies
9 0.66404468 185 nips-2001-The Method of Quantum Clustering
10 0.66391802 60 nips-2001-Discriminative Direction for Kernel Classifiers
11 0.66298205 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
12 0.66262364 195 nips-2001-Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
13 0.66206026 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning
14 0.6618554 46 nips-2001-Categorization by Learning and Combining Object Parts
16 0.66155088 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
17 0.66039246 161 nips-2001-Reinforcement Learning with Long Short-Term Memory
18 0.66016418 121 nips-2001-Model-Free Least-Squares Policy Iteration
19 0.66001487 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
20 0.65772611 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines