nips nips2001 nips2001-22 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: André Elisseeff, Jason Weston
Abstract: This article presents a Support Vector Machine (SVM) like learning system to handle multi-label problems. Such problems are usually decomposed into many two-class problems but the expressive power of such a system can be weak [5, 7]. We explore a new direct approach. It is based on a large margin ranking system that shares a lot of common properties with SVMs. We tested it on a Yeast gene functional classification problem with positive results.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: This article presents a Support Vector Machine (SVM) like learning system to handle multi-label problems. [sent-2, score-0.104]
2 Such problems are usually decomposed into many two-class problems but the expressive power of such a system can be weak [5, 7]. [sent-3, score-0.159]
3 It is based on a large margin ranking system that shares a lot of common properties with SVMs. [sent-5, score-0.779]
4 We tested it on a Yeast gene functional classification problem with positive results. [sent-6, score-0.265]
5 Consider for instance the classification task of determining the subjects of a document, or of relating one protein to its many effects on a cell. [sent-9, score-0.094]
6 In either case, the learning task would be to output a set of labels whose size is not known in advance: one document can for instance be about food, meat and finance, although another one would concern only food and fat. [sent-10, score-0.365]
7 Two-class and multi-class classification or ordinal regression problems can all be cast into multi-label ones. [sent-11, score-0.077]
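To make this casting concrete, here is a minimal sketch (our own illustration, not a construction from the paper; in particular the cumulative encoding for ordinal regression is an assumption) of how such problems can be rewritten with label sets:

```python
# Sketch (assumptions, not the paper's construction): casting simpler problems
# into multi-label form, where every example carries a *set* of labels.

def two_class_to_multilabel(y):
    """y in {-1, +1}; each example gets a singleton label set {1} or {2}."""
    return [{1} if yi > 0 else {2} for yi in y]

def multi_class_to_multilabel(y):
    """y in {1, ..., Q}; each example gets the singleton set of its class."""
    return [{yi} for yi in y]

def ordinal_to_multilabel(y):
    """y in {1, ..., Q} ordered ratings; assumed cumulative encoding."""
    return [set(range(1, yi + 1)) for yi in y]

print(two_class_to_multilabel([+1, -1]))     # [{1}, {2}]
print(multi_class_to_multilabel([3, 1, 2]))  # [{3}, {1}, {2}]
print(ordinal_to_multilabel([2, 4]))         # [{1, 2}, {1, 2, 3, 4}]
```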
8 This makes the latter quite attractive, but at the same time it gives a warning: their generality hides the difficulty of solving them. [sent-12, score-0.086]
9 The number of publications is not going to contradict this statement: we are aware of only a few works about the subject [4, 5, 7] and they all concern text mining applications. [sent-13, score-0.228]
10 In Schapire and Singer’s work about Boostexter, one of the only general purpose multi-label ranking systems [7], they observe that overfitting occurs on learning sets of relatively small size. [sent-14, score-0.735]
11 They conclude that controlling the complexity of the overall learning system is an important research goal. [sent-15, score-0.22]
12 Defining a cost function (section 2) and margin for multi-label models, we focus our attention mainly on an approach based on a ranking method combined with a predictor of the size of the sets (sections 3 and 4). [sent-18, score-0.89]
13 Sections 5 and 6 present experiments on a toy problem and on a real dataset. [sent-19, score-0.106]
14 We consider as an output space the space formed by all the sets of integers between 1 and the total number of labels; these integers are identified here as the labels of the learning problem. [sent-21, score-0.152]
15 The learning problem we are interested in is to find, from a learning set drawn identically and independently from an unknown distribution, a function $f$ such that the following generalization error is as low as possible: $R(f) = \mathbb{E}\left[\,l(f(x), Y)\,\right]$ (1), where the expectation is taken over the unknown distribution. [sent-23, score-0.125]
16 The function $l$ is a real-valued loss and can take different forms depending on how $f(x)$ is computed. [sent-25, score-0.261]
17 With the binary approach: $f(x) = \operatorname{sign}(g(x))$, where the sign function applies component-wise. [sent-31, score-0.198]
18 The value of $\operatorname{sign}(g(x))$ is a binary vector from which the set of labels can be retrieved easily by stating that label $k$ is in the set iff $\operatorname{sign}(g_k(x)) = +1$. [sent-32, score-0.407]
19 For example, this can be achieved by using an SVM for each binary problem and applying the latter rule [4]. [sent-33, score-0.209]
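A minimal sketch of this binary approach, assuming scikit-learn is available, might look as follows; the polynomial kernel, its degree and the data handling are illustrative choices, and each label is assumed to occur both inside and outside some training label sets:

```python
# Sketch of the binary approach: one two-class SVM per label; a label is
# predicted iff the corresponding real output is positive (the sign rule).
# Assumes scikit-learn; kernel and hyper-parameters are illustrative only.
import numpy as np
from sklearn.svm import SVC

def fit_binary_approach(X, Y_sets, n_labels):
    """Train one SVM on the indicator 'is label k in the set?' for each k."""
    models = []
    for k in range(1, n_labels + 1):
        y_k = np.array([1 if k in Y else -1 for Y in Y_sets])
        models.append(SVC(kernel="poly", degree=2).fit(X, y_k))
    return models

def predict_binary_approach(models, X):
    """Label k is in the predicted set iff sign(g_k(x)) = +1."""
    scores = np.column_stack([m.decision_function(X) for m in models])
    return [set(int(k) + 1 for k in np.flatnonzero(row > 0)) for row in scores]
```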
20 With the ranking approach: assume that the size of the label set for the input $x$ is known. [sent-35, score-0.777]
21 We define real-valued functions $g_k$ and consider that a label $k$ is in the label set of $x$ iff $g_k(x)$ is among the largest elements of $g(x)$, keeping as many labels as the known set size. [sent-36, score-0.209]
22 The ranking approach is analyzed more precisely in section 3. [sent-38, score-0.672]
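As a small illustration (our own sketch with made-up scores), once the real values g_k(x) and the set size are available, the predicted label set is just the indices of the largest scores:

```python
import numpy as np

def labels_from_ranking(scores, set_size):
    """Ranking approach: keep the `set_size` labels with the largest scores.
    `scores` is the vector (g_1(x), ..., g_Q(x)); labels are numbered from 1."""
    order = np.argsort(scores)[::-1]        # label indices by decreasing score
    return set(int(k) + 1 for k in order[:set_size])

# Illustrative values only:
print(labels_from_ranking(np.array([0.3, -1.2, 0.8, 0.1]), set_size=2))  # {1, 3}
```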
23 We consider the same loss functions as in [7] for any multi-label system built from real-valued functions $g_k$. [sent-41, score-0.349]
24 When a multi-label system is in fact a multi-class one, the Hamming Loss is a constant multiple of the usual classification loss. [sent-44, score-0.303]
25 Other losses concern only ranking systems (a system that specifies a ranking but no set size predictor ). [sent-46, score-1.559]
26 For ranking systems, this loss is natural and is related to the precision, which is a common error measure in Information Retrieval. [sent-52, score-0.96]
27 A loss can be directly deduced from the precision. [sent-59, score-0.321]
28 All these loss functions have been discussed in [7]. [sent-60, score-0.232]
29 Good systems should have a high precision and a low Hamming or Ranking Loss. [sent-61, score-0.089]
30 We do not consider the one-error to be a good loss for multi-label systems but we retain it because it was measured in [7]. [sent-62, score-0.232]
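The sketch below implements these evaluation measures under the definitions that are standard in the multi-label literature; the exact normalizations used in the paper and in [7] may differ slightly. Label sets are Python sets of integers 1..Q and scores are per-label real values:

```python
# Sketch of the evaluation measures (standard multi-label definitions; the
# paper's exact normalizations may differ). Y_true is a list of label sets
# (integers 1..Q); scores is a sequence of per-label real values g_k(x).
import numpy as np

def hamming_loss(Y_true, Y_pred, n_labels):
    """Average fraction of labels on which predicted and true sets disagree."""
    return float(np.mean([len(Yt ^ Yp) / n_labels for Yt, Yp in zip(Y_true, Y_pred)]))

def ranking_loss(Y_true, scores):
    """Average fraction of (relevant, irrelevant) label pairs that are mis-ordered."""
    losses = []
    for Yt, s in zip(Y_true, scores):
        pos = [k for k in range(len(s)) if (k + 1) in Yt]
        neg = [k for k in range(len(s)) if (k + 1) not in Yt]
        if not pos or not neg:
            continue
        bad = sum(1 for p in pos for q in neg if s[p] <= s[q])
        losses.append(bad / (len(pos) * len(neg)))
    return float(np.mean(losses))

def one_error(Y_true, scores):
    """Fraction of examples whose top-ranked label is not a true label."""
    return float(np.mean([0.0 if (int(np.argmax(s)) + 1) in Yt else 1.0
                          for Yt, s in zip(Y_true, scores)]))

def avg_precision(Y_true, scores):
    """Average precision of the positions of the true labels in the ranking."""
    per_example = []
    for Yt, s in zip(Y_true, scores):
        if not Yt:
            continue
        rank = {int(k) + 1: r + 1 for r, k in enumerate(np.argsort(s)[::-1])}
        per_example.append(np.mean(
            [sum(1 for y2 in Yt if rank[y2] <= rank[y]) / rank[y] for y in Yt]))
    return float(np.mean(per_example))
```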
31 For multi-label linear models, we need to define a way of minimizing the empirical error measured by the appropriate loss and at the same time to control the complexity of the resulting model. [sent-63, score-0.269]
32 A direct method would be to use the binary approach and thus take advantage of good two-class systems. [sent-64, score-0.196]
33 However, as has been pointed out in [5, 7], the binary approach does not take into account the correlation between labels and therefore does not capture the structure of some learning problems. [sent-65, score-0.348]
34 We propose here to instead focus on the ranking approach. [sent-66, score-0.639]
35 This will be done by introducing notions of margin and regularization as has been done for the two-class case in the definition of SVMs. [sent-67, score-0.127]
36 For systems that rank the values of $g_k(x)$, the decision boundaries for $x$ are defined by the hyperplanes whose equations are $\langle w_k - w_l, x\rangle + b_k - b_l = 0$, where $k$ belongs to the label set of $x$ and $l$ does not. [sent-69, score-0.116]
37 Maximizing the margin on the whole learning set can then be done via the following problem: $\min_{w,b}\ \max_{(k,l)} \|w_k - w_l\|^2$ (3), subject to $\langle w_k - w_l, x_i\rangle + b_k - b_l \ge 1$ for all $(k,l) \in Y_i \times \bar{Y}_i$ (4). [sent-74, score-0.158]
38 In the case where the problem is not ill-conditioned (two labels are always co-occurring), the $\max$ in the objective function can be replaced by a sum over the pairs $(k,l)$. [sent-76, score-0.177]
39 The latter can be expressed quite directly by extending the constraints of the previous problems. [sent-80, score-0.086]
40 As for SVMs, we approximate the functions involved, and this gives the final quadratic optimization problem (7)-(9). In the case where the label sets all have a size of one, we find the same optimization problem as has been derived for multi-class Support Vector Machines [8]. [sent-84, score-0.261]
41 For this reason, we call the solution of this problem a ranking Support Vector Machine (Rank-SVM). [sent-85, score-0.667]
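The quadratic program itself is not reproduced in this summary. As a rough, hedged approximation of its effect, one can train linear scores g_k(x) = <w_k, x> + b_k by stochastic subgradient descent on a regularized pairwise hinge loss over (relevant, irrelevant) label pairs; this simplified, unconstrained reformulation and its hyper-parameters are our own assumptions, not the paper's exact primal or dual problem:

```python
# Simplified sketch, NOT the paper's exact QP or its dual: linear scores
# g_k(x) = <w_k, x> + b_k trained by stochastic subgradient descent on a
# regularized pairwise hinge loss over (relevant, irrelevant) label pairs.
# Hyper-parameters (lam, lr, epochs) are arbitrary illustrative values.
import numpy as np

def train_rank_svm_sketch(X, Y_sets, n_labels, lam=0.01, lr=0.01, epochs=50):
    n, d = X.shape
    W = np.zeros((n_labels, d))
    b = np.zeros(n_labels)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x, Y = X[i], Y_sets[i]
            pos = [k - 1 for k in Y]
            neg = [k for k in range(n_labels) if (k + 1) not in Y]
            if not pos or not neg:
                continue
            scale = 1.0 / (len(pos) * len(neg))
            for p in pos:
                for q in neg:
                    margin = (W[p] - W[q]) @ x + b[p] - b[q]
                    if margin < 1.0:                 # hinge constraint violated
                        W[p] += lr * scale * x
                        W[q] -= lr * scale * x
                        b[p] += lr * scale
                        b[q] -= lr * scale
            W -= lr * lam * W                        # weight decay (L2 regularization)
    return W, b

def rank_scores(W, b, X):
    """Real-valued ranking scores g_k(x), one column per label."""
    return X @ W.T + b
```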
42 We refer the reader to [3] for the dual formulation and to [2] and references therein for more information about kernels and SVMs. [sent-88, score-0.079]
43 Solving a constrained quadratic problem like those we just introduced requires an amount of memory that is quadratic in the learning set size, and it is generally solved in a number of computational steps that is cubic in that size, where the number of labels has been absorbed into the constants. [sent-89, score-0.2]
44 Such a complexity is too high to apply these methods to many real datasets. [sent-90, score-0.083]
45 4 Set size prediction: So far we have only developed ranking systems. [sent-96, score-0.69]
46 To obtain a complete multi-label system we need to design a set size predictor. [sent-97, score-0.19]
47 A natural way of doing this is to look for inspiration from the binary approach. [sent-98, score-0.134]
48 The latter can indeed be interpreted as a ranking system whose ranks are derived from the real values $g_k(x)$. [sent-99, score-0.831]
49 The predictor of the set size is then quite simple: it is the number of $g_k(x)$ that are greater than $0$. [sent-100, score-0.158]
50 The function is computed from a threshold value that differentiates labels in the target set from the others. [sent-101, score-0.188]
51 For the ranking system introduced in the previous section, we generalize this idea by designing a threshold function. [sent-102, score-0.71]
52 The remaining problem now is to choose this function, which is done by solving a learning problem. [sent-103, score-0.09]
53 The training data are given by the ranking system, and the target values are defined from the true label sets. [sent-104, score-0.639]
54 We refer to this method of predicting the set size as the threshold based method. [sent-113, score-0.093]
55 In the following, we have used linear least squares, and we applied it not only to Rank-SVM but also to Boostexter in order to transform these algorithms from ranking methods to multi-label ones. [sent-114, score-0.639]
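A hedged sketch of this threshold based method follows. The exact targets used in the paper are not reproduced in this summary, so the target threshold below (the midpoint between the lowest score of a true label and the highest score of a non-label) and the plain least-squares fit from the score vector to that target are assumptions:

```python
# Sketch of the threshold based set size predictor. The target threshold and
# the least-squares parameterization below are assumptions, not necessarily
# the exact choices made in the paper. `scores` is an (n_examples, n_labels)
# array of g_k(x) values; `Y_sets` is the list of true label sets.
import numpy as np

def threshold_targets(scores, Y_sets):
    """One target threshold per example: midpoint between the lowest score of a
    true label and the highest score of a label outside the set."""
    t = []
    for s, Y in zip(scores, Y_sets):
        pos = [s[k - 1] for k in Y]
        neg = [s[k] for k in range(len(s)) if (k + 1) not in Y]
        if pos and neg:
            t.append((min(pos) + max(neg)) / 2.0)
        elif pos:                      # every label is relevant
            t.append(min(pos) - 1.0)
        else:                          # no label is relevant
            t.append(max(neg) + 1.0)
    return np.array(t)

def fit_threshold(scores, Y_sets):
    """Linear least squares from the score vector g(x) to the target threshold."""
    A = np.hstack([scores, np.ones((len(scores), 1))])      # add an intercept
    coef, *_ = np.linalg.lstsq(A, threshold_targets(scores, Y_sets), rcond=None)
    return coef

def predict_set_size(coef, scores):
    """s(x) = number of labels whose score exceeds the predicted threshold t(x)."""
    A = np.hstack([scores, np.ones((len(scores), 1))])
    t = A @ coef
    return (scores > t[:, None]).sum(axis=1)
```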
56 A naive method would be to consider the set size prediction as a regression problem on the original training data, with the true set sizes as targets, and to use any regression learning system. [sent-116, score-0.178]
57 This however does not provide a satisfactory solution mainly because it does not take into account how the ranking is performed. [sent-117, score-0.668]
58 In particular, when there are some errors in the ranking, it does not learn how to compensate for these errors, whereas the threshold based approach tries to learn the best threshold with respect to these errors. [sent-118, score-0.117]
59 5 Toy problem: As previously noticed, the binary approach is not appropriate for problems where correlations between labels exist. [sent-121, score-0.358]
60 The binary approach leads to a system that will fail to separate, for instance, points carrying one particular label from points whose label sets do not contain it. [sent-125, score-0.499]
61 We see then that the expressive power of a binary system can be quite low when simple configurations occur. [sent-126, score-0.244]
62 If we consider the ranking approach, one can imagine the following solution: one of the functions is the hyperplane separating class 2 from class 3, and the others are set accordingly. [sent-127, score-0.639]
63 By choosing the number of labels at each point appropriately, we have a simple multi-label system that separates all the regions exactly. [sent-128, score-0.19]
64 Figure 2: Three labels and three regions in the input space. [sent-131, score-0.119]
65 The bottom right region is partitioned into two sub-regions with labels {1,2} or {1,3}. [sent-133, score-0.119]
66 To make this point more concrete, we sampled points uniformly on the input space and solved all the optimization problems. [sent-134, score-0.077]
67 On the learning set the Hamming Loss for the binary approach was strictly positive, although for the direct approach it was zero, as expected. [sent-135, score-0.233]
68 6 Experiments on real data: the Yeast Saccharomyces cerevisiae dataset. [sent-136, score-0.083]
69 Figure 3: First level of the hierarchy of the gene functional classes (Metabolism, Transcription, Energy, Protein Synthesis, Protein Destination, Cell Growth / Cell Division / DNA Synthesis, Ionic Homeostasis, Cellular Communication / Signal Transduction, Cellular Biogenesis, Transposable Elements / Viral and Plasmid Proteins, Transport Facilitation). [sent-139, score-0.237]
70 One gene, for instance the gene YAL041W, can belong to different groups (shaded in grey on the figure). [sent-141, score-0.197]
71 The Yeast dataset is formed by micro-array expression data and phylogenetic profiles with 1500 genes in the learning set and 917 in the test set. [sent-142, score-0.203]
72 Each gene is associated with a set of functional classes whose maximum size can potentially be large. [sent-144, score-0.288]
73 The whole set of classes is indeed structured in a tree whose leaves are the functional categories (see http://mips. [sent-147, score-0.157]
74 Given a gene, knowing which edge to take from one level to another leads directly to a leaf and thus to a functional class. [sent-150, score-0.101]
75 Since one gene can have many functional classes, this is a multi-label problem: one gene is associated with different edges. [sent-152, score-0.402]
76 We then have a fixed number of classes, and each gene in the learning set carries several labels on average. [sent-153, score-0.226]
77 We first evaluated each method as a ranking system, with the Ranking Loss and the precision. [sent-155, score-0.71]
78 In that case, for the binary approach, the real outputs of the two-class SVMs were used as ranking values. [sent-156, score-0.819]
79 We computed the Hamming Loss for the binary approach used in conjunction with SVMs, for the Rank-SVM and for Boostexter. [sent-158, score-0.244]
80 To measure the Hamming Loss with Boostexter we used a threshold based function in combination with the ranking given by the algorithm. [sent-159, score-0.681]
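Putting the earlier sketches together (the names test_scores, Y_test, Q and coef are hypothetical placeholders, and the functions are the ones defined in the sketches above), any ranking method can be turned into a multi-label predictor and scored with the Hamming Loss roughly as follows:

```python
# Usage sketch only: test_scores (an array of g_k(x) values on test data),
# Y_test (the true label sets), Q (the number of labels) and coef (from
# fit_threshold above) are hypothetical placeholders; the helper functions
# come from the earlier sketches in this summary.
pred_sizes = predict_set_size(coef, test_scores)
pred_sets = [labels_from_ranking(s, int(k)) for s, k in zip(test_scores, pred_sizes)]
print(hamming_loss(Y_test, pred_sets, n_labels=Q))
```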
81 Loss functions for the rank-SVM and the binary approach based on two-class SVMs. [sent-164, score-0.167]
82 £ ¥ ¥ For rank-SVMs and for two-class SVMs in the binary approach we choose polynomial kernels of degrees two to nine (experiments on two-class problems using the Yeast data in [6] already showed that polynomial kernels were appropriate for this task). [sent-167, score-0.313]
83 Figure 5: Polynomials of degree 6-9; columns report degree, Precision, Ranking Loss, Hamming Loss and one-error. [sent-176, score-0.074]
84 Loss functions for the rank-SVM and the binary approach based on two-class SVMs. [sent-177, score-0.167]
85 Note that these results are worse than with the binary approach or with rank-SVM. [sent-182, score-0.167]
86 Note that Boostexter performs quite poorly on this dataset compared to SVM-based approaches. [sent-183, score-0.072]
87 One of the main advantages of the SVM-based approaches is the ability to incorporate prior knowledge into the kernel and to control complexity via the kernel and regularization. [sent-185, score-0.105]
88 To compare the binary approach and the rank-SVM we put in bold the best results for each kernel. [sent-187, score-0.233]
89 For all kernels and for almost all losses, the ranking based SVM approach combined with the set size predictor is better than the binary one. [sent-188, score-0.857]
90 This is consistent with the fact that this system tends to minimize this particular loss function. [sent-190, score-0.303]
91 It is worth noticing that when the kernel becomes more and more complex the difference between rank-SVM and the binary method disappears. [sent-191, score-0.168]
92 7 Discussion and conclusion: In this paper we have defined a whole system to deal with multi-label problems. [sent-192, score-0.098]
93 The main contribution is the definition of a ranking based SVM that extends the use of SVMs to many problems in the areas of Bioinformatics and Text Mining. [sent-193, score-0.73]
94 We have seen on complex, real data that rank-SVMs lead to better performance than Boostexter and the binary approach. [sent-194, score-0.18]
95 However, we can also extend the rank-SVM system to perform feature selection on ranking problems [3] . [sent-196, score-0.754]
96 This application can be very useful in the field of bioinformatics as one is often interested in interpretability of a multilabel decision rule. [sent-197, score-0.192]
97 For example one could be interested in a small set of genes which is discriminative in a multi-condition physical disorder. [sent-198, score-0.105]
98 Text categorization with support vector machines: learning with many relevant features. [sent-221, score-0.072]
99 Multi-label text classification with a mixture model trained by EM. [sent-226, score-0.074]
100 Combining microarray expression data and phylogenetic profiles to learn functional categories using support vector machines. [sent-235, score-0.204]
wordName wordTfidf (topN-words)
[('ranking', 0.639), ('boostexter', 0.313), ('loss', 0.232), ('hamming', 0.186), ('gene', 0.165), ('wq', 0.149), ('binary', 0.134), ('labels', 0.119), ('yeast', 0.109), ('precision', 0.089), ('label', 0.087), ('classi', 0.087), ('transport', 0.081), ('weston', 0.081), ('genes', 0.074), ('wy', 0.074), ('text', 0.074), ('functional', 0.072), ('system', 0.071), ('margin', 0.069), ('bold', 0.069), ('bioinformatics', 0.069), ('cu', 0.069), ('eq', 0.069), ('wc', 0.069), ('predictor', 0.068), ('multilabel', 0.063), ('phylogenetic', 0.063), ('protein', 0.062), ('ranked', 0.057), ('svm', 0.055), ('andr', 0.054), ('jason', 0.054), ('svms', 0.053), ('cation', 0.052), ('kernels', 0.051), ('size', 0.051), ('de', 0.051), ('concern', 0.05), ('food', 0.05), ('latter', 0.047), ('real', 0.046), ('problems', 0.044), ('aware', 0.043), ('cellular', 0.043), ('elisseeff', 0.043), ('threshold', 0.042), ('cell', 0.042), ('losses', 0.041), ('polynomials', 0.041), ('biowulf', 0.039), ('quite', 0.039), ('support', 0.039), ('calculations', 0.038), ('complexity', 0.037), ('degree', 0.037), ('synthesis', 0.037), ('iff', 0.035), ('technologies', 0.035), ('kernel', 0.034), ('les', 0.034), ('ab', 0.034), ('nition', 0.034), ('learning', 0.033), ('approach', 0.033), ('regression', 0.033), ('dataset', 0.033), ('optimization', 0.033), ('toy', 0.032), ('follow', 0.032), ('sign', 0.032), ('instance', 0.032), ('schapire', 0.031), ('mining', 0.031), ('interested', 0.031), ('gure', 0.031), ('categories', 0.03), ('put', 0.03), ('document', 0.03), ('pro', 0.03), ('subject', 0.03), ('cantly', 0.03), ('cost', 0.03), ('conjunction', 0.03), ('reasoning', 0.03), ('take', 0.029), ('done', 0.029), ('quadratic', 0.029), ('decision', 0.029), ('dual', 0.028), ('problem', 0.028), ('indeed', 0.028), ('controlling', 0.028), ('whole', 0.027), ('ionic', 0.027), ('broadway', 0.027), ('cai', 0.027), ('chemnitz', 0.027), ('death', 0.027), ('differentiates', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 22 nips-2001-A kernel method for multi-labelled classification
Author: André Elisseeff, Jason Weston
Abstract: This article presents a Support Vector Machine (SVM) like learning system to handle multi-label problems. Such problems are usually decomposed into many two-class problems but the expressive power of such a system can be weak [5, 7]. We explore a new direct approach. It is based on a large margin ranking system that shares a lot of common properties with SVMs. We tested it on a Yeast gene functional classification problem with positive results.
2 0.31625378 147 nips-2001-Pranking with Ranking
Author: Koby Crammer, Yoram Singer
Abstract: We discuss the problem of ranking instances. In our framework each instance is associated with a rank or a rating, which is an integer from 1 to k. Our goal is to find a rank-prediction rule that assigns each instance a rank which is as close as possible to the instance's true rank. We describe a simple and efficient online algorithm, analyze its performance in the mistake bound model, and prove its correctness. We describe two sets of experiments, with synthetic data and with the EachMovie dataset for collaborative filtering. In the experiments we performed, our algorithm outperforms online algorithms for regression and classification applied to ranking. 1
3 0.15652265 139 nips-2001-Online Learning with Kernels
Author: Jyrki Kivinen, Alex J. Smola, Robert C. Williamson
Abstract: We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the -trick allows us to give a robust parameterization. Moreover, unlike in batch learning where the -trick only applies to the -insensitive loss function we are able to derive general trimmed-mean types of estimators such as for Huber’s robust loss. ¡
4 0.13492323 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces
Author: T. Zhang
Abstract: We investigate the generalization performance of some learning problems in Hilbert functional Spaces. We introduce a notion of convergence of the estimated functional predictor to the best underlying predictor, and obtain an estimate on the rate of the convergence. This estimate allows us to derive generalization bounds on some learning formulations.
5 0.12660585 8 nips-2001-A General Greedy Approximation Algorithm with Applications
Author: T. Zhang
Abstract: Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases.
6 0.11087739 56 nips-2001-Convolution Kernels for Natural Language
7 0.10708399 190 nips-2001-Thin Junction Trees
8 0.097780116 159 nips-2001-Reducing multiclass to binary by coupling probability estimates
9 0.097221881 137 nips-2001-On the Convergence of Leveraging
10 0.094039626 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
11 0.090714492 105 nips-2001-Kernel Machines and Boolean Functions
12 0.088488109 46 nips-2001-Categorization by Learning and Combining Object Parts
13 0.087476373 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
14 0.082188018 149 nips-2001-Probabilistic Abstraction Hierarchies
15 0.079295367 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms
16 0.078008369 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
17 0.077705048 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
18 0.076555856 170 nips-2001-Spectral Kernel Methods for Clustering
19 0.076428652 144 nips-2001-Partially labeled classification with Markov random walks
20 0.076313391 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing
topicId topicWeight
[(0, -0.235), (1, 0.131), (2, -0.025), (3, 0.172), (4, 0.062), (5, -0.044), (6, -0.021), (7, 0.047), (8, -0.101), (9, 0.001), (10, -0.017), (11, -0.047), (12, -0.048), (13, 0.076), (14, 0.06), (15, 0.032), (16, 0.05), (17, 0.232), (18, -0.005), (19, 0.073), (20, 0.097), (21, 0.123), (22, 0.174), (23, 0.029), (24, 0.034), (25, -0.167), (26, 0.06), (27, -0.05), (28, 0.095), (29, 0.092), (30, 0.035), (31, -0.143), (32, -0.087), (33, 0.008), (34, 0.114), (35, -0.145), (36, -0.225), (37, -0.091), (38, -0.163), (39, 0.27), (40, 0.033), (41, 0.032), (42, 0.172), (43, 0.156), (44, 0.078), (45, 0.003), (46, -0.16), (47, -0.008), (48, 0.054), (49, -0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.9478063 22 nips-2001-A kernel method for multi-labelled classification
Author: André Elisseeff, Jason Weston
Abstract: This article presents a Support Vector Machine (SVM) like learning system to handle multi-label problems. Such problems are usually decomposed into many two-class problems but the expressive power of such a system can be weak [5, 7]. We explore a new direct approach. It is based on a large margin ranking system that shares a lot of common properties with SVMs. We tested it on a Yeast gene functional classification problem with positive results.
2 0.81429535 147 nips-2001-Pranking with Ranking
Author: Koby Crammer, Yoram Singer
Abstract: We discuss the problem of ranking instances. In our framework each instance is associated with a rank or a rating, which is an integer from 1 to k. Our goal is to find a rank-prediction rule that assigns each instance a rank which is as close as possible to the instance's true rank. We describe a simple and efficient online algorithm, analyze its performance in the mistake bound model, and prove its correctness. We describe two sets of experiments, with synthetic data and with the EachMovie dataset for collaborative filtering. In the experiments we performed, our algorithm outperforms online algorithms for regression and classification applied to ranking. 1
3 0.45245853 139 nips-2001-Online Learning with Kernels
Author: Jyrki Kivinen, Alex J. Smola, Robert C. Williamson
Abstract: We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the -trick allows us to give a robust parameterization. Moreover, unlike in batch learning where the -trick only applies to the -insensitive loss function we are able to derive general trimmed-mean types of estimators such as for Huber’s robust loss. ¡
4 0.3771036 159 nips-2001-Reducing multiclass to binary by coupling probability estimates
Author: B. Zadrozny
Abstract: This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers. This is an extension for arbitrary code matrices of a method due to Hastie and Tibshirani for pairwise coupling of probability estimates. Experimental results with Boosted Naive Bayes show that our method produces calibrated class membership probability estimates, while having similar classification accuracy as loss-based decoding, a method for obtaining the most likely class that does not generate probability estimates.
5 0.35837829 149 nips-2001-Probabilistic Abstraction Hierarchies
Author: Eran Segal, Daphne Koller, Dirk Ormoneit
Abstract: Many domains are naturally organized in an abstraction hierarchy or taxonomy, where the instances in “nearby” classes in the taxonomy are similar. In this paper, we provide a general probabilistic framework for clustering data into a set of classes organized as a taxonomy, where each class is associated with a probabilistic model from which the data was generated. The clustering algorithm simultaneously optimizes three things: the assignment of data instances to clusters, the models associated with the clusters, and the structure of the abstraction hierarchy. A unique feature of our approach is that it utilizes global optimization algorithms for both of the last two steps, reducing the sensitivity to noise and the propensity to local maxima that are characteristic of algorithms such as hierarchical agglomerative clustering that only take local steps. We provide a theoretical analysis for our algorithm, showing that it converges to a local maximum of the joint likelihood of model and data. We present experimental results on synthetic data, and on real data in the domains of gene expression and text.
6 0.33846539 182 nips-2001-The Fidelity of Local Ordinal Encoding
7 0.33416194 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
8 0.32168314 190 nips-2001-Thin Junction Trees
9 0.31516144 101 nips-2001-K-Local Hyperplane and Convex Distance Nearest Neighbor Algorithms
10 0.30670181 104 nips-2001-Kernel Logistic Regression and the Import Vector Machine
11 0.30579299 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
12 0.3053081 137 nips-2001-On the Convergence of Leveraging
13 0.30467743 8 nips-2001-A General Greedy Approximation Algorithm with Applications
14 0.2963241 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces
15 0.28786704 67 nips-2001-Efficient Resources Allocation for Markov Decision Processes
16 0.27591577 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms
17 0.27565163 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines
19 0.27419785 56 nips-2001-Convolution Kernels for Natural Language
20 0.27081102 64 nips-2001-EM-DD: An Improved Multiple-Instance Learning Technique
topicId topicWeight
[(14, 0.078), (17, 0.038), (19, 0.025), (27, 0.096), (30, 0.107), (38, 0.015), (58, 0.24), (59, 0.027), (72, 0.087), (79, 0.029), (83, 0.028), (91, 0.139)]
simIndex simValue paperId paperTitle
same-paper 1 0.84683913 22 nips-2001-A kernel method for multi-labelled classification
Author: André Elisseeff, Jason Weston
Abstract: This article presents a Support Vector Machine (SVM) like learning system to handle multi-label problems. Such problems are usually decomposed into many two-class problems but the expressive power of such a system can be weak [5, 7]. We explore a new direct approach. It is based on a large margin ranking system that shares a lot of common properties with SVMs. We tested it on a Yeast gene functional classification problem with positive results.
2 0.8420223 54 nips-2001-Contextual Modulation of Target Saliency
Author: Antonio Torralba
Abstract: The most popular algorithms for object detection require the use of exhaustive spatial and scale search procedures. In such approaches, an object is defined by means of local features. fu this paper we show that including contextual information in object detection procedures provides an efficient way of cutting down the need for exhaustive search. We present results with real images showing that the proposed scheme is able to accurately predict likely object classes, locations and sizes. 1
3 0.8100822 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments
Author: Jonathan L. Shapiro, J. Wearden
Abstract: Animal data on delayed-reward conditioning experiments shows a striking property - the data for different time intervals collapses into a single curve when the data is scaled by the time interval. This is called the scalar property of interval timing. Here a simple model of a neural clock is presented and shown to give rise to the scalar property. The model is an accumulator consisting of noisy, linear spiking neurons. It is analytically tractable and contains only three parameters. When coupled with reinforcement learning it simulates peak procedure experiments, producing both the scalar property and the pattern of single trial covariances. 1
4 0.67901653 102 nips-2001-KLD-Sampling: Adaptive Particle Filters
Author: Dieter Fox
Abstract: Over the last years, particle filters have been applied with great success to a variety of state estimation problems. We present a statistical approach to increasing the efficiency of particle filters by adapting the size of sample sets on-the-fly. The key idea of the KLD-sampling method is to bound the approximation error introduced by the sample-based representation of the particle filter. The name KLD-sampling is due to the fact that we measure the approximation error by the Kullback-Leibler distance. Our adaptation approach chooses a small number of samples if the density is focused on a small part of the state space, and it chooses a large number of samples if the state uncertainty is high. Both the implementation and computation overhead of this approach are small. Extensive experiments using mobile robot localization as a test application show that our approach yields drastic improvements over particle filters with fixed sample set sizes and over a previously introduced adaptation technique.
5 0.6760785 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama
Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1
6 0.67226934 56 nips-2001-Convolution Kernels for Natural Language
7 0.67141283 13 nips-2001-A Natural Policy Gradient
8 0.6657117 149 nips-2001-Probabilistic Abstraction Hierarchies
9 0.66404468 185 nips-2001-The Method of Quantum Clustering
10 0.66391802 60 nips-2001-Discriminative Direction for Kernel Classifiers
11 0.66298205 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
12 0.66262364 195 nips-2001-Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
13 0.66206026 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning
14 0.6618554 46 nips-2001-Categorization by Learning and Combining Object Parts
16 0.66155088 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
17 0.66039246 161 nips-2001-Reinforcement Learning with Long Short-Term Memory
18 0.66016418 121 nips-2001-Model-Free Least-Squares Policy Iteration
19 0.66001487 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
20 0.65772611 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines