nips nips2002 nips2002-68 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Peter Meinicke, Thorsten Twellmann, Helge Ritter
Abstract: We propose a framework for classifier design based on discriminative densities for representation of the differences of the class-conditional distributions in a way that is optimal for classification. The densities are selected from a parametrized set by constrained maximization of some objective function which measures the average (bounded) difference, i.e. the contrast between discriminative densities. We show that maximization of the contrast is equivalent to minimization of an approximation of the Bayes risk. Therefore using suitable classes of probability density functions, the resulting maximum contrast classifiers (MCCs) can approximate the Bayes rule for the general multiclass case. In particular for a certain parametrization of the density functions we obtain MCCs which have the same functional form as the well-known Support Vector Machines (SVMs). We show that MCC-training in general requires some nonlinear optimization but under certain conditions the problem is concave and can be tackled by a single linear program. We indicate the close relation between SVM- and MCC-training and in particular we show that Linear Programming Machines can be viewed as an approximate realization of MCCs. In the experiments on benchmark data sets, the MCC shows a competitive classification performance.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose a framework for classifier design based on discriminative densities for representation of the differences of the class-conditional distributions in a way that is optimal for classification. [sent-7, score-0.63]
2 The densities are selected from a parametrized set by constrained maximization of some objective function which measures the average (bounded) difference, i.e. the contrast between discriminative densities. [sent-8, score-0.548]
3 We show that maximization of the contrast is equivalent to minimization of an approximation of the Bayes risk. [sent-11, score-0.304]
4 Therefore using suitable classes of probability density functions, the resulting maximum contrast classifiers (MCCs) can approximate the Bayes rule for the general multiclass case. [sent-12, score-0.427]
5 In particular for a certain parametrization of the density functions we obtain MCCs which have the same functional form as the well-known Support Vector Machines (SVMs). [sent-13, score-0.316]
6 We show that MCC-training in general requires some nonlinear optimization but under certain conditions the problem is concave and can be tackled by a single linear program. [sent-14, score-0.099]
7 With $p(\mathbf{x} \mid y)$ being the class-conditional probability density functions (PDFs) and $P(y)$ denoting the corresponding a priori probabilities of [sent-18, score-0.19]
8 class-membership, we have the risk (1) of a classifier as its expected misclassification probability. [sent-20, score-0.081]
9 It is well known (e.g. [3]) that the expected risk is minimized if one chooses the classifier (2) according to $c(\mathbf{x}) = \arg\max_y P(y)\, p(\mathbf{x} \mid y)$; the resulting lower bound on the risk is known as the Bayes risk, which limits the average performance of any classifier. [sent-23, score-0.162]
10 Because the class-conditional densities are usually unknown, one way to realize the above classifier is to use estimates of these densities instead. [sent-24, score-0.636]
11 This leads to the so-called plug-in classifiers, which are Bayes-consistent if the density estimators are consistent. [sent-25, score-0.149]
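To make the plug-in idea concrete, the following is a minimal sketch (not taken from the paper) of a plug-in classifier that replaces the unknown class-conditional densities by kernel density estimates; the unnormalized Gaussian kernel, the bandwidth value and the two-class toy data are illustrative assumptions.

```python
import numpy as np

def gauss_kernel(X, centers, h):
    # unnormalized Gaussian kernel k(x, c) = exp(-||x - c||^2 / (2 h^2))
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * h ** 2))

def plugin_classify(X, X_train, y_train, h=1.0):
    """Plug-in Bayes rule: argmax_y P(y) * p_hat(x | y), with p_hat a kernel density
    estimate; the common normalization constant of the kernels cancels in the argmax."""
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        prior = Xc.shape[0] / X_train.shape[0]        # empirical a-priori probability
        dens = gauss_kernel(X, Xc, h).mean(axis=1)    # kernel density estimate (up to a constant)
        scores.append(prior * dens)
    return classes[np.argmax(np.stack(scores, axis=1), axis=1)]

# toy usage: two overlapping isotropic Gaussians, loosely mimicking the 2-d example in Sec. 4
rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], 1.0, size=(150, 2))
X1 = rng.normal([2.0, 0.0], 1.0, size=(150, 2))
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 150 + [1] * 150)
print(plugin_classify(np.array([[0.0, 0.0], [2.0, 0.0]]), X_train, y_train))
```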
12 We recently proposed a method for the design of density-based classifiers without resorting to the usual density estimation schemes of the plug-in approach [6]. [sent-29, score-0.216]
13 Instead we utilized discriminative densities with parameters optimized to solve the classification problem. [sent-30, score-0.511]
14 The approach requires maximization of the average bounded difference between class (discriminative) densities, which we refer to as the contrast; the bounded contrast is the expectation of this difference with respect to the underlying “true” distributions. [sent-31, score-0.791]
15 In this paper we show that with some slight modification the contrast can be viewed as an approximation of the negative Bayes risk (up to some constant shift and scaling) which is valid for the binary as well as for the general multiclass case. [sent-34, score-0.372]
16 Therefore, for certain parametrizations of the discriminative densities, MCCs make it possible to find an optimal trade-off between the classical plug-in Bayes-consistency and the consistency which arises from direct minimization of the approximate Bayes risk. [sent-35, score-0.61]
17 Furthermore, for a particular parametrization of the PDFs, we obtain certain kinds of Linear Programming Machines (LPMs) [4] as (in general) approximate solutions of maximum contrast estimation. [sent-36, score-0.329]
18 In that way MCCs provide a Bayes-consistent approach to realize multiclass LPMs / SVMs and they suggest an interpretation of the magnitude of the LPM / SVM classification function in terms of density differences which provide a probabilistic measure of confidence. [sent-37, score-0.317]
19 For the case of LPMs we propose an extended optimization procedure for maximization of the contrast via iteration of linear optimizations. [sent-38, score-0.337]
20 Inspired by the MCC-framework, for the resulting Sequential Linear Programming Machines (SLPM) we propose a new regularizer which allows one to find an optimal trade-off between the two above-mentioned approaches to Bayes consistency. [sent-39, score-0.084]
21 2 Maximum Contrast Estimation For the design of MCCs the first step, which is the same as for the plug-in concept, requires replacing the unknown class-conditional densities of the Bayes classifier (2) by suitably parametrized PDFs. [sent-41, score-0.385]
22 Then, instead of choosing the parameters for an approximation of the original (true) densities (e.g. [sent-42, score-0.296]
23 by maximum likelihood estimation) as with the plug-in scheme, the density parameters are chosen to maximize the so-called contrast, which is the expected value of the bounded density differences as defined in (3). [sent-44, score-0.551]
24 Obviously, these are not the best discriminative densities we may think of and therefore we require an appropriate bound. [sent-52, score-0.511]
25 For a finite bound, maximization of the contrast enforces a redistribution of the estimated probability mass and gives rise to a constrained linear optimization problem in the space of discriminative densities which may be solved by variational methods in some cases. [sent-53, score-0.927]
26 The relation between contrast and Bayes risk becomes more convenient when we slightly modify the above definition (3) by a unit upper bound and by adding a lower bound on the scaled density differences. [sent-54, score-0.392]
27 Therefore, for an infinite scale factor the (expected) contrast approaches the negative Bayes risk up to a constant shift and scaling, as stated in (4) and (5). [sent-56, score-0.201]
28 Thus the scale factor defines a subset of the input space which includes the decision boundary and which becomes increasingly focused in its vicinity as the scale factor tends to infinity. [sent-58, score-0.379]
29 The extent of the region is defined by the bounds on the difference between discriminative densities. [sent-59, score-0.244]
30 In terms of the contrast function this region can be defined as in (6). [sent-60, score-0.194]
31 Since for MCC-training we maximize the empirical contrast, i.e. the corresponding sample average, the scale factor then defines a subset of the training data which has an impact on learning the decision boundary. [sent-62, score-0.16]
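As an illustration of the quantity being maximized, here is a small sketch of one plausible form of the empirical bounded contrast for the binary case; since the exact definitions (3)-(5) are not fully legible in this dump, the clipping bounds, the sign convention and the role of the scale factor are assumptions rather than the paper's formulas.

```python
import numpy as np

def empirical_contrast(dens_own, dens_other, scale=1.0, lower=-1.0):
    """Sample average of the scale-weighted density difference, clipped to [lower, 1].
    dens_own[i]   : discriminative density of example i under its own class
    dens_other[i] : discriminative density of example i under the competing class
    Only examples whose clipped value lies strictly between the bounds (i.e. points
    near the decision boundary) influence the gradient of this objective."""
    diff = scale * (np.asarray(dens_own) - np.asarray(dens_other))
    return np.clip(diff, lower, 1.0).mean()
```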
32 Thus, for an increasing scale factor, the relative size of that subset shrinks. [sent-63, score-0.144]
33 However, for an increasing size of the training set the scale factor can be gradually increased and then, for suitable classes of PDFs, MCCs can approach the Bayes rule. [sent-64, score-0.265]
34 In other words, the scale factor acts as a regularization parameter such that, for particular choices of the PDF class, convergence to the Bayes classifier can be achieved if the quality of the approximation of the loss function is gradually increased for increasing sample sizes. [sent-65, score-0.146]
35 In the following section we shall consider such a class of PDFs which is flexible enough and which turns out to include a certain kind of SVMs. [sent-66, score-0.081]
36 3 MCC-Realizations In the following we shall first consider a particularly useful parametrization of the discriminative densities, which gives rise to classifiers that, in the binary case, have the same functional form as SVMs up to a “missing” bias term in the MCC case. [sent-67, score-0.762]
37 For training of these MCCs we derive a suitable objective function which can be maximized by sequential linear programming, where we show the close relation to the training of Linear Programming Machines. [sent-68, score-0.294]
38 On the other hand if we allow for local variation of the bandwidth we get a complicated contrast which is difficult to maximize due to nonlinear dependencies on the parameters. [sent-72, score-0.278]
39 The same is true if we treat the kernel centers as free parameters. [sent-73, score-0.094]
40 Thus we have class-specific densities with mixing weights which control the contribution of a single training example to the PDF. [sent-75, score-0.497]
41 With that choice we achieve plug-in Bayes-consistency for the case of equal mixing weights, since then we have the usual kernel density estimator (KDE), which, besides some mild assumptions about the distributions, requires a vanishing kernel bandwidth for increasing sample size. [sent-77, score-0.641]
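The sketch below spells out this parametrization as it is described here (the helper names and the binary restriction are illustrative assumptions): each class-conditional discriminative density is a weighted sum of unnormalized Gaussian kernels centred on that class's training points, equal weights recover the kernel density estimator, and in the binary case the sign of the density difference has the SVM-like form sum_i alpha_i y_i k(x, x_i) without a bias term.

```python
import numpy as np

def kernel_matrix(X, centers, h):
    # unnormalized Gaussian kernel exp(-||x - c||^2 / (2 h^2))
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * h ** 2))

def discriminative_density(X, centers, alpha, h):
    """Class-conditional density as a kernel mixture centred on training points;
    `alpha` holds the mixing weights (equal weights give the usual KDE up to scaling)."""
    return kernel_matrix(X, centers, h) @ np.asarray(alpha)

def mcc_decision(X, X_pos, a_pos, X_neg, a_neg, h):
    """Binary MCC decision: sign of the density difference, which has the same
    functional form as an SVM, sum_i alpha_i y_i k(x, x_i), with no bias term;
    the magnitude of `diff` can be read as a density-based confidence measure."""
    diff = (discriminative_density(X, X_pos, a_pos, h)
            - discriminative_density(X, X_neg, a_neg, h))
    return np.sign(diff), diff
```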
42 For notational simplicity, in the following we shall incorporate the scale factor and the mixing weights into a common parameter vector, so that we can write the empirical contrast over the training examples accordingly. [sent-85, score-0.45]
43 Further we define the scaled density difference (10), where the assignment variables realize the maximum function in (4). [sent-86, score-0.323]
44 With fixed assignment variables the objective is concave and maximization with respect to the weights gives rise to a linear optimization problem. [sent-87, score-0.281]
45 On the other hand, for fixed weights, maximization with respect to the assignment variables is achieved by setting them to zero for negative terms. [sent-88, score-0.142]
46 This suggests a sequential linear optimization strategy for overall maximization of the contrast which shall be introduced in detail in the following section. [sent-89, score-0.441]
47 Since we have already incorporated the scale factor into the parameter vector, it is now identified with the norm of that vector. [sent-90, score-0.071]
48 Therefore the scale factor can be adjusted implicitly by a regularization term which penalizes some suitable norm of the parameter vector. [sent-91, score-0.09]
49 Thus a suitable objective function can be defined as in (11), with a constant determining the weight of the penalty. [sent-92, score-0.209]
50 We now consider several instances of the case where the penalty corresponds to some norm of the weight vector. [sent-95, score-0.046]
51 With the 1-norm, for a sufficiently large penalty weight the probability mass of the discriminative densities is concentrated on those two kernel functions which yield the highest average density difference. [sent-96, score-0.753]
52 Although that property forces the sparsest solution for a large enough penalty weight, that solution clearly isn't Bayes-consistent in general because, as pointed out in Sec. [sent-97, score-0.127]
53 2, in that limit all probability mass of the discriminative densities is concentrated at the two points with maximum average density difference. [sent-98, score-0.794]
54 Conversely, taking the 2-norm, which resembles the standard SVM regularizer [10], yields the KDE with equal mixing weights in the limit of a large penalty weight. [sent-99, score-0.055]
55 Indeed, it is easy to see that all p-norm penalties with p > 1 share this convenient property, which guarantees “plug-in” Bayes consistency in the case where the solution is totally determined by the regularizer. [sent-100, score-0.242]
56 In that case kernel density estimators are achieved as the “default” solution. [sent-101, score-0.243]
57 For that kind of penalty we achieve an equal distribution of the weights in the limiting case, which corresponds to the kernel density estimator (KDE) solution. [sent-103, score-0.381]
58 By a suitable choice of the kernel width and the scale of the weights, e.g. [sent-105, score-0.193]
59 Dividing the objective by the penalty weight, subtracting a constant, fixing the scale factor and turning minimization into maximization of the negative objective shows that LPM training corresponds to a special case of MCC training with a fixed scale factor and a 1-norm regularizer. [sent-109, score-0.367]
60 3 Sequential Linear Programming Estimation of the mixing weights is now achieved by maximizing the sample contrast with respect to the weights and the assignment variables. [sent-112, score-0.369]
61 If convergence in contrast, then stop, else proceed with step 2. [sent-125, score-0.162]
62 Here the slack variables measure the part of the density difference which can be charged to the objective function. [sent-126, score-0.269]
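The following sketch illustrates the alternation described above for the binary case, using scipy.optimize.linprog: a linear program over the mixing weights with the current assignment variables fixed, followed by deactivation of examples whose scaled density difference falls below the lower bound. The exact program (11), its slack-variable constraints, the bound values and the penalty weight C are only partially recoverable from this dump, so they are stated here as assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def slpm_binary(K, y, C=0.1, lower=-1.0, max_iter=20, tol=1e-6):
    """Sequential linear programming sketch for a binary MCC.
    K     : n x n Gaussian kernel matrix on the training set
    y     : labels in {-1, +1}
    C     : weight of the (assumed) 1-norm penalty on the mixing weights
    lower : assumed lower bound on the scaled density difference"""
    y = np.asarray(y, dtype=float)
    n = len(y)
    M = (y[:, None] * y[None, :]) * K      # d_i(alpha) = sum_j M_ij alpha_j, the signed density difference
    active = np.ones(n)                    # assignment variables (1 = example charged to the objective)
    alpha = np.zeros(n)
    prev_contrast = -np.inf
    for _ in range(max_iter):
        # LP over z = [alpha, s]: maximize sum_i active_i * s_i - C * sum_j alpha_j
        c = np.concatenate([C * np.ones(n), -active])
        A_ub = np.hstack([-M, np.eye(n)])  # slack constraint: s_i - sum_j M_ij alpha_j <= 0
        b_ub = np.zeros(n)
        bounds = [(0, None)] * n + [(lower, 1.0)] * n
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        alpha = res.x[:n]
        # assignment step: deactivate examples whose density difference hits the lower bound
        d = M @ alpha
        active = (d >= lower).astype(float)
        contrast = np.clip(d, lower, 1.0).mean()
        if abs(contrast - prev_contrast) < tol:   # convergence in contrast
            break
        prev_contrast = contrast
    return alpha, active, contrast
```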
63 Since we used unnormalized Gaussian kernel functions, i.e. [sent-128, score-0.094]
64 we excluded all multiplicative density constants, that constraint doesn't exclude any useful solutions for the weights. [sent-130, score-0.149]
65 4 Experiments In the following section we consider the task of solving binary classification problems within the MCC-framework, using the above SLPM with a Gaussian kernel function. [sent-131, score-0.128]
66 The first experiment illustrates the behaviour of the MCC for different values of the regularization parameter by means of a simple two-dimensional toy dataset. [sent-132, score-0.094]
67 The second experiment compares the classification performance of the MCC with that of the SVM and the Kernel-Density-Classifier (KDC), which is a special case of the MCC with equal weighting of each kernel function. [sent-133, score-0.094]
68 The two-dimensional toy dataset consists of 300 data points, sampled from two overlapping isotropic normal distributions with a fixed mutual distance and standard deviation. [sent-135, score-0.119]
69 Figure 1 shows the solution of the MCC for two different values of the regularization parameter (only data points with non-zero weights according to the criterion are marked by symbols). [sent-136, score-0.131]
70 In both figures, data points with large mixing weights are located near the decision border. [sent-137, score-0.246]
71 In particular for small regularization there are regions of high contrast alongside the decision function (illustrated by isolines). [sent-138, score-0.199]
72 For increasing regularization the number of data points with non-zero weights increases. [sent-139, score-0.095]
73 This illustrates that for increasing regularization the quality of the approximation of the loss function decreases. [sent-143, score-0.054]
74 In both figures, several data points are misclassified with a negative contrast. [sent-144, score-0.203]
75 The MCC identified those data points as outliers and deactivated them during the training (encircled symbols). [sent-145, score-0.185]
76 For this experiment we selected the Pima Indian Diabetes, Breast-Cancer, Heart and Thyroid datasets from the UCI Machine Learning repository. [sent-148, score-0.08]
77 Figure 1: Two MCC solutions for the two-dimensional toy dataset for two different values of the regularization parameter (left and right). [sent-167, score-0.119]
78 The symbols depict the positions of the data points with non-zero weights. [sent-168, score-0.104]
79 Encircled symbols have been deactivated during the training (symbols for deactivated data points are not scaled according to their weight, since in most cases it is zero). [sent-170, score-0.388]
80 The absolute value of the contrast is illustrated by the isolines while the sign of the contrast depicts the binary classification of the classifier. [sent-171, score-0.413]
81 The region which corresponds to the set defined in (6) is colored white and its complement is colored gray. [sent-172, score-0.088]
82 The percentage of data points that define the solution is (left figure) and (right figure) of the dataset. [sent-173, score-0.149]
83 Since we used the Gaussian kernel function for all classifiers, all three algorithms are parametrized by the bandwidth. [sent-180, score-0.236]
84 Additionally, for the SVM and MCC the regularization value had to be chosen. [sent-181, score-0.055]
85 The optimal parametrization was chosen by estimating the generalization performance for different values of bandwidth and regularization by means of the average test error on the first five dataset partitions. [sent-182, score-0.383]
86 More precisely, a first coarse scan was performed, followed by a fine scan in the interval near the optimal values of the first one. [sent-183, score-0.176]
87 Each scan considered 1600 different combinations of bandwidth and regularization value. [sent-184, score-0.058]
88 For parameter pairs with identical test error, the pair yielding the sparsest solution was kept. [sent-186, score-0.091]
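A minimal sketch of this two-stage model selection is given below; the callback train_eval, the geometric grids and the width of the fine interval around the coarse optimum are hypothetical placeholders, not the settings used in the paper.

```python
import numpy as np

def scan(grid_h, grid_C, partitions, train_eval):
    """Evaluate every (bandwidth, regularization) pair by its average test error on the
    given partitions; train_eval(h, C, part) is assumed to return (test_error, sparsity),
    where sparsity is the fraction of non-zero weights.  Ties in the error are broken
    in favour of the sparser solution."""
    best = None
    for h in grid_h:
        for C in grid_C:
            errs, sparsities = zip(*(train_eval(h, C, p) for p in partitions))
            key = (np.mean(errs), np.mean(sparsities))
            if best is None or key < best[0]:
                best = (key, h, C)
    return best[1], best[2]

def coarse_to_fine(h_range, C_range, partitions, train_eval, n=40):
    # coarse scan over n x n = 1600 combinations, then a fine scan around the coarse optimum
    h0, C0 = scan(np.geomspace(h_range[0], h_range[1], n),
                  np.geomspace(C_range[0], C_range[1], n), partitions, train_eval)
    return scan(np.geomspace(h0 / 2, h0 * 2, n),
                np.geomspace(C0 / 2, C0 * 2, n), partitions, train_eval)
```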
89 Table 1 shows the optimal parametrization of the MCC in combination with the classification rate and the sparseness of the solution (measured as the percentage of non-zero weights). [sent-190, score-0.272]
90 The last two columns show the absolute number of iterations and the final number of deactivated examples. [sent-192, score-0.111]
91 In particular for the Heart, Breast-Cancer and Diabetes datasets the solution of the MCC is significantly sparser than that of the SVM (see Tab. [sent-194, score-0.116]
92 5 Conclusion The MCC approach provides an understanding of SVMs / LPMs in terms of generative modelling using discriminative densities. [sent-198, score-0.215]
93 While usual unsupervised density estimation schemes try to minimize some distance criterion (e.g. the Kullback-Leibler divergence) between the models and the true densities, MC-estimation aims at learning densities which represent the differences of the underlying distributions in a way that is optimal for classification. [sent-199, score-0.185]
94 Table 1: Optimal parametrization, classification rate, percentage of non-zero weights, number of iterations of the MCC and number of deactivated examples. [sent-201, score-0.207]
95 The results are averaged over all 100 dataset partitions. [sent-202, score-0.08]
96 For the classification rate and the percentage of non-zero coefficients the corresponding value after the first MCC iteration is given in brackets. [sent-203, score-0.072]
97 Given are the classification rates with the percentage of non-zero weights (in brackets). [sent-239, score-0.072]
99 Future work will address the investigation of the general multiclass performance and the capability to cope with mislabeled data. [sent-263, score-0.065]
100 Fast training of support vector machines using sequential minimal optimization. [sent-315, score-0.142]
wordName wordTfidf (topN-words)
[('mcc', 0.526), ('densities', 0.296), ('discriminative', 0.215), ('mccs', 0.194), ('classi', 0.184), ('contrast', 0.162), ('density', 0.149), ('bielefeld', 0.144), ('maximization', 0.142), ('parametrization', 0.135), ('bayes', 0.121), ('kde', 0.12), ('mixing', 0.114), ('deactivated', 0.111), ('lpms', 0.111), ('hy', 0.096), ('kernel', 0.094), ('svm', 0.091), ('pdfs', 0.088), ('bandwidth', 0.084), ('kdc', 0.083), ('slpm', 0.083), ('diabetes', 0.082), ('risk', 0.081), ('dataset', 0.08), ('thyroid', 0.072), ('percentage', 0.072), ('er', 0.07), ('heart', 0.07), ('programming', 0.07), ('cation', 0.067), ('multiclass', 0.065), ('symbols', 0.063), ('isn', 0.061), ('differences', 0.059), ('parametrized', 0.058), ('neuroinformatics', 0.058), ('scan', 0.058), ('encircled', 0.055), ('isolines', 0.055), ('keerthi', 0.055), ('sparsest', 0.055), ('sequential', 0.055), ('regularizer', 0.055), ('regularization', 0.055), ('increasing', 0.054), ('weights', 0.054), ('machines', 0.054), ('objective', 0.052), ('suitable', 0.051), ('ers', 0.05), ('shall', 0.049), ('scale', 0.048), ('svms', 0.048), ('lpm', 0.048), ('helge', 0.048), ('meinicke', 0.048), ('twellmann', 0.048), ('concentrated', 0.047), ('penalty', 0.046), ('mass', 0.046), ('realize', 0.044), ('colored', 0.044), ('benchmark', 0.043), ('factor', 0.042), ('points', 0.041), ('apriori', 0.041), ('datapoints', 0.041), ('assignment', 0.039), ('toy', 0.039), ('xu', 0.039), ('unbounded', 0.039), ('doesn', 0.039), ('slack', 0.039), ('estimator', 0.038), ('consistency', 0.038), ('decision', 0.037), ('gradually', 0.037), ('solution', 0.036), ('usual', 0.036), ('notational', 0.035), ('tsch', 0.035), ('platt', 0.035), ('binary', 0.034), ('concave', 0.034), ('rise', 0.033), ('germany', 0.033), ('optimization', 0.033), ('training', 0.033), ('maximize', 0.032), ('certain', 0.032), ('coarse', 0.031), ('design', 0.031), ('misclassi', 0.03), ('shift', 0.03), ('delta', 0.03), ('scaled', 0.029), ('difference', 0.029), ('optimal', 0.029), ('incorporated', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
Author: Peter Meinicke, Thorsten Twellmann, Helge Ritter
Abstract: We propose a framework for classifier design based on discriminative densities for representation of the differences of the class-conditional distributions in a way that is optimal for classification. The densities are selected from a parametrized set by constrained maximization of some objective function which measures the average (bounded) difference, i.e. the contrast between discriminative densities. We show that maximization of the contrast is equivalent to minimization of an approximation of the Bayes risk. Therefore using suitable classes of probability density functions, the resulting maximum contrast classifiers (MCCs) can approximate the Bayes rule for the general multiclass case. In particular for a certain parametrization of the density functions we obtain MCCs which have the same functional form as the well-known Support Vector Machines (SVMs). We show that MCC-training in general requires some nonlinear optimization but under certain conditions the problem is concave and can be tackled by a single linear program. We indicate the close relation between SVM- and MCC-training and in particular we show that Linear Programming Machines can be viewed as an approximate realization of MCCs. In the experiments on benchmark data sets, the MCC shows a competitive classification performance.
2 0.1584864 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
Author: Yves Grandvalet, Stéphane Canu
Abstract: This paper introduces an algorithm for the automatic relevance determination of input variables in kernelized Support Vector Machines. Relevance is measured by scale factors defining the input space metric, and feature selection is performed by assigning zero weights to irrelevant variables. The metric is automatically tuned by the minimization of the standard SVM empirical risk, where scale factors are added to the usual set of parameters defining the classifier. Feature selection is achieved by constraints encouraging the sparsity of scale factors. The resulting algorithm compares favorably to state-of-the-art feature selection procedures and demonstrates its effectiveness on a demanding facial expression recognition problem.
3 0.15370266 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking
Author: Sariel Har-Peled, Dan Roth, Dav Zimak
Abstract: The constraint classification framework captures many flavors of multiclass classification including winner-take-all multiclass classification, multilabel classification and ranking. We present a meta-algorithm for learning in this framework that learns via a single linear classifier in high dimension. We discuss distribution independent as well as margin-based generalization bounds and present empirical and theoretical evidence showing that constraint classification benefits over existing methods of multiclass classification.
4 0.14948842 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers
Author: Sepp Hochreiter, Klaus Obermayer
Abstract: We investigate the problem of learning a classification task for datasets which are described by matrices. Rows and columns of these matrices correspond to objects, where row and column objects may belong to different sets, and the entries in the matrix express the relationships between them. We interpret the matrix elements as being produced by an unknown kernel which operates on object pairs and we show that - under mild assumptions - these kernels correspond to dot products in some (unknown) feature space. Minimizing a bound for the generalization error of a linear classifier which has been obtained using covering numbers we derive an objective function for model selection according to the principle of structural risk minimization. The new objective function has the advantage that it allows the analysis of matrices which are not positive definite, and not even symmetric or square. We then consider the case that row objects are interpreted as features. We suggest an additional constraint, which imposes sparseness on the row objects and show, that the method can then be used for feature selection. Finally, we apply this method to data obtained from DNA microarrays, where “column” objects correspond to samples, “row” objects correspond to genes and matrix elements correspond to expression levels. Benchmarks are conducted using standard one-gene classification and support vector machines and K-nearest neighbors after standard feature selection. Our new method extracts a sparse set of genes and provides superior classification results. 1
5 0.14211817 108 nips-2002-Improving Transfer Rates in Brain Computer Interfacing: A Case Study
Author: Peter Meinicke, Matthias Kaper, Florian Hoppe, Manfred Heumann, Helge Ritter
Abstract: In this paper we present results of a study on brain computer interfacing. We adopted an approach of Farwell & Donchin [4], which we tried to improve in several aspects. The main objective was to improve the transfer rates based on offline analysis of EEG-data but within a more realistic setup closer to an online realization than in the original studies. The objective was achieved along two different tracks: on the one hand we used state-of-the-art machine learning techniques for signal classification and on the other hand we augmented the data space by using more electrodes for the interface. For the classification task we utilized SVMs and, as motivated by recent findings on the learning of discriminative densities, we accumulated the values of the classification function in order to combine several classifications, which finally lead to significantly improved rates as compared with techniques applied in the original work. In combination with the data space augmentation, we achieved competitive transfer rates at an average of 50.5 bits/min and with a maximum of 84.7 bits/min.
6 0.12981148 119 nips-2002-Kernel Dependency Estimation
7 0.11854792 19 nips-2002-Adapting Codes and Embeddings for Polychotomies
8 0.11819243 45 nips-2002-Boosted Dyadic Kernel Discriminants
9 0.11708593 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine
10 0.11405857 72 nips-2002-Dyadic Classification Trees via Structural Risk Minimization
11 0.10979089 114 nips-2002-Information Regularization with Partially Labeled Data
12 0.1091973 92 nips-2002-FloatBoost Learning for Classification
13 0.10878997 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
14 0.10097966 120 nips-2002-Kernel Design Using Boosting
15 0.0998137 156 nips-2002-On the Complexity of Learning the Kernel Matrix
16 0.099199116 145 nips-2002-Mismatch String Kernels for SVM Protein Classification
17 0.09816356 106 nips-2002-Hyperkernels
18 0.097290777 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
19 0.092736721 62 nips-2002-Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems
20 0.090322383 138 nips-2002-Manifold Parzen Windows
topicId topicWeight
[(0, -0.254), (1, -0.159), (2, 0.097), (3, -0.067), (4, 0.179), (5, -0.031), (6, -0.018), (7, -0.01), (8, 0.032), (9, 0.026), (10, -0.016), (11, 0.036), (12, 0.063), (13, 0.013), (14, 0.136), (15, -0.05), (16, -0.029), (17, 0.042), (18, -0.004), (19, -0.008), (20, 0.008), (21, 0.021), (22, -0.044), (23, -0.044), (24, -0.005), (25, -0.064), (26, -0.078), (27, -0.012), (28, -0.067), (29, -0.004), (30, -0.02), (31, -0.109), (32, 0.013), (33, -0.047), (34, -0.007), (35, 0.075), (36, -0.06), (37, 0.103), (38, -0.003), (39, -0.043), (40, -0.018), (41, 0.008), (42, 0.156), (43, 0.039), (44, 0.126), (45, 0.122), (46, 0.024), (47, -0.01), (48, -0.116), (49, -0.142)]
simIndex simValue paperId paperTitle
same-paper 1 0.94800621 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
Author: Peter Meinicke, Thorsten Twellmann, Helge Ritter
Abstract: We propose a framework for classifier design based on discriminative densities for representation of the differences of the class-conditional distributions in a way that is optimal for classification. The densities are selected from a parametrized set by constrained maximization of some objective function which measures the average (bounded) difference, i.e. the contrast between discriminative densities. We show that maximization of the contrast is equivalent to minimization of an approximation of the Bayes risk. Therefore using suitable classes of probability density functions, the resulting maximum contrast classifiers (MCCs) can approximate the Bayes rule for the general multiclass case. In particular for a certain parametrization of the density functions we obtain MCCs which have the same functional form as the well-known Support Vector Machines (SVMs). We show that MCC-training in general requires some nonlinear optimization but under certain conditions the problem is concave and can be tackled by a single linear program. We indicate the close relation between SVM- and MCC-training and in particular we show that Linear Programming Machines can be viewed as an approximate realization of MCCs. In the experiments on benchmark data sets, the MCC shows a competitive classification performance.
2 0.76770991 108 nips-2002-Improving Transfer Rates in Brain Computer Interfacing: A Case Study
Author: Peter Meinicke, Matthias Kaper, Florian Hoppe, Manfred Heumann, Helge Ritter
Abstract: In this paper we present results of a study on brain computer interfacing. We adopted an approach of Farwell & Donchin [4], which we tried to improve in several aspects. The main objective was to improve the transfer rates based on offline analysis of EEG-data but within a more realistic setup closer to an online realization than in the original studies. The objective was achieved along two different tracks: on the one hand we used state-of-the-art machine learning techniques for signal classification and on the other hand we augmented the data space by using more electrodes for the interface. For the classification task we utilized SVMs and, as motivated by recent findings on the learning of discriminative densities, we accumulated the values of the classification function in order to combine several classifications, which finally lead to significantly improved rates as compared with techniques applied in the original work. In combination with the data space augmentation, we achieved competitive transfer rates at an average of 50.5 bits/min and with a maximum of 84.7 bits/min.
3 0.64700872 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking
Author: Sariel Har-Peled, Dan Roth, Dav Zimak
Abstract: The constraint classification framework captures many flavors of multiclass classification including winner-take-all multiclass classification, multilabel classification and ranking. We present a meta-algorithm for learning in this framework that learns via a single linear classifier in high dimension. We discuss distribution independent as well as margin-based generalization bounds and present empirical and theoretical evidence showing that constraint classification benefits over existing methods of multiclass classification.
4 0.61104184 62 nips-2002-Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems
Author: Sepp Hochreiter, Michael C. Mozer, Klaus Obermayer
Abstract: We introduce a family of classifiers based on a physical analogy to an electrostatic system of charged conductors. The family, called Coulomb classifiers, includes the two best-known support-vector machines (SVMs), the ν–SVM and the C–SVM. In the electrostatics analogy, a training example corresponds to a charged conductor at a given location in space, the classification function corresponds to the electrostatic potential function, and the training objective function corresponds to the Coulomb energy. The electrostatic framework provides not only a novel interpretation of existing algorithms and their interrelationships, but it suggests a variety of new methods for SVMs including kernels that bridge the gap between polynomial and radial-basis functions, objective functions that do not require positive-definite kernels, regularization techniques that allow for the construction of an optimal classifier in Minkowski space. Based on the framework, we propose novel SVMs and perform simulation studies to show that they are comparable or superior to standard SVMs. The experiments include classification tasks on data which are represented in terms of their pairwise proximities, where a Coulomb Classifier outperformed standard SVMs. 1
5 0.60453635 45 nips-2002-Boosted Dyadic Kernel Discriminants
Author: Baback Moghaddam, Gregory Shakhnarovich
Abstract: We introduce a novel learning algorithm for binary classification with hyperplane discriminants based on pairs of training points from opposite classes (dyadic hypercuts). This algorithm is further extended to nonlinear discriminants using kernel functions satisfying Mercer’s conditions. An ensemble of simple dyadic hypercuts is learned incrementally by means of a confidence-rated version of AdaBoost, which provides a sound strategy for searching through the finite set of hypercut hypotheses. In experiments with real-world datasets from the UCI repository, the generalization performance of the hypercut classifiers was found to be comparable to that of SVMs and k-NN classifiers. Furthermore, the computational cost of classification (at run time) was found to be similar to, or better than, that of SVM. Similarly to SVMs, boosted dyadic kernel discriminants tend to maximize the margin (via AdaBoost). In contrast to SVMs, however, we offer an on-line and incremental learning machine for building kernel discriminants whose complexity (number of kernel evaluations) can be directly controlled (traded off for accuracy). 1
6 0.59883869 196 nips-2002-The RA Scanner: Prediction of Rheumatoid Joint Inflammation Based on Laser Imaging
7 0.58949733 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
8 0.58887136 72 nips-2002-Dyadic Classification Trees via Structural Risk Minimization
9 0.56442434 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers
10 0.56285346 138 nips-2002-Manifold Parzen Windows
11 0.5585897 114 nips-2002-Information Regularization with Partially Labeled Data
12 0.55636412 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine
13 0.54735011 55 nips-2002-Combining Features for BCI
14 0.53193128 111 nips-2002-Independent Components Analysis through Product Density Estimation
15 0.52256256 92 nips-2002-FloatBoost Learning for Classification
16 0.52021599 109 nips-2002-Improving a Page Classifier with Anchor Extraction and Link Analysis
17 0.47470462 149 nips-2002-Multiclass Learning by Probabilistic Embeddings
18 0.47397894 67 nips-2002-Discriminative Binaural Sound Localization
19 0.44176254 119 nips-2002-Kernel Dependency Estimation
20 0.43845475 19 nips-2002-Adapting Codes and Embeddings for Polychotomies
topicId topicWeight
[(11, 0.023), (23, 0.044), (42, 0.093), (54, 0.137), (55, 0.053), (67, 0.012), (68, 0.032), (74, 0.112), (80, 0.22), (87, 0.011), (92, 0.07), (98, 0.123)]
simIndex simValue paperId paperTitle
same-paper 1 0.83717322 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
Author: Peter Meinicke, Thorsten Twellmann, Helge Ritter
Abstract: We propose a framework for classifier design based on discriminative densities for representation of the differences of the class-conditional distributions in a way that is optimal for classification. The densities are selected from a parametrized set by constrained maximization of some objective function which measures the average (bounded) difference, i.e. the contrast between discriminative densities. We show that maximization of the contrast is equivalent to minimization of an approximation of the Bayes risk. Therefore using suitable classes of probability density functions, the resulting maximum contrast classifiers (MCCs) can approximate the Bayes rule for the general multiclass case. In particular for a certain parametrization of the density functions we obtain MCCs which have the same functional form as the well-known Support Vector Machines (SVMs). We show that MCC-training in general requires some nonlinear optimization but under certain conditions the problem is concave and can be tackled by a single linear program. We indicate the close relation between SVM- and MCC-training and in particular we show that Linear Programming Machines can be viewed as an approximate realization of MCCs. In the experiments on benchmark data sets, the MCC shows a competitive classification performance.
2 0.73433852 3 nips-2002-A Convergent Form of Approximate Policy Iteration
Author: Theodore J. Perkins, Doina Precup
Abstract: We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a “policy improvement operator” to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces -soft policies and is Lipschitz continuous in the action values, with a constant that is not too large, then the approximate policy iteration algorithm converges to a unique solution from any initial policy. To our knowledge, this is the first convergence result for any form of approximate policy iteration under similar computational-resource assumptions.
3 0.73197347 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
Author: Max Welling, Simon Osindero, Geoffrey E. Hinton
Abstract: We propose a model for natural images in which the probability of an image is proportional to the product of the probabilities of some filter outputs. We encourage the system to find sparse features by using a Studentt distribution to model each filter output. If the t-distribution is used to model the combined outputs of sets of neurally adjacent filters, the system learns a topographic map in which the orientation, spatial frequency and location of the filters change smoothly across the map. Even though maximum likelihood learning is intractable in our model, the product form allows a relatively efficient learning procedure that works well even for highly overcomplete sets of filters. Once the model has been learned it can be used as a prior to derive the “iterated Wiener filter” for the purpose of denoising images.
4 0.73143232 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
Author: Christopher M. Bishop, David Spiegelhalter, John Winn
Abstract: In recent years variational methods have become a popular tool for approximate inference and learning in a wide variety of probabilistic models. For each new application, however, it is currently necessary first to derive the variational update equations, and then to implement them in application-specific code. Each of these steps is both time consuming and error prone. In this paper we describe a general purpose inference engine called VIBES (‘Variational Inference for Bayesian Networks’) which allows a wide variety of probabilistic models to be implemented and solved variationally without recourse to coding. New models are specified either through a simple script or via a graphical interface analogous to a drawing package. VIBES then automatically generates and solves the variational equations. We illustrate the power and flexibility of VIBES using examples from Bayesian mixture modelling. 1
5 0.73069143 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
Author: Peter Sykacek, Stephen J. Roberts
Abstract: We propose in this paper a probabilistic approach for adaptive inference of generalized nonlinear classification that combines the computational advantage of a parametric solution with the flexibility of sequential sampling techniques. We regard the parameters of the classifier as latent states in a first order Markov process and propose an algorithm which can be regarded as variational generalization of standard Kalman filtering. The variational Kalman filter is based on two novel lower bounds that enable us to use a non-degenerate distribution over the adaptation rate. An extensive empirical evaluation demonstrates that the proposed method is capable of infering competitive classifiers both in stationary and non-stationary environments. Although we focus on classification, the algorithm is easily extended to other generalized nonlinear models.
6 0.73060226 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
7 0.72974718 10 nips-2002-A Model for Learning Variance Components of Natural Images
8 0.72951055 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
9 0.72384477 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers
10 0.72377002 53 nips-2002-Clustering with the Fisher Score
11 0.72280586 27 nips-2002-An Impossibility Theorem for Clustering
12 0.72134483 2 nips-2002-A Bilinear Model for Sparse Coding
13 0.72009563 31 nips-2002-Application of Variational Bayesian Approach to Speech Recognition
14 0.71940041 169 nips-2002-Real-Time Particle Filters
15 0.71933049 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
16 0.71907711 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
17 0.71810013 203 nips-2002-Using Tarjan's Red Rule for Fast Dependency Tree Construction
18 0.71725428 124 nips-2002-Learning Graphical Models with Mercer Kernels
19 0.71546072 46 nips-2002-Boosting Density Estimation
20 0.71465737 74 nips-2002-Dynamic Structure Super-Resolution