nips nips2000 nips2000-44 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shai Ben-David, Hans-Ulrich Simon
Abstract: We consider the existence of efficient algorithms for learning the class of half-spaces in ℝ^n in the agnostic learning model (i.e., making no prior assumptions on the example-generating distribution). The resulting combinatorial problem - finding the best agreement half-space over an input sample - is NP-hard to approximate to within some constant factor. We suggest a way to circumvent this theoretical bound by introducing a new measure of success for such algorithms. An algorithm is μ-margin successful if the agreement ratio of the half-space it outputs is as good as that of any half-space once training points that are inside the μ-margins of its separating hyper-plane are disregarded. We prove crisp computational complexity results with respect to this success measure: On one hand, for every positive μ, there exist efficient (poly-time) μ-margin successful learning algorithms. On the other hand, we prove that unless P=NP, there is no algorithm that runs in time polynomial in the sample size and in 1/μ that is μ-margin successful for all μ > 0. 1
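As a concrete illustration of the success measure in the abstract, the following is a minimal sketch (not from the paper; the function names, the normalization by the norm of w, and the finite candidate pool standing in for "any half-space" are assumptions made for illustration) of how the agreement ratio and its μ-margin-discounted counterpart could be computed on a labeled sample.

```python
# Hypothetical sketch of the mu-margin success criterion (illustration only; the paper
# gives no code). A hypothesis h(w, t) is mu-margin successful on a sample if its plain
# agreement ratio is at least the best agreement ratio any candidate half-space achieves
# once points within distance mu of that candidate's hyper-plane are disregarded.
import numpy as np

def agreement(w, t, X, y):
    """Fraction of sample points with sign(<w, x> - t) equal to the label."""
    return float(np.mean(np.sign(X @ w - t) == y))

def margin_agreement(w, t, X, y, mu):
    """Agreement that only credits points farther than mu from the hyper-plane."""
    dist = (X @ w - t) / np.linalg.norm(w)          # signed distance to the hyper-plane
    credited = (np.sign(dist) == y) & (np.abs(dist) > mu)
    return float(np.sum(credited)) / len(y)

def is_mu_margin_successful(w, t, X, y, mu, candidates):
    """Check the criterion against a finite pool of candidate half-spaces
    (a stand-in for the quantification over all half-spaces)."""
    best = max(margin_agreement(cw, ct, X, y, mu) for cw, ct in candidates)
    return agreement(w, t, X, y) >= best
```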
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We consider the existence of efficient algorithms for learning the class of half-spaces in ℝ^n in the agnostic learning model (i.e. [sent-6, score-0.249]
2 The resulting combinatorial problem - finding the best agreement half-space over an input sample - is NP-hard to approximate to within some constant factor. [sent-8, score-0.536]
3 We suggest a way to circumvent this theoretical bound by introducing a new measure of success for such algorithms. [sent-9, score-0.298]
4 An algorithm is μ-margin successful if the agreement ratio of the half-space it outputs is as good as that of any half-space once training points that are inside the μ-margins of its separating hyper-plane are disregarded. [sent-10, score-0.71]
5 We prove crisp computational complexity results with respect to this success measure: On one hand, for every positive μ, there exist efficient (poly-time) μ-margin successful learning algorithms. [sent-11, score-0.724]
6 On the other hand, we prove that unless P=NP, there is no algorithm that runs in time polynomial in the sample size and in 1/μ that is μ-margin successful for all μ > 0. [sent-12, score-0.665]
7 1 Introduction We consider the computational complexity of learning linear perceptrons for arbitrary (i.e. [sent-13, score-0.144]
8 While there are quite a few perceptron learning algorithms that are computationally efficient on separable input samples, it is clear that 'real-life' data sets are usually not linearly separable. [sent-15, score-0.37]
9 a half-space) that maximizes the number of correctly classified points for an arbitrary input labeled sample is known to be NP-hard. [sent-18, score-0.727]
10 Furthermore, even the task of finding a half-space whose success rate on the sample is within some constant ratio of an optimal one is NP-hard [1]. [sent-19, score-0.518]
11 A possible way around this problem is offered by the support vector machines (SVM) paradigm. [sent-20, score-0.079]
12 While sticking with the basic empirical risk minimization principle, we propose to replace the worst-case-performance analysis by an alternative measure of success. [sent-24, score-0.065]
13 While the common definition of the approximation ratio of an algorithm requires the profit of an algorithm to remain within some fixed ratio of that of an optimal solution for all inputs, we allow the relative quality of our algorithm to vary between different inputs. [sent-25, score-0.791]
14 For a given input sample, the number of points that the algorithm's output half-space should classify correctly relates not only to the success rate of the best possible half-space, but also to the robustness of this rate to perturbations of the hyper-plane. [sent-26, score-0.773]
15 This new success requirement is intended to provide a formal measure that, while being achievable by efficient algorithms, retains a guaranteed quality of the output 'whenever possible'. [sent-27, score-0.394]
16 The new success measure depends on a margin parameter μ. [sent-28, score-0.381]
17 An algorithm is called μ-margin successful if, for any input labeled sample, it outputs a hypothesis half-space that classifies correctly as many sample points as any half-space can classify correctly with margin μ (that is, discounting points that are too close to the separating hyper-plane). [sent-29, score-1.777]
18 On the other hand, unless P=NP, no algorithm whose running time is polynomial in the sample size and dimension and in 1/μ can be μ-margin successful for all μ > 0. [sent-33, score-0.633]
19 Note that, by the hardness-of-approximation result for linear perceptrons of [1] cited above, for μ = 0, μ-margin learning is NP-hard (even NP-hard to approximate). [sent-34, score-0.131]
20 We conclude that the new success criterion for learning algorithms provides a rigorous success guarantee that captures the constraints imposed on perceptron learning by computational efficiency requirements. [sent-35, score-0.729]
21 It is well known by now that margins play an important role in the analysis of generalization performance (or sample complexity). [sent-36, score-0.26]
22 The results of this work demonstrate that a similar notion of margins is a significant component in the determination of the computational complexity of learning as well. [sent-37, score-0.273]
23 Due to lack of space, in this extended abstract we skip all the technical proofs. [sent-38, score-0.034]
24 2 Definition and Notation We shall be interested in the problem of finding a half-space that maximizes the agreement with a given labeled input data set. [sent-39, score-0.474]
25 S = {(x_1, η_1), ..., (x_m, η_m)} is a finite labeled sample, that is, each x_i is a point in ℝ^n and each η_i is a member of {+1, -1}. [sent-43, score-0.106]
26 A hyper-plane h(w, t), where w ∈ ℝ^n and t ∈ ℝ, correctly classifies (x, η) if sign(⟨w, x⟩ - t) = η, where ⟨w, x⟩ denotes the dot product of the vectors w and x. [sent-44, score-0.439]
27 We define the profit of h = h(w, t) on S as profit(h|S) = |{(x_i, η_i) : h correctly classifies (x_i, η_i)}| / |S|. The goal of a Best Separating Hyper-plane (BSH) algorithm is to find a pair (w, t) so that profit(h(w, t)|S) is as large as possible (a short code sketch of this computation appears after this sentence list). [sent-45, score-0.838]
28 In the sequel, we refer to an input instance with parameter n as an n-dimensional input. [sent-46, score-0.185]
29 On top of the Best Separating Hyper-plane problem we shall also refer to the following combinatorial optimization problems: Best Separating Homogeneous Hyper-plane (BSHH) - The same problem as BSH, except that the separating hyper-plane must be homogeneous, that is, t must be set to zero. [sent-47, score-0.738]
30 The restriction of BSHH to input points from S^{n-1}, the unit sphere in ℝ^n, is called the Best Separating Hemisphere Problem (BSHem) in the sequel. [sent-48, score-0.287]
31 Densest Hemisphere (DHem) Inputs are of the form (n, P), where n ≥ 1 and P is a list of (not necessarily different) points from S^{n-1} - the unit sphere in ℝ^n. [sent-49, score-0.208]
32 The problem is to find the Densest Hemisphere for P, that is, a weight vector w ∈ ℝ^n such that H^+(w, 0) contains as many points from P as possible (accounting for their multiplicity in P). [sent-50, score-0.207]
33 Densest Open Ball (DOB) Inputs are of the form (n, P), where n ≥ 1, and P is a list of points from ℝ^n. [sent-51, score-0.164]
34 The problem is to find the Densest Open Ball of radius 1 for P, that is, a center z ∈ ℝ^n such that B(z, 1) contains as many points from P as possible (accounting for their multiplicity in P). [sent-52, score-0.207]
35 For the sake of our proofs, we shall also have to address the following well-studied optimization problem: MAX-E2-SAT Inputs are of the form (n, C), where n ≥ 1 and C is a collection of 2-clauses over n Boolean variables. [sent-53, score-0.194]
36 The problem is to find an assignment a ∈ {0, 1}^n satisfying as many 2-clauses of C as possible. [sent-54, score-0.038]
37 More generally, a maximization problem defines for each input instance I a set of legal solutions, and for each (instance, legal-solution) pair (I, a), it defines profit(I, a) ∈ ℝ^+ - the profit of a on I. [sent-55, score-1.332]
38 For each maximization problem Π and each input instance I for Π, opt_Π(I) denotes the maximum profit that can be realized by a legal solution for I. [sent-56, score-0.941]
39 The profit realized by an algorithm A on input instance I is denoted by A(I). [sent-58, score-0.08]
40 The quantity (opt(I) - A(I)) / opt(I) is called the relative error of algorithm A on input instance I (illustrated by a short sketch after this sentence list). [sent-59, score-0.602]
41 A is called a δ-approximation algorithm for Π, where δ ∈ ℝ^+, if its relative error on I is at most δ for all input instances I. [sent-60, score-0.266]
42 2.1 The new notion of approximate optimization: μ-margin approximation As mentioned in the introduction, we shall discuss a variant of the above common notion of approximation for the best separating hyper-plane problem (as well as for the other geometric maximization problems listed above). [sent-62, score-0.914]
43 The idea behind this new notion, that we term 'μ-margin approximation', is that the required approximation rate varies with the structure of the input sample. [sent-63, score-0.162]
44 When there exist optimal solutions that are 'stable', in the sense that minor variations to these solutions will not affect their cost, then we require a high approximation ratio. [sent-64, score-0.183]
45 On the other hand, when all optimal solutions are 'unstable' then we settle for lower approximation ratios. [sent-65, score-0.188]
46 The following definitions focus on separation problems, but extend to densest set problems in the obvious way. [sent-66, score-0.272]
47 H = ∪_n H_n, where each H_n is a collection of subsets of ℝ^n, and a parameter μ ≥ 0, • A margin function is a function M : ∪_n (H_n × ℝ^n) → ℝ^+. [sent-69, score-0.259]
48 That is, given a hypothesis h ⊆ ℝ^n and a point x ∈ ℝ^n, M(h, x) is a non-negative real number - the margin of x w. [sent-70, score-0.153]
49 if profit(h|S) > • an algorithm A is μ-successful for H. [sent-80, score-0.154]
50 if for every finite n-dimensional input S it outputs A(S) ∈ H_n which is a μ-margin approximation for S w. [sent-81, score-0.315]
51 • Given any of the geometric maximization problems listed above, Π, its μ-relaxation is the problem of finding, for each input instance of Π, a μ-margin approximation. [sent-86, score-0.462]
52 For a given parameter μ > 0, we denote the μ-relaxation of a problem Π by Π[μ]. [sent-87, score-0.038]
53 μ-margin successful learning algorithms Our Hyper-plane learning algorithm is based on the following result of Ben-David, Eiron and Simon [2]: Theorem 3. [sent-89, score-0.424]
54 1 For every (constant) μ > 0, there exists a μ-margin successful polynomial time algorithm A_μ for the Densest Open Ball Problem. [sent-90, score-0.487]
55 μ-successful algorithm for Densest Open Balls implies the existence of [sent-92, score-0.116]
56 μ-successful algorithms for Densest Hemispheres and Best Separating Homogeneous Hyper-planes. [sent-93, score-0.047]
57 Towards this end we need notions of reductions between combinatorial optimization problems. [sent-94, score-0.109]
58 The first definition, of a cost preserving polynomial reduction, is standard, whereas the second definition is tailored for our notion of μ-margin approximation. [sent-95, score-0.362]
59 Once this, somewhat technical, preliminary stage is over we shall describe our learning algorithms and prove their performance guarantees. [sent-97, score-0.238]
60 A cost preserving polynomial reduction from Π to Π', written as Π ≤_pol^cp Π', consists of the following components: • a polynomial time computable mapping which maps input instances of Π to input instances of Π', so that whenever I is mapped to I', opt(I') ≥ opt(I). [sent-100, score-0.874]
61 • for each I, a polynomial time computable mapping which maps each legal solution σ' for I' to a legal solution σ for I having the same profit as σ'. The following result is evident: Lemma 3. [sent-101, score-0.999]
62 3 If Π ≤_pol^cp Π' and there exists a polynomial time δ-approximation algorithm for Π', then there exists a polynomial time δ-approximation algorithm for Π. [sent-102, score-0.566]
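To make the profit quantity in sentence 27 concrete, here is a small hedged sketch (the toy sample and the candidate hyper-plane are invented; only the formula profit(h|S) comes from the text) of the objective that a Best Separating Hyper-plane algorithm maximizes.

```python
# Illustrative computation of profit(h|S) = |{(x_i, eta_i) : sign(<w, x_i> - t) = eta_i}| / |S|.
# The labeled sample and the hyper-plane parameters below are made up for the example.
import numpy as np

def profit(w, t, S):
    """S is a list of (x, eta) pairs with eta in {+1, -1}."""
    correct = sum(1 for x, eta in S if np.sign(np.dot(w, x) - t) == eta)
    return correct / len(S)

S = [(np.array([1.0, 2.0]), +1),
     (np.array([-1.0, 0.5]), -1),
     (np.array([0.2, -0.3]), -1)]
w, t = np.array([1.0, 1.0]), 0.0
print(profit(w, t, S))  # agreement ratio of h(w, t) on this toy sample -> 1.0
```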
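Similarly, the relative error and δ-approximation notions of sentences 40 and 41 can be illustrated with a short sketch; the numbers below are invented.

```python
# Relative error (opt(I) - A(I)) / opt(I) of an algorithm A on an input instance I.
# A is a delta-approximation algorithm if this quantity never exceeds delta on any instance.
def relative_error(opt_profit, alg_profit):
    return (opt_profit - alg_profit) / opt_profit

err = relative_error(50, 45)   # the optimum classifies 50 points correctly, A classifies 45
print(err)                     # 0.1, so on this instance A is within relative error 0.1
```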
wordName wordTfidf (topN-words)
[('profit', 0.409), ('densest', 0.272), ('success', 0.266), ('separating', 0.219), ('polynomial', 0.176), ('correctly', 0.171), ('opt', 0.158), ('classifies', 0.154), ('successful', 0.154), ('sample', 0.146), ('lrn', 0.143), ('legal', 0.136), ('jl', 0.135), ('bsh', 0.119), ('shall', 0.115), ('margins', 0.114), ('points', 0.107), ('labeled', 0.106), ('input', 0.105), ('np', 0.097), ('maximization', 0.093), ('hemisphere', 0.093), ('classified', 0.092), ('ball', 0.086), ('ii', 0.083), ('margin', 0.083), ('simon', 0.08), ('realized', 0.08), ('instance', 0.08), ('perceptron', 0.08), ('rn', 0.08), ('bochum', 0.079), ('bshh', 0.079), ('computable', 0.079), ('crisp', 0.079), ('notion', 0.077), ('best', 0.073), ('homogeneous', 0.072), ('li', 0.071), ('hypothesis', 0.07), ('algorithm', 0.07), ('settle', 0.068), ('shai', 0.068), ('agreement', 0.066), ('combinatorial', 0.064), ('open', 0.063), ('solutions', 0.063), ('ratio', 0.062), ('perceptrons', 0.062), ('multiplicity', 0.062), ('listed', 0.062), ('definition', 0.061), ('instances', 0.06), ('approximation', 0.057), ('list', 0.057), ('wx', 0.057), ('efficient', 0.052), ('inputs', 0.051), ('separable', 0.051), ('classify', 0.051), ('every', 0.05), ('hand', 0.049), ('accounting', 0.048), ('lsi', 0.048), ('preserving', 0.048), ('algorithms', 0.047), ('complexity', 0.047), ('existence', 0.046), ('geometric', 0.046), ('lr', 0.046), ('dimension', 0.046), ('svm', 0.045), ('optimization', 0.045), ('sphere', 0.044), ('requirement', 0.044), ('finding', 0.044), ('prove', 0.041), ('unless', 0.041), ('paradigm', 0.041), ('il', 0.04), ('problem', 0.038), ('runs', 0.037), ('exists', 0.037), ('learning', 0.035), ('collection', 0.034), ('entails', 0.034), ('separator', 0.034), ('agnostic', 0.034), ('balls', 0.034), ('hardness', 0.034), ('pro', 0.034), ('skip', 0.034), ('replace', 0.033), ('whenever', 0.033), ('euclidean', 0.032), ('measure', 0.032), ('reduction', 0.032), ('outputs', 0.032), ('called', 0.031), ('defines', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 44 nips-2000-Efficient Learning of Linear Perceptrons
Author: Shai Ben-David, Hans-Ulrich Simon
Abstract: We consider the existence of efficient algorithms for learning the class of half-spaces in ℝ^n in the agnostic learning model (i.e., making no prior assumptions on the example-generating distribution). The resulting combinatorial problem - finding the best agreement half-space over an input sample - is NP-hard to approximate to within some constant factor. We suggest a way to circumvent this theoretical bound by introducing a new measure of success for such algorithms. An algorithm is μ-margin successful if the agreement ratio of the half-space it outputs is as good as that of any half-space once training points that are inside the μ-margins of its separating hyper-plane are disregarded. We prove crisp computational complexity results with respect to this success measure: On one hand, for every positive μ, there exist efficient (poly-time) μ-margin successful learning algorithms. On the other hand, we prove that unless P=NP, there is no algorithm that runs in time polynomial in the sample size and in 1/μ that is μ-margin successful for all μ > 0. 1
2 0.11603813 58 nips-2000-From Margin to Sparsity
Author: Thore Graepel, Ralf Herbrich, Robert C. Williamson
Abstract: We present an improvement of Novikoff's perceptron convergence theorem. Reinterpreting this mistake bound as a margin dependent sparsity guarantee allows us to give a PAC-style generalisation error bound for the classifier learned by the perceptron learning algorithm. The bound value crucially depends on the margin a support vector machine would achieve on the same data set using the same kernel. Ironically, the bound yields better guarantees than are currently available for the support vector solution itself. 1
3 0.11231723 9 nips-2000-A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work
Author: Ralf Herbrich, Thore Graepel
Abstract: We present a bound on the generalisation error of linear classifiers in terms of a refined margin quantity on the training set. The result is obtained in a PAC-Bayesian framework and is based on geometrical arguments in the space of linear classifiers. The new bound constitutes an exponential improvement of the so far tightest margin bound by Shawe-Taylor et al. [8] and scales logarithmically in the inverse margin. Even in the case of less training examples than input dimensions sufficiently large margins lead to non-trivial bound values and - for maximum margins - to a vanishing complexity term. Furthermore, the classical margin is too coarse a measure for the essential quantity that controls the generalisation error: the volume ratio between the whole hypothesis space and the subset of consistent hypotheses. The practical relevance of the result lies in the fact that the well-known support vector machine is optimal w.r.t. the new bound only if the feature vectors are all of the same length. As a consequence we recommend to use SVMs on normalised feature vectors only - a recommendation that is well supported by our numerical experiments on two benchmark data sets. 1
4 0.10310751 18 nips-2000-Active Support Vector Machine Classification
Author: Olvi L. Mangasarian, David R. Musicant
Abstract: An active set strategy is applied to the dual of a simple reformulation of the standard quadratic program of a linear support vector machine. This application generates a fast new dual algorithm that consists of solving a finite number of linear equations, with a typically large dimensionality equal to the number of points to be classified. However, by making novel use of the Sherman-Morrison-Woodbury formula, a much smaller matrix of the order of the original input space is inverted at each step. Thus, a problem with a 32-dimensional input space and 7 million points required inverting positive definite symmetric matrices of size 33 x 33 with a total running time of 96 minutes on a 400 MHz Pentium II. The algorithm requires no specialized quadratic or linear programming code, but merely a linear equation solver which is publicly available. 1
5 0.10215229 145 nips-2000-Weak Learners and Improved Rates of Convergence in Boosting
Author: Shie Mannor, Ron Meir
Abstract: The problem of constructing weak classifiers for boosting algorithms is studied. We present an algorithm that produces a linear classifier that is guaranteed to achieve an error better than random guessing for any distribution on the data. While this weak learner is not useful for learning in general, we show that under reasonable conditions on the distribution it yields an effective weak learner for one-dimensional problems. Preliminary simulations suggest that similar behavior can be expected in higher dimensions, a result which is corroborated by some recent theoretical bounds. Additionally, we provide improved convergence rate bounds for the generalization error in situations where the empirical error can be made small, which is exactly the situation that occurs if weak learners with guaranteed performance that is better than random guessing can be established. 1
6 0.10142462 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm
7 0.09852583 37 nips-2000-Convergence of Large Margin Separable Linear Classification
8 0.090102047 75 nips-2000-Large Scale Bayes Point Machines
9 0.089938648 4 nips-2000-A Linear Programming Approach to Novelty Detection
10 0.083434731 54 nips-2000-Feature Selection for SVMs
11 0.079599686 111 nips-2000-Regularized Winnow Methods
12 0.072780736 21 nips-2000-Algorithmic Stability and Generalization Performance
13 0.069085315 74 nips-2000-Kernel Expansions with Unlabeled Examples
14 0.067778796 12 nips-2000-A Support Vector Method for Clustering
15 0.06769903 128 nips-2000-Support Vector Novelty Detection Applied to Jet Engine Vibration Spectra
16 0.059567828 110 nips-2000-Regularization with Dot-Product Kernels
17 0.059234828 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks
18 0.057336047 68 nips-2000-Improved Output Coding for Classification Using Continuous Relaxation
19 0.054919891 5 nips-2000-A Mathematical Programming Approach to the Kernel Fisher Algorithm
20 0.053739823 120 nips-2000-Sparse Greedy Gaussian Process Regression
topicId topicWeight
[(0, 0.204), (1, 0.139), (2, -0.076), (3, -0.02), (4, -0.056), (5, -0.075), (6, 0.037), (7, 0.003), (8, -0.027), (9, 0.078), (10, -0.079), (11, 0.029), (12, 0.022), (13, -0.021), (14, 0.08), (15, -0.094), (16, -0.078), (17, -0.041), (18, -0.039), (19, -0.016), (20, -0.019), (21, -0.04), (22, -0.07), (23, 0.031), (24, -0.075), (25, -0.006), (26, 0.026), (27, -0.059), (28, -0.075), (29, -0.062), (30, 0.088), (31, -0.038), (32, 0.077), (33, 0.086), (34, -0.05), (35, -0.108), (36, -0.069), (37, -0.045), (38, -0.043), (39, -0.105), (40, -0.038), (41, 0.033), (42, -0.109), (43, 0.053), (44, -0.095), (45, -0.154), (46, -0.089), (47, -0.017), (48, 0.162), (49, 0.14)]
simIndex simValue paperId paperTitle
same-paper 1 0.95396811 44 nips-2000-Efficient Learning of Linear Perceptrons
Author: Shai Ben-David, Hans-Ulrich Simon
Abstract: We consider the existence of efficient algorithms for learning the class of half-spaces in ℝ^n in the agnostic learning model (i.e., making no prior assumptions on the example-generating distribution). The resulting combinatorial problem - finding the best agreement half-space over an input sample - is NP-hard to approximate to within some constant factor. We suggest a way to circumvent this theoretical bound by introducing a new measure of success for such algorithms. An algorithm is μ-margin successful if the agreement ratio of the half-space it outputs is as good as that of any half-space once training points that are inside the μ-margins of its separating hyper-plane are disregarded. We prove crisp computational complexity results with respect to this success measure: On one hand, for every positive μ, there exist efficient (poly-time) μ-margin successful learning algorithms. On the other hand, we prove that unless P=NP, there is no algorithm that runs in time polynomial in the sample size and in 1/μ that is μ-margin successful for all μ > 0. 1
2 0.61295676 18 nips-2000-Active Support Vector Machine Classification
Author: Olvi L. Mangasarian, David R. Musicant
Abstract: An active set strategy is applied to the dual of a simple reformulation of the standard quadratic program of a linear support vector machine. This application generates a fast new dual algorithm that consists of solving a finite number of linear equations, with a typically large dimensionality equal to the number of points to be classified. However, by making novel use of the Sherman-Morrison-Woodbury formula, a much smaller matrix of the order of the original input space is inverted at each step. Thus, a problem with a 32-dimensional input space and 7 million points required inverting positive definite symmetric matrices of size 33 x 33 with a total running time of 96 minutes on a 400 MHz Pentium II. The algorithm requires no specialized quadratic or linear programming code, but merely a linear equation solver which is publicly available. 1
3 0.59369153 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm
Author: Claudio Gentile
Abstract: A new incremental learning algorithm is described which approximates the maximal margin hyperplane w.r.t. norm p ≥ 2 for a set of linearly separable data. Our algorithm, called ALMAp (Approximate Large Margin algorithm w.r.t. norm p), takes O((p-1)X^2 / (α^2 γ^2)) corrections to separate the data with p-norm margin larger than (1 - α)γ, where γ is the p-norm margin of the data and X is a bound on the p-norm of the instances. ALMAp avoids quadratic (or higher-order) programming methods. It is very easy to implement and is as fast as on-line algorithms, such as Rosenblatt's perceptron. We report on some experiments comparing ALMAp to two incremental algorithms: Perceptron and Li and Long's ROMMA. Our algorithm seems to perform quite better than both. The accuracy levels achieved by ALMAp are slightly inferior to those obtained by Support vector Machines (SVMs). On the other hand, ALMAp is quite faster and easier to implement than standard SVMs training algorithms.
4 0.53420895 111 nips-2000-Regularized Winnow Methods
Author: Tong Zhang
Abstract: In theory, the Winnow multiplicative update has certain advantages over the Perceptron additive update when there are many irrelevant attributes. Recently, there has been much effort on enhancing the Perceptron algorithm by using regularization, leading to a class of linear classification methods called support vector machines. Similarly, it is also possible to apply the regularization idea to the Winnow algorithm, which gives methods we call regularized Winnows. We show that the resulting methods compare with the basic Winnows in a similar way that a support vector machine compares with the Perceptron. We investigate algorithmic issues and learning properties of the derived methods. Some experimental results will also be provided to illustrate different methods.
5 0.5206812 37 nips-2000-Convergence of Large Margin Separable Linear Classification
Author: Tong Zhang
Abstract: Large margin linear classification methods have been successfully applied to many applications. For a linearly separable problem, it is known that under appropriate assumptions, the expected misclassification error of the computed
6 0.44512185 58 nips-2000-From Margin to Sparsity
7 0.43981829 148 nips-2000-`N-Body' Problems in Statistical Learning
8 0.41294408 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks
9 0.40535039 144 nips-2000-Vicinal Risk Minimization
10 0.39827552 145 nips-2000-Weak Learners and Improved Rates of Convergence in Boosting
11 0.38798919 54 nips-2000-Feature Selection for SVMs
12 0.38119471 4 nips-2000-A Linear Programming Approach to Novelty Detection
13 0.37568852 9 nips-2000-A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work
14 0.36382258 86 nips-2000-Model Complexity, Goodness of Fit and Diminishing Returns
15 0.35938972 5 nips-2000-A Mathematical Programming Approach to the Kernel Fisher Algorithm
16 0.34751698 12 nips-2000-A Support Vector Method for Clustering
17 0.33664143 138 nips-2000-The Use of Classifiers in Sequential Inference
18 0.32458475 139 nips-2000-The Use of MDL to Select among Computational Models of Cognition
19 0.32014266 70 nips-2000-Incremental and Decremental Support Vector Machine Learning
20 0.31369111 128 nips-2000-Support Vector Novelty Detection Applied to Jet Engine Vibration Spectra
topicId topicWeight
[(10, 0.035), (17, 0.097), (33, 0.067), (48, 0.365), (55, 0.016), (62, 0.038), (65, 0.023), (67, 0.081), (76, 0.072), (81, 0.02), (90, 0.075), (97, 0.017)]
simIndex simValue paperId paperTitle
1 0.87394869 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks
Author: Brendan J. Frey, Relu Patrascu, Tommi Jaakkola, Jodi Moran
Abstract: An important class of problems can be cast as inference in noisy-OR Bayesian networks, where the binary state of each variable is a logical OR of noisy versions of the states of the variable's parents. For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. Inference in richly-connected noisy-OR networks is intractable, but approximate methods (e.g., variational techniques) are showing increasing promise as practical solutions. One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisy-OR networks, that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network. 1 Inclusive variational approximations Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T. Richardson, personal communication). Approximate methods include variational techniques (Ghahramani and Jordan 1997; Saul et al. 1996; Frey and Hinton 1999; Jordan et al. 1999), local probability propagation (Gallager 1963; Pearl 1988; Frey 1998; MacKay 1999a; Freeman and Weiss 2001) and Markov chain Monte Carlo (Neal 1993; MacKay 1999b). Many algorithms have been proposed in each of these classes. One problem that most of the above algorithms suffer from is a tendency to concentrate on a relatively small number of modes of the target distribution (the distribution being approximated). In the case of medical diagnosis, different modes correspond to different explanations of the symptoms. Markov chain Monte Carlo methods are usually guaranteed to eventually sample from all the modes, but this may take an extremely long time, even when tempered transitions (Neal 1996) are
same-paper 2 0.81316215 44 nips-2000-Efficient Learning of Linear Perceptrons
Author: Shai Ben-David, Hans-Ulrich Simon
Abstract: We consider the existence of efficient algorithms for learning the class of half-spaces in ℝ^n in the agnostic learning model (i.e., making no prior assumptions on the example-generating distribution). The resulting combinatorial problem - finding the best agreement half-space over an input sample - is NP-hard to approximate to within some constant factor. We suggest a way to circumvent this theoretical bound by introducing a new measure of success for such algorithms. An algorithm is μ-margin successful if the agreement ratio of the half-space it outputs is as good as that of any half-space once training points that are inside the μ-margins of its separating hyper-plane are disregarded. We prove crisp computational complexity results with respect to this success measure: On one hand, for every positive μ, there exist efficient (poly-time) μ-margin successful learning algorithms. On the other hand, we prove that unless P=NP, there is no algorithm that runs in time polynomial in the sample size and in 1/μ that is μ-margin successful for all μ > 0. 1
3 0.41986686 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm
Author: Claudio Gentile
Abstract: A new incremental learning algorithm is described which approximates the maximal margin hyperplane w.r.t. norm p ≥ 2 for a set of linearly separable data. Our algorithm, called ALMAp (Approximate Large Margin algorithm w.r.t. norm p), takes O((p-1)X^2 / (α^2 γ^2)) corrections to separate the data with p-norm margin larger than (1 - α)γ, where γ is the p-norm margin of the data and X is a bound on the p-norm of the instances. ALMAp avoids quadratic (or higher-order) programming methods. It is very easy to implement and is as fast as on-line algorithms, such as Rosenblatt's perceptron. We report on some experiments comparing ALMAp to two incremental algorithms: Perceptron and Li and Long's ROMMA. Our algorithm seems to perform quite better than both. The accuracy levels achieved by ALMAp are slightly inferior to those obtained by Support vector Machines (SVMs). On the other hand, ALMAp is quite faster and easier to implement than standard SVMs training algorithms.
4 0.4151836 21 nips-2000-Algorithmic Stability and Generalization Performance
Author: Olivier Bousquet, André Elisseeff
Abstract: We present a novel way of obtaining PAC-style bounds on the generalization error of learning algorithms, explicitly using their stability properties. A stable learner is one for which the learned solution does not change much with small changes in the training set. The bounds we obtain do not depend on any measure of the complexity of the hypothesis space (e.g. VC dimension) but rather depend on how the learning algorithm searches this space, and can thus be applied even when the VC dimension is infinite. We demonstrate that regularization networks possess the required stability property and apply our method to obtain new bounds on their generalization performance. 1
5 0.4131985 111 nips-2000-Regularized Winnow Methods
Author: Tong Zhang
Abstract: In theory, the Winnow multiplicative update has certain advantages over the Perceptron additive update when there are many irrelevant attributes. Recently, there has been much effort on enhancing the Perceptron algorithm by using regularization, leading to a class of linear classification methods called support vector machines. Similarly, it is also possible to apply the regularization idea to the Winnow algorithm, which gives methods we call regularized Winnows. We show that the resulting methods compare with the basic Winnows in a similar way that a support vector machine compares with the Perceptron. We investigate algorithmic issues and learning properties of the derived methods. Some experimental results will also be provided to illustrate different methods.
6 0.41215923 74 nips-2000-Kernel Expansions with Unlabeled Examples
7 0.40736169 119 nips-2000-Some New Bounds on the Generalization Error of Combined Classifiers
8 0.4042263 37 nips-2000-Convergence of Large Margin Separable Linear Classification
9 0.40222558 134 nips-2000-The Kernel Trick for Distances
10 0.40150461 52 nips-2000-Fast Training of Support Vector Classifiers
11 0.40098608 9 nips-2000-A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work
12 0.3997218 4 nips-2000-A Linear Programming Approach to Novelty Detection
13 0.3976396 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing
14 0.39406618 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics
15 0.39200276 146 nips-2000-What Can a Single Neuron Compute?
16 0.39190143 122 nips-2000-Sparse Representation for Gaussian Process Models
17 0.39139107 22 nips-2000-Algorithms for Non-negative Matrix Factorization
18 0.39100182 75 nips-2000-Large Scale Bayes Point Machines
19 0.3894532 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling
20 0.38820377 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data