nips nips2000 nips2000-23 knowledge-graph by maker-knowledge-mining

23 nips-2000-An Adaptive Metric Machine for Pattern Classification


Source: pdf

Author: Carlotta Domeniconi, Jing Peng, Dimitrios Gunopulos

Abstract: Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a locally adaptive nearest neighbor classification method to try to minimize bias. We use a Chi-squared distance analysis to compute a flexible metric for producing neighborhoods that are elongated along less relevant feature dimensions and constricted along most influential ones. As a result, the class conditional probabilities tend to be smoother in the modified neighborhoods, whereby better classification performance can be achieved. The efficacy of our method is validated and compared against other techniques using a variety of real world data. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Nearest neighbor classification assumes locally constant class conditional probabilities. [sent-7, score-0.426]

2 This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. [sent-8, score-0.125]

3 Severe bias can be introduced under these conditions when using the nearest neighbor rule. [sent-9, score-0.337]

4 We propose a locally adaptive nearest neighbor classification method to try to minimize bias. [sent-10, score-0.606]

5 We use a Chi-squared distance analysis to compute a flexible metric for producing neighborhoods that are elongated along less relevant feature dimensions and constricted along most influential ones. [sent-11, score-0.931]

6 As a result, the class conditional probabilities tend to be smoother in the modified neighborhoods, whereby better classification performance can be achieved. [sent-12, score-0.348]

7 The efficacy of our method is validated and compared against other techniques using a variety of real world data. [sent-13, score-0.056]

8 1 Introduction In classification, a feature vector $x = (x_1, \ldots, x_q)^T \in \mathbb{R}^q$, representing an object, is assumed to be in one of $J$ classes $\{j\}_{j=1}^{J}$, and the objective is to build classifier machines that assign x to the correct class from a given set of N training samples. [sent-14, score-0.204]

9 The K nearest neighbor (NN) classification method [3, 5, 7, 8, 9] is a simple and appealing approach to this problem. [sent-15, score-0.46]

10 Such a method produces continuous and overlapping, rather than fixed, neighborhoods and uses a different neighborhood for each individual query so that all points in the neighborhood are close to the query, to the extent possible. [sent-16, score-0.686]

11 In addition, it has been shown [4, 6] that the one NN rule has an asymptotic error rate that is at most twice the Bayes error rate, independent of the distance metric used. [sent-17, score-0.41]

12 The NN rule becomes less appealing in finite training samples, however. [sent-18, score-0.108]

13 Severe bias can be introduced in the NN rule in a high dimensional input feature space with finite samples. [sent-20, score-0.212]

14 As such, the choice of a distance measure becomes crucial in determining the outcome of nearest neighbor classification. [sent-21, score-0.455]

15 The commonly used Euclidean distance measure, while simple computationally, implies that the input space is isotropic or homogeneous. [sent-22, score-0.15]

16 However, the assumption of isotropy is often invalid and generally undesirable in many practical applications. [sent-23, score-0.077]

17 In general, distance computation does not vary with equal strength or in the same proportion in all directions in the feature space emanating from the input query. [sent-24, score-0.27]

18 Capturing such information, therefore, is of great importance to any classification procedure in high dimensional settings. [sent-25, score-0.091]

19 In this paper we propose an adaptive metric classification method to try to minimize bias in high dimensions. [sent-26, score-0.414]

20 We estimate a flexible metric for computing neighborhoods based on Chi-squared distance analysis. [sent-27, score-0.473]

21 The resulting neighborhoods are highly adaptive to query locations. [sent-28, score-0.355]

22 Moreover, the neighborhoods are elongated along less relevant feature dimensions and constricted along most influential ones. [sent-29, score-0.66]

23 As a result, the class conditional probabilities tend to be constant in the modified neighborhoods, whereby better classification performance can be obtained. [sent-30, score-0.348]

24 Let x_0 be the test point whose class membership we are predicting. [sent-32, score-0.07]

25 In the one NN classification rule, a single nearest neighbor x is found according to a distance metric D(x, x_0). [sent-33, score-0.622]
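
For concreteness, a minimal sketch of the one-NN rule with a pluggable metric is given below; the function name and the squared-Euclidean default are illustrative choices, not part of the paper.

```python
import numpy as np

def one_nn_predict(x0, X_train, y_train, metric=None):
    """Return the label of the single nearest training point to x0 under `metric`."""
    if metric is None:
        # Default: squared Euclidean distance -- exactly the isotropic choice
        # that the adaptive scheme described here later replaces.
        metric = lambda a, b: float(np.sum((a - b) ** 2))
    distances = np.array([metric(x0, x) for x in X_train])
    return y_train[int(np.argmin(distances))]
```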

26 Let Pr(j|x) be the class conditional probability at point x. [sent-34, score-0.122]

27 Consider the weighted Chi-squared distance [8, 11] [sent-35, score-0.119]

28 $D(x, x_0) = \sum_{j=1}^{J} \frac{[\Pr(j|x) - \Pr(j|x_0)]^2}{\Pr(j|x_0)}$, (1) which measures the distance between x_0 and the point x in terms of the difference between the class posterior probabilities at the two points. [sent-36, score-0.277]
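
Equation (1) is straightforward to evaluate once the class posteriors at the two points are available (in practice they must be estimated, as described in Section 3). A minimal sketch, where the small eps guard is an implementation detail added here, not part of the paper:

```python
import numpy as np

def chi_squared_distance(post_x, post_x0, eps=1e-12):
    """Equation (1): sum_j [Pr(j|x) - Pr(j|x0)]^2 / Pr(j|x0).

    post_x, post_x0 : length-J arrays with the class posteriors at x and x0.
    eps guards against division by a zero posterior.
    """
    post_x = np.asarray(post_x, dtype=float)
    post_x0 = np.asarray(post_x0, dtype=float)
    return float(np.sum((post_x - post_x0) ** 2 / (post_x0 + eps)))
```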

29 Small D(x, x_0) indicates that the classification error rate will be close to the asymptotic error rate for one nearest neighbor. [sent-37, score-0.424]

30 In general, this can be achieved when Pr(j|x) = Pr(j|x_0), which states that if Pr(j|x) can be sufficiently well approximated at x_0, the asymptotic 1-NN error rate might result in finite sample settings. [sent-38, score-0.146]

31 Equation (1) computes the distance between the true and estimated posteriors. [sent-39, score-0.152]

32 Now, imagine we replace Pr(j|x_0) with a quantity that attempts to predict Pr(j|x) under the constraint that the quantity is conditioned at a location along a particular feature dimension. [sent-40, score-0.267]

33 Then, the Chi-squared distance (1) tells us the extent to which that dimension can be relied on to predict Pr(j|x). [sent-41, score-0.232]

34 Thus, Equation (1) provides us with a foundation upon which to develop a theory of feature relevance in the context of pattern classification. [sent-42, score-0.24]

35 Therefore, we can compute the conditional expectation of Pr(j|x), denoted by Pr(j|x_i = z), given that x_i assumes value z, where x_i represents the ith component of x. [sent-45, score-0.078]

36 Here p(x | x_i = z) is the conditional density of the other input variables. [sent-47, score-0.083]

37 $r_i(z) = \sum_{j=1}^{J} \frac{[\Pr(j|z) - \Pr(j|x_i = z_i)]^2}{\Pr(j|x_i = z_i)}$ (2) r_i(z) represents the ability of feature i to predict the Pr(j|z)'s at x_i = z_i. [sent-49, score-0.04]

38 The closer Pr(j|x_i = z_i) is to Pr(j|x), the more information feature i carries for predicting the class posterior probabilities locally at x. [sent-50, score-0.282]

39 We can now define a measure of feature relevance for x_0 as $\bar{r}_i(x_0) = \frac{1}{K} \sum_{z \in N(x_0)} r_i(z)$, (3) where N(x_0) denotes the neighborhood of x_0 containing the K nearest training points, according to a given metric. [sent-51, score-0.608]
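
Equations (2) and (3) amount to averaging a per-point Chi-squared score over the K nearest training points of x_0. A sketch under the assumption that estimators of Pr(j|z) and Pr(j|x_i = z_i) are supplied as callables (their concrete forms correspond to Equations (5) and (6)); all names here are illustrative:

```python
import numpy as np

def r_i(z, i, posterior, posterior_given_xi, eps=1e-12):
    """Equation (2): ability of feature i to predict Pr(j|z) at x_i = z_i."""
    p = np.asarray(posterior(z), dtype=float)                    # Pr(j|z), length J
    p_i = np.asarray(posterior_given_xi(i, z[i]), dtype=float)   # Pr(j|x_i = z_i)
    return float(np.sum((p - p_i) ** 2 / (p_i + eps)))

def relevance_bar(neighbors, posterior, posterior_given_xi):
    """Equation (3): average r_i over the K nearest training points of x_0.

    neighbors : (K, q) array holding N(x_0).
    Returns a length-q array with r_bar_i(x_0) for every feature i.
    """
    K, q = neighbors.shape
    return np.array([
        np.mean([r_i(z, i, posterior, posterior_given_xi) for z in neighbors])
        for i in range(q)
    ])
```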

40 r̄_i measures how well on average the class posterior probabilities can be approximated along input feature i within a local neighborhood of x_0. [sent-52, score-0.724]

41 Small r̄_i implies that the class posterior probabilities will be well captured along dimension i in the vicinity of x_0. [sent-53, score-0.474]

42 Note that r̄_i(x_0) is a function of both the test point x_0 and the dimension i, thereby making r̄_i(x_0) a local relevance measure. [sent-54, score-0.272]

43 The relative relevance can then be given by the following exponential weighting scheme: $w_i(x_0) = \exp(c\,R_i(x_0)) \,/\, \sum_{l=1}^{q} \exp(c\,R_l(x_0))$, (4) where c is a parameter that can be chosen to maximize (or minimize) the influence of r̄_i on w_i, and $R_i(x) = \max_j \bar{r}_j(x) - \bar{r}_i(x)$. [sent-55, score-0.401]
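
A sketch of Equation (4): the averaged relevance scores are converted into weights by a softmax over R_i(x_0) = max_j r̄_j(x_0) - r̄_i(x_0), with c controlling how sharply relevance differences are amplified. The function name is an assumption of this sketch:

```python
import numpy as np

def feature_weights(r_bar, c=1.0):
    """Equation (4): softmax weighting of the averaged relevance scores.

    r_bar : length-q array with r_bar_i(x_0) from Equation (3).
    c = 0 gives uniform weights 1/q; a large c concentrates weight on the
    dimensions along which the class posteriors are best captured (small r_bar_i).
    """
    r_bar = np.asarray(r_bar, dtype=float)
    R = np.max(r_bar) - r_bar            # R_i(x_0) = max_j r_bar_j - r_bar_i
    e = np.exp(c * R)
    return e / np.sum(e)
```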

44 When c = 0 we have w_i = 1/q, thereby ignoring any difference between the r̄_i's. [sent-56, score-0.035]

45 On the other hand, when c is large a change in r̄_i will be exponentially reflected in w_i. [sent-57, score-0.179]

46 The exponential weighting is more sensitive to changes in local feature relevance (3) and gives rise to better performance improvement. [sent-59, score-0.37]

47 Thus, (4) can be used as weights associated with features for weighted distance computation: $D(x, y) = \sqrt{\sum_{i=1}^{q} w_i (x_i - y_i)^2}$. [sent-60, score-0.119]
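
The resulting distance is an ordinary weighted Euclidean distance; a one-function sketch, assuming the weights come from Equation (4):

```python
import numpy as np

def weighted_distance(x, y, w):
    """D(x, y) = sqrt(sum_i w_i (x_i - y_i)^2), with w from Equation (4)."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    return float(np.sqrt(np.sum(w * (x - y) ** 2)))
```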

48 These weights enable the neighborhood to elongate along less important feature dimensions and, at the same time, to constrict along the most influential ones. [sent-61, score-0.317]

49 Note that the technique is query-based because weightings depend on the query [1]. [sent-62, score-0.078]

50 3 Estimation Since both Pr(j|x) and Pr(j|x_i = z_i) in (3) are unknown, we must estimate them using the training data $\{x_n, y_n\}_{n=1}^{N}$ in order for the relevance measure (3) to be useful in practice. [sent-63, score-0.232]

51 The quantity Pr(j|x) is estimated by considering a neighborhood N_1(x) centered at x: $\hat{\Pr}(j|x) = \sum_{x_n \in N_1(x)} 1(y_n = j) \,/\, \sum_{x_n \in N_1(x)} 1$, (5) where 1(·) is an indicator function that returns 1 when its argument is true, and 0 otherwise. [sent-68, score-0.276]
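
A sketch of the estimator in Equation (5), under the assumption that N_1(x) is formed from the k1 nearest training points under Euclidean distance and that class labels are coded 0, ..., J-1; k1 is an illustrative parameter, not a value from the paper:

```python
import numpy as np

def estimate_posterior(x, X_train, y_train, n_classes, k1=20):
    """Equation (5): class frequencies inside a neighborhood N_1(x)."""
    d = np.sum((X_train - x) ** 2, axis=1)      # distances to every training point
    idx = np.argsort(d)[:k1]                    # indices of N_1(x)
    counts = np.bincount(y_train[idx], minlength=n_classes).astype(float)
    return counts / counts.sum()                # Pr_hat(j|x) for j = 0, ..., J-1
```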

52 To compute $\Pr(j|x_i = z) = E[\Pr(j|x) \mid x_i = z]$, we introduce a dummy variable g_j such that g_j|x = 1 if y = j and g_j|x = 0 otherwise, for j = 1, ..., J. [sent-69, score-0.461]

53 However, since there may not be any data at x_i = z, the data from the neighborhood of x along dimension i are used to estimate E[g_j | x_i = z], a strategy suggested in [7]. [sent-71, score-0.345]

54 In detail, noting that g_j = 1(y = j), the estimate can be computed from (6). [sent-72, score-0.144]
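
A hedged sketch of the strategy in sentences 53-54: since no training point may have exactly x_i = z, the indicators 1(y = j) are averaged over points from a neighborhood of x whose i-th coordinate lies close to z. The neighborhood size k2 and the closeness band used below are illustrative choices of this sketch, not the paper's Equation (6):

```python
import numpy as np

def estimate_posterior_given_xi(x, i, z, X_train, y_train, n_classes,
                                k2=50, band=0.5):
    """Approximate Pr(j | x_i = z) = E[1(y = j) | x_i = z] locally (cf. Eq. (6))."""
    d = np.sum((X_train - x) ** 2, axis=1)
    idx = np.argsort(d)[:k2]                            # neighborhood N_2(x)
    close = idx[np.abs(X_train[idx, i] - z) <= band]    # points with x_i near z
    if close.size == 0:
        close = idx                                     # fallback: whole neighborhood
    counts = np.bincount(y_train[close], minlength=n_classes).astype(float)
    return counts / counts.sum()
```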

55 Using the estimates in (5) and in (6), we obtain an empirical measure of the relevance (3) for each input variable i. [sent-74, score-0.232]

56 In all the experiments, the features are first normalized over the training data to have zero mean and unit variance, and the test data are normalized using the corresponding training mean and variance. [sent-77, score-0.062]
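
The normalization in sentence 56, sketched for completeness; training statistics alone are used to scale the test data:

```python
import numpy as np

def zscore_train_test(X_train, X_test):
    """Normalize to zero mean / unit variance using training statistics only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)    # guard against constant features
    return (X_train - mu) / sigma, (X_test - mu) / sigma
```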

57 Procedural parameters for each method were determined empirically through cross-validation. [sent-78, score-0.028]

58 The data sets used were taken from the UCI Machine Learning Database Repository [10], except for the unreleased image data set. [sent-154, score-0.062]

59 This data set consists of q = 4 measurements made on each of N = 100 iris plants of J = 2 species; 2. [sent-157, score-0.206]

60 This data set consists of q = 60 frequency measurements made on each of N = 208 data of J = 2 classes ("mines" and "rocks"); 3. [sent-159, score-0.215]

61 This example has q = 10 measurements and 11 classes. [sent-161, score-0.047]

62 This data set consists of q = 9 chemical attributes measured for each of N = 214 data of J = 6 classes; 5. [sent-164, score-0.181]

63 This data set consists of 40 texture images that are manually classified into 15 classes. [sent-166, score-0.132]

64 The number of images in each class varies from 16 to 80. [sent-167, score-0.122]

65 The images in this database are represented by q = 16 dimensional feature vectors; 6. [sent-168, score-0.169]

66 This data set consists of images that were drawn randomly from a database of 7 outdoor images. [sent-170, score-0.172]

67 Thus, there are N = 2,310 images in the database. [sent-172, score-0.052]

68 These images are represented by q = 19 real valued attributes; 7. [sent-173, score-0.08]

69 This data set consists of q = 16 numerical attributes and J = 26 classes; 8. [sent-175, score-0.18]

70 This data set consists of 345 instances, represented by q = 6 numerical attributes, and J = 2 classes; and 9. [sent-177, score-0.11]

71 This example has 32 instances having q = 56 numerical features and J = 3 classes. [sent-179, score-0.064]

72 Results: Table 1 shows the (cross-validated) error rates for the eight methods under consideration on the nine real data sets. [sent-180, score-0.091]

73 Note that the average error rates ... [sent-181, score-0.032]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('jlx', 0.47), ('pr', 0.343), ('xo', 0.289), ('neighborhoods', 0.202), ('ri', 0.179), ('neighborhood', 0.172), ('relevance', 0.163), ('nearest', 0.158), ('gj', 0.144), ('neighbor', 0.14), ('jlxi', 0.134), ('machete', 0.134), ('nn', 0.129), ('distance', 0.119), ('metric', 0.114), ('adamenn', 0.101), ('dann', 0.101), ('jlxo', 0.101), ('prulx', 0.101), ('bo', 0.097), ('classification', 0.091), ('zi', 0.09), ('iris', 0.079), ('query', 0.078), ('feature', 0.077), ('adaptive', 0.075), ('xii', 0.073), ('along', 0.072), ('class', 0.07), ('attributes', 0.07), ('wi', 0.069), ('influential', 0.068), ('carlotta', 0.067), ('constricted', 0.067), ('liver', 0.067), ('lung', 0.067), ('prulxi', 0.067), ('scythe', 0.067), ('seg', 0.067), ('weighting', 0.063), ('probabilities', 0.058), ('elongated', 0.058), ('classes', 0.057), ('sonar', 0.052), ('lxi', 0.052), ('vowel', 0.052), ('conditional', 0.052), ('images', 0.052), ('asymptotic', 0.051), ('consists', 0.049), ('yn', 0.049), ('ixi', 0.048), ('invalid', 0.048), ('whereby', 0.048), ('measurements', 0.047), ('locally', 0.047), ('xi', 0.045), ('dimensions', 0.044), ('proportion', 0.043), ('appealing', 0.043), ('severe', 0.043), ('letter', 0.041), ('database', 0.04), ('predict', 0.04), ('bias', 0.039), ('quantity', 0.039), ('dimension', 0.039), ('measure', 0.038), ('flexible', 0.038), ('glass', 0.036), ('local', 0.035), ('thereby', 0.035), ('extent', 0.034), ('instances', 0.034), ('minimize', 0.034), ('estimated', 0.033), ('scheme', 0.033), ('try', 0.033), ('finite', 0.033), ('error', 0.032), ('exponential', 0.032), ('rule', 0.032), ('centered', 0.032), ('data', 0.031), ('input', 0.031), ('influence', 0.031), ('rate', 0.03), ('posterior', 0.03), ('numerical', 0.03), ('tend', 0.029), ('dummy', 0.029), ('undesirable', 0.029), ('five', 0.028), ('method', 0.028), ('real', 0.028), ('euclidean', 0.028), ('iterations', 0.027), ('ix', 0.027), ('assumes', 0.026), ('vicinity', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 23 nips-2000-An Adaptive Metric Machine for Pattern Classification

Author: Carlotta Domeniconi, Jing Peng, Dimitrios Gunopulos

Abstract: Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a locally adaptive nearest neighbor classification method to try to minimize bias. We use a Chi-squared distance analysis to compute a flexible metric for producing neighborhoods that are elongated along less relevant feature dimensions and constricted along most influential ones. As a result, the class conditional probabilities tend to be smoother in the modified neighborhoods, whereby better classification performance can be achieved. The efficacy of our method is validated and compared against other techniques using a variety of real world data. 1

2 0.11359821 134 nips-2000-The Kernel Trick for Distances

Author: Bernhard Schölkopf

Abstract: A method is described which, like the kernel trick in support vector machines (SVMs), lets us generalize distance-based algorithms to operate in feature spaces, usually nonlinearly related to the input space. This is done by identifying a class of kernels which can be represented as norm-based distances in Hilbert spaces. It turns out that common kernel algorithms, such as SVMs and kernel PCA, are actually really distance based algorithms and can be run with that class of kernels, too. As well as providing a useful new insight into how these algorithms work, the present work can form the basis for conceiving new algorithms.

3 0.084624454 137 nips-2000-The Unscented Particle Filter

Author: Rudolph van der Merwe, Arnaud Doucet, Nando de Freitas, Eric A. Wan

Abstract: In this paper, we propose a new particle filter based on sequential importance sampling. The algorithm uses a bank of unscented filters to obtain the importance proposal distribution. This proposal has two very

4 0.083664365 19 nips-2000-Adaptive Object Representation with Hierarchically-Distributed Memory Sites

Author: Bosco S. Tjan

Abstract: Theories of object recognition often assume that only one representation scheme is used within one visual-processing pathway. Versatility of the visual system comes from having multiple visual-processing pathways, each specialized in a different category of objects. We propose a theoretically simpler alternative, capable of explaining the same set of data and more. A single primary visual-processing pathway, loosely modular, is assumed. Memory modules are attached to sites along this pathway. Object-identity decision is made independently at each site. A site's response time is a monotonic-decreasing function of its confidence regarding its decision. An observer's response is the first-arriving response from any site. The effective representation(s) of such a system, determined empirically, can appear to be specialized for different tasks and stimuli, consistent with recent clinical and functional-imaging findings. This, however, merely reflects a decision being made at its appropriate level of abstraction. The system itself is intrinsically flexible and adaptive.

5 0.081331864 73 nips-2000-Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice

Author: Dirk Ormoneit, Peter W. Glynn

Abstract: Many approaches to reinforcement learning combine neural networks or other parametric function approximators with a form of temporal-difference learning to estimate the value function of a Markov Decision Process. A significant disadvantage of those procedures is that the resulting learning algorithms are frequently unstable. In this work, we present a new, kernel-based approach to reinforcement learning which overcomes this difficulty and provably converges to a unique solution. By contrast to existing algorithms, our method can also be shown to be consistent in the sense that its costs converge to the optimal costs asymptotically. Our focus is on learning in an average-cost framework and on a practical application to the optimal portfolio choice problem. 1

6 0.073152505 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

7 0.070305377 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks

8 0.067213476 148 nips-2000-`N-Body' Problems in Statistical Learning

9 0.062308718 74 nips-2000-Kernel Expansions with Unlabeled Examples

10 0.058780935 35 nips-2000-Computing with Finite and Infinite Networks

11 0.058456413 20 nips-2000-Algebraic Information Geometry for Learning Machines with Singularities

12 0.057117425 38 nips-2000-Data Clustering by Markovian Relaxation and the Information Bottleneck Method

13 0.055371944 145 nips-2000-Weak Learners and Improved Rates of Convergence in Boosting

14 0.054582551 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling

15 0.054412335 117 nips-2000-Shape Context: A New Descriptor for Shape Matching and Object Recognition

16 0.05349851 54 nips-2000-Feature Selection for SVMs

17 0.05174144 87 nips-2000-Modelling Spatial Recall, Mental Imagery and Neglect

18 0.051432863 80 nips-2000-Learning Switching Linear Models of Human Motion

19 0.051046189 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition

20 0.050967783 75 nips-2000-Large Scale Bayes Point Machines


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.179), (1, 0.033), (2, 0.021), (3, 0.014), (4, -0.02), (5, 0.087), (6, 0.036), (7, 0.022), (8, 0.02), (9, 0.037), (10, 0.002), (11, 0.084), (12, 0.096), (13, 0.006), (14, -0.011), (15, -0.038), (16, 0.115), (17, 0.013), (18, 0.008), (19, 0.056), (20, -0.026), (21, -0.17), (22, 0.008), (23, 0.194), (24, 0.064), (25, 0.15), (26, 0.083), (27, -0.035), (28, 0.014), (29, -0.09), (30, -0.027), (31, -0.238), (32, -0.185), (33, 0.185), (34, 0.064), (35, -0.038), (36, 0.167), (37, -0.066), (38, 0.247), (39, 0.107), (40, 0.078), (41, -0.167), (42, 0.018), (43, 0.119), (44, 0.145), (45, 0.122), (46, -0.124), (47, -0.037), (48, 0.037), (49, 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96007335 23 nips-2000-An Adaptive Metric Machine for Pattern Classification

Author: Carlotta Domeniconi, Jing Peng, Dimitrios Gunopulos

Abstract: Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a locally adaptive nearest neighbor classification method to try to minimize bias. We use a Chi-squared distance analysis to compute a flexible metric for producing neighborhoods that are elongated along less relevant feature dimensions and constricted along most influential ones. As a result, the class conditional probabilities tend to be smoother in the modified neighborhoods, whereby better classification performance can be achieved. The efficacy of our method is validated and compared against other techniques using a variety of real world data. 1

2 0.44872436 19 nips-2000-Adaptive Object Representation with Hierarchically-Distributed Memory Sites

Author: Bosco S. Tjan

Abstract: Theories of object recognition often assume that only one representation scheme is used within one visual-processing pathway. Versatility of the visual system comes from having multiple visual-processing pathways, each specialized in a different category of objects. We propose a theoretically simpler alternative, capable of explaining the same set of data and more. A single primary visual-processing pathway, loosely modular, is assumed. Memory modules are attached to sites along this pathway. Object-identity decision is made independently at each site. A site's response time is a monotonic-decreasing function of its confidence regarding its decision. An observer's response is the first-arriving response from any site. The effective representation(s) of such a system, determined empirically, can appear to be specialized for different tasks and stimuli, consistent with recent clinical and functional-imaging findings. This, however, merely reflects a decision being made at its appropriate level of abstraction. The system itself is intrinsically flexible and adaptive.

3 0.40065464 148 nips-2000-`N-Body' Problems in Statistical Learning

Author: Alexander G. Gray, Andrew W. Moore

Abstract: We present efficient algorithms for all-point-pairs problems, or 'N-body'-like problems, which are ubiquitous in statistical learning. We focus on six examples, including nearest-neighbor classification, kernel density estimation, outlier detection, and the two-point correlation. These include any problem which abstractly requires a comparison of each of the N points in a dataset with each other point and would naively be solved using N^2 distance computations. In practice N is often large enough to make this infeasible. We present a suite of new geometric techniques which are applicable in principle to any 'N-body' computation including large-scale mixtures of Gaussians, RBF neural networks, and HMMs. Our algorithms exhibit favorable asymptotic scaling and are empirically several orders of magnitude faster than the naive computation, even for small datasets. We are aware of no exact algorithms for these problems which are more efficient either empirically or theoretically. In addition, our framework yields simple and elegant algorithms. It also permits two important generalizations beyond the standard all-point-pairs problems, which are more difficult. These are represented by our final examples, the multiple two-point correlation and the notorious n-point correlation. 1

4 0.35725817 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

Author: Predrag Neskovic, Philip C. Davis, Leon N. Cooper

Abstract: In this work, we introduce an Interactive Parts (IP) model as an alternative to Hidden Markov Models (HMMs). We tested both models on a database of on-line cursive script. We show that implementations of HMMs and the IP model, in which all letters are assumed to have the same average width, give comparable results. However, in contrast to HMMs, the IP model can handle duration modeling without an increase in computational complexity. 1

5 0.32884243 73 nips-2000-Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice

Author: Dirk Ormoneit, Peter W. Glynn

Abstract: Many approaches to reinforcement learning combine neural networks or other parametric function approximators with a form of temporal-difference learning to estimate the value function of a Markov Decision Process. A significant disadvantage of those procedures is that the resulting learning algorithms are frequently unstable. In this work, we present a new, kernel-based approach to reinforcement learning which overcomes this difficulty and provably converges to a unique solution. By contrast to existing algorithms, our method can also be shown to be consistent in the sense that its costs converge to the optimal costs asymptotically. Our focus is on learning in an average-cost framework and on a practical application to the optimal portfolio choice problem. 1

6 0.32211733 137 nips-2000-The Unscented Particle Filter

7 0.32050353 134 nips-2000-The Kernel Trick for Distances

8 0.26768696 54 nips-2000-Feature Selection for SVMs

9 0.25323054 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling

10 0.24200088 38 nips-2000-Data Clustering by Markovian Relaxation and the Information Bottleneck Method

11 0.24162894 80 nips-2000-Learning Switching Linear Models of Human Motion

12 0.2330797 20 nips-2000-Algebraic Information Geometry for Learning Machines with Singularities

13 0.2270937 1 nips-2000-APRICODD: Approximate Policy Construction Using Decision Diagrams

14 0.212899 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks

15 0.21217462 29 nips-2000-Bayes Networks on Ice: Robotic Search for Antarctic Meteorites

16 0.21211666 87 nips-2000-Modelling Spatial Recall, Mental Imagery and Neglect

17 0.20812935 145 nips-2000-Weak Learners and Improved Rates of Convergence in Boosting

18 0.20398645 53 nips-2000-Feature Correspondence: A Markov Chain Monte Carlo Approach

19 0.19789883 84 nips-2000-Minimum Bayes Error Feature Selection for Continuous Speech Recognition

20 0.19676507 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.035), (17, 0.119), (24, 0.346), (32, 0.023), (33, 0.058), (54, 0.024), (55, 0.024), (62, 0.032), (67, 0.051), (75, 0.018), (76, 0.047), (79, 0.025), (81, 0.029), (90, 0.05), (91, 0.011), (97, 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.79389393 109 nips-2000-Redundancy and Dimensionality Reduction in Sparse-Distributed Representations of Natural Objects in Terms of Their Local Features

Author: Penio S. Penev

Abstract: Low-dimensional representations are key to solving problems in highlevel vision, such as face compression and recognition. Factorial coding strategies for reducing the redundancy present in natural images on the basis of their second-order statistics have been successful in accounting for both psychophysical and neurophysiological properties of early vision. Class-specific representations are presumably formed later, at the higher-level stages of cortical processing. Here we show that when retinotopic factorial codes are derived for ensembles of natural objects, such as human faces, not only redundancy, but also dimensionality is reduced. We also show that objects are built from parts in a non-Gaussian fashion which allows these local-feature codes to have dimensionalities that are substantially lower than the respective Nyquist sampling rates.

same-paper 2 0.79127389 23 nips-2000-An Adaptive Metric Machine for Pattern Classification

Author: Carlotta Domeniconi, Jing Peng, Dimitrios Gunopulos

Abstract: Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a locally adaptive nearest neighbor classification method to try to minimize bias. We use a Chi-squared distance analysis to compute a flexible metric for producing neighborhoods that are elongated along less relevant feature dimensions and constricted along most influential ones. As a result, the class conditional probabilities tend to be smoother in the modified neighborhoods, whereby better classification performance can be achieved. The efficacy of our method is validated and compared against other techniques using a variety of real world data. 1

3 0.61065692 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

Author: Thomas Natschläger, Wolfgang Maass, Eduardo D. Sontag, Anthony M. Zador

Abstract: Experimental data show that biological synapses behave quite differently from the symbolic synapses in common artificial neural network models. Biological synapses are dynamic, i.e., their

4 0.42584753 122 nips-2000-Sparse Representation for Gaussian Process Models

Author: Lehel Csató, Manfred Opper

Abstract: We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.

5 0.4233095 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm

Author: Claudio Gentile

Abstract: A new incremental learning algorithm is described which approximates the maximal margin hyperplane w.r.t. norm p ≥ 2 for a set of linearly separable data. Our algorithm, called ALMAp (Approximate Large Margin algorithm w.r.t. norm p), takes O((p-1)/(α²γ²)) corrections to separate the data with p-norm margin larger than (1 - α)γ, where γ is the p-norm margin of the data and X is a bound on the p-norm of the instances. ALMAp avoids quadratic (or higher-order) programming methods. It is very easy to implement and is as fast as on-line algorithms, such as Rosenblatt's perceptron. We report on some experiments comparing ALMAp to two incremental algorithms: Perceptron and Li and Long's ROMMA. Our algorithm seems to perform quite a bit better than both. The accuracy levels achieved by ALMAp are slightly inferior to those obtained by Support Vector Machines (SVMs). On the other hand, ALMAp is quite a bit faster and easier to implement than standard SVM training algorithms.

6 0.4232966 74 nips-2000-Kernel Expansions with Unlabeled Examples

7 0.41752255 4 nips-2000-A Linear Programming Approach to Novelty Detection

8 0.41253698 37 nips-2000-Convergence of Large Margin Separable Linear Classification

9 0.41193292 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

10 0.41186446 79 nips-2000-Learning Segmentation by Random Walks

11 0.4111203 133 nips-2000-The Kernel Gibbs Sampler

12 0.41090828 21 nips-2000-Algorithmic Stability and Generalization Performance

13 0.40925673 9 nips-2000-A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work

14 0.40899849 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition

15 0.40859824 111 nips-2000-Regularized Winnow Methods

16 0.40809444 60 nips-2000-Gaussianization

17 0.40805298 130 nips-2000-Text Classification using String Kernels

18 0.40506727 119 nips-2000-Some New Bounds on the Generalization Error of Combined Classifiers

19 0.40358594 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

20 0.40287888 52 nips-2000-Fast Training of Support Vector Classifiers