nips nips2001 nips2001-104 knowledge-graph by maker-knowledge-mining

104 nips-2001-Kernel Logistic Regression and the Import Vector Machine


Source: pdf

Author: Ji Zhu, Trevor Hastie

Abstract: The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multi-class classification is still an on-going research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the “support points” of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a computational advantage over the SVM, especially when the size of the training data set is large.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multi-class classification is still an on-going research issue. [sent-5, score-0.136]

2 In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). [sent-6, score-0.613]

3 We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. [sent-7, score-0.125]

4 Similar to the “support points” of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. [sent-9, score-0.206]

5 This gives the IVM a computational advantage over the SVM, especially when the size of the training data set is large. [sent-10, score-0.12]

6 1 Introduction In standard classification problems, we are given a set of training data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where the output $y_i$ is qualitative and assumes values in a finite set $\mathcal{C}$. [sent-11, score-0.041]

7 We wish to find a classification rule from the training data, so that when given a new input $x$, we can assign a class from $\mathcal{C}$ to it. [sent-12, score-0.102]

8 Usually it is assumed that the training data are an independently and identically distributed sample from an unknown probability distribution $P(X, Y)$. [sent-13, score-0.041]

9 The support vector machine (SVM) works well in binary classification, i.e., when $\mathcal{C}$ contains only two classes. [sent-15, score-0.136]

10 Another weakness of the SVM is that it only estimates the sign of $p(x) - 1/2$, while the probability $p(x)$ is often of interest itself, where $p(x) = P(Y = 1 \mid X = x)$ is the conditional probability of a point being in class 1 given $X = x$. [sent-18, score-0.028]
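
To make the contrast concrete, here is a minimal sketch (illustrative code, not from the paper) of the difference between the two kinds of output, assuming the standard logistic link $p(x) = 1/(1 + e^{-f(x)})$:

```python
import numpy as np

def svm_output(f_values):
    """SVM-style output: only the sign of f(x), i.e., a hard class label."""
    return np.sign(f_values)

def klr_probability(f_values):
    """KLR/IVM-style output: the logistic link gives an estimate of p(x) = P(Y=1 | X=x)."""
    return 1.0 / (1.0 + np.exp(-f_values))

f = np.array([-2.3, -0.1, 0.4, 1.7])   # hypothetical fitted values f(x)
print(svm_output(f))                    # [-1. -1.  1.  1.]
print(klr_probability(f))               # approx. [0.091 0.475 0.599 0.846]
```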

11 In this paper, we propose a new approach, called the import vector machine (IVM), to address the classification problem. [sent-19, score-0.452]

12 We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. [sent-20, score-0.125]

13 Similar to the “support points” of the SVM, the IVM model uses only a fraction of the training data to index the kernel basis functions. [sent-22, score-0.173]

14 The computational cost of the SVM is $O(n^2 n_s)$ (with $n_s$ the number of support points), while the computational cost of the IVM is $O(n^2 q^2)$, where $q$ is the number of import points. [sent-24, score-0.54]

15 Since $q$ does not tend to increase as $n$ increases, the IVM can be faster than the SVM, especially for large training data sets. [sent-25, score-0.07]

16 Empirical results show that the number of import points is usually much less than the number of support points. [sent-26, score-0.48]

17 In section (2), we briefly review some results of the SVM for binary classification and compare it with kernel logistic regression (KLR). [sent-27, score-0.204]

18 2 Support vector machines and kernel logistic regression The standard SVM produces a non-linear classification boundary in the original input space by constructing a linear boundary in a transformed version of the original input space. [sent-31, score-0.349]

19 The dimension of the transformed space can be very large, even infinite in some cases. [sent-32, score-0.028]

20 This seemingly prohibitive computation is achieved through a positive definite reproducing kernel $K(x, x')$, which gives the inner product in the transformed space. [sent-33, score-0.144]

21 Many people have noted the relationship between the SVM and regularized function estimation in the reproducing kernel Hilbert spaces (RKHS). [sent-34, score-0.211]

22 Fitting an SVM is equivalent to minimizing $\frac{1}{n}\sum_{i=1}^{n}\left[1 - y_i f(x_i)\right]_{+} + \frac{\lambda}{2}\|f\|_{\mathcal{H}_K}^{2}$ (1), where $\mathcal{H}_K$ is the RKHS generated by the kernel $K$. [sent-38, score-0.074]

23 By the representer theorem (Kimeldorf et al. (1971)), the optimal $f(x)$ has the form $f(x) = \sum_{i=1}^{n} a_i K(x, x_i)$ (2). [sent-39, score-0.104]

24 The classification rule is given by $\operatorname{sign}[f(x)]$. It often happens that a sizeable fraction of the values of $a_i$ can be zero. [sent-43, score-0.066]

25 This is a consequence of the truncation property of the first part of criterion (1). [sent-44, score-0.024]

26 This seems to be an attractive property, because only the points on the wrong side of the classification boundary, and those on the right side but near the boundary, have an influence in determining the position of the boundary, and hence have non-zero $a_i$'s. [sent-45, score-0.12]

27 The loss function is plotted in Figure 1, along with several traditional loss functions. [sent-48, score-0.058]

28 As we can see, the negative log-likelihood (NLL) of the binomial distribution has a similar shape to that of the SVM. [sent-49, score-0.042]

29 If we replace $[1 - y f(x)]_{+}$ in (1) with $\ln\left(1 + e^{-y f(x)}\right)$, the NLL of the binomial distribution, the problem becomes a KLR problem. [sent-50, score-0.042]
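
A small illustration of this substitution (not code from the paper; the formulas below are the standard hinge loss and binomial NLL for $y \in \{-1, +1\}$, written as functions of the margin $y f$):

```python
import numpy as np

def hinge_loss(margin):
    """SVM hinge loss (1 - y f)_+ as a function of the margin y f."""
    return np.maximum(0.0, 1.0 - margin)

def binomial_nll(margin):
    """Negative log-likelihood of the binomial distribution, ln(1 + exp(-y f))."""
    return np.log1p(np.exp(-margin))

for m in np.linspace(-3, 3, 7):
    print(f"y*f = {m:+.1f}   hinge = {hinge_loss(m):.3f}   binomial NLL = {binomial_nll(m):.3f}")
```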

30 We expect that the fitted function performs similarly to the SVM for binary classification. [sent-51, score-0.065]

31 However, because the KLR compromises the hinge loss function of the SVM, it no longer has the “support points” property; in other words, all the $a_i$'s in (2) are non-zero. [sent-53, score-0.029]

32 KLR is a well studied problem; see Wahba et al. [sent-54, score-0.043]

33 (1995) and references there; see also Green et al. [sent-55, score-0.043]

34 The sub-model restricts the expansion (2) to $f(x) = \sum_{x_i \in S} a_i K(x, x_i)$ (3), where $S$ is a subset of the training data and the data in $S$ are called import points. [sent-67, score-0.453]
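
A minimal sketch of the difference between the full expansion (2) and the sub-model (3); the radial kernel, the index set `S`, and the coefficient vectors below are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Radial kernel K(x, x') = exp(-gamma * ||x - x'||^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))        # training inputs
a_full = rng.normal(size=200)        # coefficients of the full model (2)
S = np.array([3, 17, 42, 99, 150])   # indices of the import points
a_sub = rng.normal(size=len(S))      # coefficients of the sub-model (3)

x_new = rng.normal(size=(1, 2))
f_full = rbf_kernel(x_new, X) @ a_full     # sum over all n training points
f_sub = rbf_kernel(x_new, X[S]) @ a_sub    # sum over the import points only
```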

35 The advantage of this sub-model is that the computational cost is reduced, especially for large training data sets, while not jeopardizing the performance in classification. [sent-68, score-0.17]

36 Several other researchers have investigated techniques for selecting the subset $S$. [sent-69, score-0.05]

37 (1998) divide the training data into several clusters, then randomly select a representative from each cluster to make up $S$. [sent-71, score-0.041]

38 (2000) develop a greedy technique to sequentially select columns of the kernel matrix $K$, such that the span of these columns approximates the span of $K$ well in the Frobenius norm. [sent-73, score-0.161]

39 (2001) propose randomly selecting points of the training data, then using the Nyström method to approximate the eigen-decomposition of the kernel matrix $K$, and expanding the results back up to $n$ dimensions. [sent-75, score-0.231]

40 None of these methods uses the output $y_i$ in selecting the subset $S$. [sent-76, score-0.05]

41 The IVM algorithm uses both the output $y_i$ and the input $x_i$ to select the subset $S$, in such a way that the resulting fit approximates the full model well. [sent-79, score-0.028]

42 3 Import vector machine Following the tradition of logistic regression, we let $y_i \in \{0, 1\}$ for the rest of this paper. [sent-83, score-0.095]

43 To find $a$, we set the derivative of the regularized NLL with respect to $a$ equal to 0, and use the Newton-Raphson method to iteratively solve the score equation. [sent-89, score-0.043]

44 It can be shown that the Newton-Raphson step is a weighted least squares step. [sent-90, score-0.021]

45 Here $a^{(k)}$ denotes the value of $a$ in the $k$-th step. [sent-93, score-0.032]
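
Because the exact formulas are garbled in this extraction, the sketch below only assumes the standard iteratively reweighted least squares (IRLS) form of a Newton-Raphson step for regularized kernel logistic regression with $y_i \in \{0, 1\}$; the variable names are illustrative, not the paper's:

```python
import numpy as np

def newton_step(K, y, a, lam):
    """One Newton-Raphson (weighted least squares) step for regularized KLR.

    K : (n, n) kernel matrix, y : (n,) labels in {0, 1},
    a : (n,) current coefficients, lam : regularization parameter.
    """
    f = K @ a                               # current fitted values
    p = 1.0 / (1.0 + np.exp(-f))            # current probability estimates
    W = p * (1.0 - p)                       # Newton weights
    z = f + (y - p) / np.maximum(W, 1e-10)  # working response of the weighted LS problem
    A = K.T @ (W[:, None] * K) + lam * K    # K' W K + lam * K
    b = K.T @ (W * z)                       # K' W z
    return np.linalg.solve(A, b)
```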

46 As mentioned in section 2, we want to find a subset $S$ of the training data, such that the sub-model (3) is a good approximation of the full model (2). [sent-94, score-0.051]

47 Since it is impossible to search every possible subset $S$, we use the following greedy forward strategy: [sent-95, score-0.05]

48 We call the points in $S$ import points; points are added one at a time until a stopping criterion, described below, is satisfied. [sent-109, score-0.05]

49 Here the regressor matrix contains the kernel evaluations $K(x_i, x_j)$ between the training points $x_i$ and the import points $x_j \in S$, and the regularization matrix contains the kernel evaluations between the import points themselves. [sent-110, score-0.128]
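
A skeleton of the greedy forward strategy (the two helper callables `fit_submodel` and `regularized_nll` are hypothetical placeholders, e.g. a few weighted least squares updates as sketched above and the penalized NLL they minimize; they are not the paper's code):

```python
import numpy as np

def select_import_points(K, y, lam, fit_submodel, regularized_nll, max_points=50):
    """Greedy forward selection of the import-point set S."""
    n = K.shape[0]
    S, remaining, history = [], set(range(n)), []
    while remaining and len(S) < max_points:
        best_nll, best_j = np.inf, None
        for j in remaining:                         # try adding each candidate point
            a = fit_submodel(K, y, S + [j], lam)
            nll = regularized_nll(K, y, S + [j], a, lam)
            if nll < best_nll:
                best_nll, best_j = nll, j
        S.append(best_j)                            # keep the point that lowers the NLL most
        remaining.remove(best_j)
        history.append(best_nll)                    # H_1, H_2, ... for the stopping rule
    return S, history
```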

50 3.2 Revised algorithm The above algorithm is computationally feasible, but in each step we need to use the Newton-Raphson method to find $a$ iteratively. [sent-116, score-0.021]

51 When the number of import points becomes large, the Newton-Raphson computation can be expensive. [sent-117, score-0.434]

52 Instead of iteratively computing $a^{(k)}$ until it converges, we can just do a one-step iteration, and use it as an approximation to the converged one. [sent-119, score-0.02]

53 To get a good approximation, we take advantage of the fitted result from the current “optimal” $S$. [sent-120, score-0.022]

54 This one-step update is similar to the score test in generalized linear models (GLM); but the latter does not have a penalty term. [sent-123, score-0.039]

55 The updating formula allows the weighted regression (5) to be computed efficiently at each step. [sent-124, score-0.059]

56 Hence, we have the revised step for the basic algorithm. [sent-126, score-0.088]
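
For the sub-model, the one-step update can be sketched as a single weighted least squares solve, warm-started from the previous coefficients; the matrix names and the zero initialization of the new coefficient are assumptions, not the paper's formulas:

```python
import numpy as np

def one_step_submodel(K, y, S, a_prev, lam):
    """Single weighted least squares update for the sub-model indexed by S.

    K : (n, n) kernel matrix, S : list of import-point indices,
    a_prev : coefficients from the previous S with a zero appended for the new point.
    """
    R = K[:, S]                                  # regressor matrix, n x |S|
    P = K[np.ix_(S, S)]                          # regularization matrix, |S| x |S|
    f = R @ a_prev
    p = 1.0 / (1.0 + np.exp(-f))
    W = p * (1.0 - p)
    z = f + (y - p) / np.maximum(W, 1e-10)       # working response
    A = R.T @ (W[:, None] * R) + lam * P
    return np.linalg.solve(A, R.T @ (W * z))
```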

57 3.3 Stopping rule for adding points to $S$ In the basic algorithm, we need to decide whether the fit has converged, i.e., when to stop adding points. [sent-131, score-0.054]

58 A natural stopping rule is to look at the regularized NLL. [sent-132, score-0.179]

59 Let $H_1, H_2, \ldots, H_q$ be the sequence of regularized NLLs obtained as import points are added. [sent-133, score-0.116]

60 At each step $q$, we compare $H_q$ with $H_{q - \Delta q}$, where $\Delta q$ is a pre-chosen small integer. [sent-134, score-0.021]

61 If the ratio $|H_q - H_{q - \Delta q}| / |H_q|$ is less than some pre-chosen small number $\epsilon$, we stop adding new import points to $S$. [sent-135, score-0.434]
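
The stopping rule amounts to a small check on the sequence of regularized NLLs; the ratio form, lag, and threshold below are placeholders inferred from the description:

```python
def should_stop(history, lag=1, eps=1e-3):
    """Stop adding import points once the regularized NLL has flattened out.

    history : regularized NLLs H_1, ..., H_q collected after each added point.
    lag     : the pre-chosen small integer (Delta q in the text).
    eps     : the pre-chosen small threshold on the relative change.
    """
    if len(history) <= lag:
        return False
    h_now, h_then = history[-1], history[-1 - lag]
    return abs(h_now - h_then) / abs(h_now) < eps
```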

62 3.4 Choosing the regularization parameter $\lambda$ So far, we have assumed that the regularization parameter $\lambda$ is fixed. [sent-137, score-0.09]

63 We can randomly split all the data into a training set and a tuning set, and use the misclassification error on the tuning set as a criterion for choosing $\lambda$. [sent-139, score-0.16]

64 To reduce the computation, we take advantage of the fact that the regularized NLL converges faster for a larger $\lambda$. [sent-140, score-0.117]

65 Thus, instead of running the entire revised algorithm for each $\lambda$, we propose the following procedure, which combines both adding import points to $S$ and choosing the optimal $\lambda$: [sent-141, score-0.546]

66 Run the steps of the revised algorithm until the stopping criterion is satisfied. [sent-142, score-0.142]

67 Along the way, also compute the misclassification error on the tuning set. [sent-143, score-0.061]

68 The procedure starts with a large regularization parameter $\lambda$; $\lambda$ is then decreased and the previous steps are repeated, starting from the current $S$. [sent-145, score-0.045]

69 We choose the optimal $\lambda$ as the one that corresponds to the minimum misclassification error on the tuning set. [sent-147, score-0.085]
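
A sketch of the combined procedure: start with a large $\lambda$, grow $S$ until the stopping rule fires, record the tuning-set error, then decrease $\lambda$ and continue from the current $S$. The helpers `grow_import_set` and `predict` are hypothetical, in the spirit of the earlier sketches:

```python
import numpy as np

def lambda_path(K_train, y_train, K_tune, y_tune, lambdas, grow_import_set, predict):
    """Combine import-point selection with the choice of the regularization parameter."""
    S, errors = [], []
    for lam in lambdas:                                   # decreasing sequence of lambdas
        S, a = grow_import_set(K_train, y_train, S, lam)  # reuse the current S as a warm start
        y_hat = predict(K_tune, S, a)
        errors.append(np.mean(y_hat != y_tune))           # misclassification error on tuning set
    best = int(np.argmin(errors))
    return lambdas[best], errors
```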

70 4 Simulation In this section, we use a simulation to illustrate the IVM method. [sent-148, score-0.034]

71 The data in each class are generated from a mixture of Gaussians (Hastie et al.). [sent-149, score-0.071]

72 4.1 Remarks The support points of the SVM are those which are close to the classification boundary or misclassified, and they usually have large weights. [sent-153, score-0.166]

73 The import points of the IVM are those that decrease the regularized NLL the most, and can be either close to or far from the classification boundary. [sent-154, score-0.529]

74 Though points away from the classification boundary do not contribute to determining the position of the classification boundary, they may contribute to estimating the unknown probability $p(x)$. [sent-156, score-0.12]

75 The total computational cost of the SVM is $O(n^2 n_s)$, while the computational cost of the IVM method is $O(n^2 q^2)$, where $q$ is the number of import points. [sent-158, score-0.54]

76 Since $q$ does not tend to increase as $n$ increases, the computational cost of the IVM can be smaller than that of the SVM, especially for large training data sets. [sent-159, score-0.074]

77 Figure 2 (radial kernel): the misclassification rate for different $\lambda$'s and the regularized NLL for the optimal $\lambda$, plotted against the number of import points added. [sent-163, score-0.528]

81 We can write the response as an $M$-vector $y_i$ (one component per class), with each component being either 0 or 1, indicating which class the observation is in. [sent-180, score-0.028]

82 Therefore $y_{im} = 1$ indicates that the response is in the $m$-th class, and $y_{im} = 0$ indicates that it is not. [sent-181, score-0.064]

83 The multi-logit can be written in terms of the functions $f_1(x), \ldots, f_M(x)$; hence the Bayes classification rule is given by $\arg\max_m f_m(x)$. [sent-183, score-0.033]
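
For concreteness, a small sketch of the multi-logit link and the resulting Bayes rule (the standard softmax form is assumed here rather than copied from the paper; the rows of `F` are hypothetical fitted values $f_1(x), \ldots, f_M(x)$ for each observation):

```python
import numpy as np

def class_probabilities(F):
    """Softmax: p_m(x) = exp(f_m(x)) / sum_k exp(f_k(x)); rows of F are observations."""
    E = np.exp(F - F.max(axis=1, keepdims=True))   # subtract row max for numerical stability
    return E / E.sum(axis=1, keepdims=True)

def bayes_rule(F):
    """Bayes classification rule: assign the class with the largest f_m(x)."""
    return np.argmax(F, axis=1)

F = np.array([[0.2, 1.5, -0.3],    # 2 hypothetical observations, M = 3 classes
              [2.0, -1.0, 0.1]])
print(class_probabilities(F).round(3))
print(bayes_rule(F))               # [1 0]
```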

84 Using the representer theorem (Kimeldorf et al. [sent-193, score-0.08]

85 (1971)), the $m$-th element of $f$ that minimizes the regularized NLL has the form $f_m(x) = \sum_{i=1}^{n} a_{im} K(x, x_i)$. [sent-194, score-0.032]

86 We use $i$ to index the observations and $m$ to index the classes. [sent-197, score-0.05]

87 Then the regularized negative log-likelihood is formed as in the binary case, with the binomial NLL replaced by its multinomial counterpart. (Figure: SVM with 107 support points, IVM with 21 import points.) [sent-199, score-0.625]

88 For the SVM, the dashed lines are the edges of the margin. [sent-7406, score-0.024]

89 The quantities in (9) are defined in the same way as in the binary case. [sent-7411, score-0.032]

90 The multi-class IVM procedure is similar to the binary case, and its computational cost grows correspondingly with the number of classes. [sent-7413, score-0.164]

91 The data in each class are generated from a mixture of Gaussians (Hastie et al.). [sent-7415, score-0.071]

95 6 Conclusion We have discussed the import vector machine (IVM) method in both binary and multi-class classification. [sent-11015, score-0.474]

96 We showed that it not only performs as well as the SVM, but also provides an estimate of the probability $p(x)$. [sent-11016, score-0.022]

97 The computational cost of the IVM is $O(n^2 q^2)$ for the binary case, with an additional factor depending on the number of classes in the multi-class case, where $q$ is the number of import points. [sent-11017, score-0.505]

98 (1998) A tutorial on support vector machines for pattern recognition. [sent-11026, score-0.066]

99 (1998), Smoothing spline ANOVA models for large data sets with Bernoulli observations and the randomized GACV. [sent-11060, score-0.067]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ivm', 0.568), ('import', 0.384), ('oo', 0.372), ('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 0.21), ('klr', 0.189), ('svm', 0.184), ('nll', 0.146), ('classi', 0.133), ('wahba', 0.128), ('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 0.126), ('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 0.126), ('cation', 0.113), ('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 0.105), ('hastie', 0.1), ('regularized', 0.095), ('ooo', 0.084), ('kernel', 0.074), ('boundary', 0.07), ('revised', 0.067), ('misclassi', 0.053), ('stopping', 0.051), ('cost', 0.05), ('kimeldorf', 0.05), ('lambda', 0.05), ('points', 0.05), ('logistic', 0.048), ('stanford', 0.048), ('support', 0.046), ('regularization', 0.045), ('tibshirani', 0.044), ('et', 0.043), ('binary', 0.043), ('evgeniou', 0.042), ('ige', 0.042), ('oooo', 0.042), ('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 0.042), ('binomial', 0.042), ('reproducing', 0.042), ('training', 0.041), ('spline', 0.04), ('tted', 0.04), ('generalized', 0.039), ('regression', 0.039), ('regressor', 0.037), ('newtonraphson', 0.037), ('representer', 0.037), ('simulation', 0.034), ('tuning', 0.034), ('madison', 0.033), ('nystrom', 0.033), ('wisconsin', 0.033), ('klein', 0.033), ('fraction', 0.033), ('rule', 0.033), ('th', 0.032), ('rkhs', 0.031), ('zhu', 0.031), ('williams', 0.031), ('bayes', 0.03), ('especially', 0.029), ('trevor', 0.029), ('loss', 0.029), ('subset', 0.028), ('class', 0.028), ('transformed', 0.028), ('computational', 0.028), ('machine', 0.027), ('randomized', 0.027), ('error', 0.027), ('nd', 0.026), ('index', 0.025), ('lines', 0.024), ('criterion', 0.024), ('smola', 0.024), ('brie', 0.024), ('optimal', 0.024), ('nite', 0.024), ('matrix', 0.023), ('hilbert', 0.023), ('performs', 0.022), ('greedy', 0.022), ('selecting', 0.022), ('advantage', 0.022), ('green', 0.022), ('ji', 0.022), ('naturally', 0.021), ('propose', 0.021), ('lin', 0.021), ('step', 0.021), ('span', 0.021), ('lkopf', 0.021), ('vector', 0.02), ('repeat', 0.02), ('formula', 0.02), ('added', 0.02), ('sch', 0.02), ('iteratively', 0.02), ('statistics', 0.019), ('smoothing', 0.019), ('radial', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 104 nips-2001-Kernel Logistic Regression and the Import Vector Machine

Author: Ji Zhu, Trevor Hastie

Abstract: The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multi-class classification is still an on-going research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the “support points” of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a computational advantage over the SVM, especially when the size of the training data set is large.

2 0.14451054 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine

Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama

Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1

3 0.12243807 46 nips-2001-Categorization by Learning and Combining Object Parts

Author: Bernd Heisele, Thomas Serre, Massimiliano Pontil, Thomas Vetter, Tomaso Poggio

Abstract: We describe an algorithm for automatically learning discriminative components of objects with SVM classifiers. It is based on growing image parts by minimizing theoretical bounds on the error probability of an SVM. Component-based face classifiers are then combined in a second stage to yield a hierarchical SVM classifier. Experimental results in face classification show considerable robustness against rotations in depth and suggest performance at significantly better level than other face detection systems. Novel aspects of our approach are: a) an algorithm to learn component-based classification experts and their combination, b) the use of 3-D morphable models for training, and c) a maximum operation on the output of each component classifier which may be relevant for biological models of visual recognition.

4 0.11549758 139 nips-2001-Online Learning with Kernels

Author: Jyrki Kivinen, Alex J. Smola, Robert C. Williamson

Abstract: We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the $\nu$-trick allows us to give a robust parameterization. Moreover, unlike in batch learning where the $\nu$-trick only applies to the $\varepsilon$-insensitive loss function, we are able to derive general trimmed-mean types of estimators such as for Huber’s robust loss.

5 0.10975176 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior

Author: Mário Figueiredo

Abstract: In this paper we introduce a new sparseness inducing prior which does not involve any (hyper)parameters that need to be adjusted or estimated. Although other applications are possible, we focus here on supervised learning problems: regression and classification. Experiments with several publicly available benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms support vector machines and performs competitively with the best alternative techniques, both in terms of error rates and sparseness, although it involves no tuning or adjusting of sparsenesscontrolling hyper-parameters.

6 0.094709076 8 nips-2001-A General Greedy Approximation Algorithm with Applications

7 0.093518905 159 nips-2001-Reducing multiclass to binary by coupling probability estimates

8 0.089900315 20 nips-2001-A Sequence Kernel and its Application to Speaker Recognition

9 0.089260601 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines

10 0.089014284 105 nips-2001-Kernel Machines and Boolean Functions

11 0.088271104 164 nips-2001-Sampling Techniques for Kernel Methods

12 0.085088432 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing

13 0.075063892 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines

14 0.074305125 60 nips-2001-Discriminative Direction for Kernel Classifiers

15 0.073862657 137 nips-2001-On the Convergence of Leveraging

16 0.073542789 16 nips-2001-A Parallel Mixture of SVMs for Very Large Scale Problems

17 0.072114281 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines

18 0.069261663 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade

19 0.069074623 144 nips-2001-Partially labeled classification with Markov random walks

20 0.067270391 62 nips-2001-Duality, Geometry, and Support Vector Regression


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.188), (1, 0.141), (2, -0.041), (3, 0.127), (4, 0.018), (5, 0.089), (6, 0.025), (7, -0.041), (8, 0.033), (9, 0.025), (10, 0.022), (11, -0.054), (12, -0.032), (13, 0.001), (14, 0.078), (15, 0.001), (16, 0.038), (17, 0.02), (18, 0.04), (19, -0.061), (20, 0.031), (21, -0.055), (22, 0.047), (23, -0.007), (24, 0.025), (25, 0.063), (26, -0.08), (27, 0.016), (28, 0.016), (29, 0.011), (30, 0.046), (31, -0.043), (32, -0.017), (33, 0.051), (34, 0.037), (35, 0.028), (36, 0.012), (37, 0.025), (38, -0.017), (39, -0.058), (40, -0.053), (41, 0.037), (42, -0.041), (43, 0.002), (44, -0.03), (45, 0.039), (46, -0.068), (47, -0.15), (48, -0.021), (49, 0.104)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93836218 104 nips-2001-Kernel Logistic Regression and the Import Vector Machine

Author: Ji Zhu, Trevor Hastie

Abstract: The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multi-class classification is still an on-going research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the “support points” of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a computational advantage over the SVM, especially when the size of the training data set is large.

2 0.71766227 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior

Author: Mário Figueiredo

Abstract: In this paper we introduce a new sparseness inducing prior which does not involve any (hyper)parameters that need to be adjusted or estimated. Although other applications are possible, we focus here on supervised learning problems: regression and classification. Experiments with several publicly available benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms support vector machines and performs competitively with the best alternative techniques, both in terms of error rates and sparseness, although it involves no tuning or adjusting of sparsenesscontrolling hyper-parameters.

3 0.67279214 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine

Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama

Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1

4 0.66741037 20 nips-2001-A Sequence Kernel and its Application to Speaker Recognition

Author: William M. Campbell

Abstract: A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.

5 0.65761596 60 nips-2001-Discriminative Direction for Kernel Classifiers

Author: Polina Golland

Abstract: In many scientific and engineering applications, detecting and understanding differences between two groups of examples can be reduced to a classical problem of training a classifier for labeling new examples while making as few mistakes as possible. In the traditional classification setting, the resulting classifier is rarely analyzed in terms of the properties of the input data captured by the discriminative model. However, such analysis is crucial if we want to understand and visualize the detected differences. We propose an approach to interpretation of the statistical model in the original feature space that allows us to argue about the model in terms of the relevant changes to the input vectors. For each point in the input space, we define a discriminative direction to be the direction that moves the point towards the other class while introducing as little irrelevant change as possible with respect to the classifier function. We derive the discriminative direction for kernel-based classifiers, demonstrate the technique on several examples and briefly discuss its use in the statistical shape analysis, an application that originally motivated this work.

6 0.56776506 139 nips-2001-Online Learning with Kernels

7 0.55228001 159 nips-2001-Reducing multiclass to binary by coupling probability estimates

8 0.54377127 46 nips-2001-Categorization by Learning and Combining Object Parts

9 0.52966416 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing

10 0.52136642 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines

11 0.50531 105 nips-2001-Kernel Machines and Boolean Functions

12 0.4995141 101 nips-2001-K-Local Hyperplane and Convex Distance Nearest Neighbor Algorithms

13 0.48541799 25 nips-2001-Active Learning in the Drug Discovery Process

14 0.47585106 144 nips-2001-Partially labeled classification with Markov random walks

15 0.4722425 62 nips-2001-Duality, Geometry, and Support Vector Regression

16 0.47071236 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade

17 0.45478314 94 nips-2001-Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM

18 0.45324552 99 nips-2001-Intransitive Likelihood-Ratio Classifiers

19 0.44962299 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines

20 0.44804993 152 nips-2001-Prodding the ROC Curve: Constrained Optimization of Classifier Performance


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.06), (15, 0.314), (17, 0.022), (19, 0.028), (27, 0.111), (30, 0.082), (36, 0.015), (38, 0.015), (59, 0.036), (72, 0.08), (79, 0.022), (83, 0.026), (91, 0.101)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.7682209 104 nips-2001-Kernel Logistic Regression and the Import Vector Machine

Author: Ji Zhu, Trevor Hastie

Abstract: The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multi-class classification is still an on-going research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the “support points” of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a computational advantage over the SVM, especially when the size of the training data set is large.

2 0.72386366 30 nips-2001-Agglomerative Multivariate Information Bottleneck

Author: Noam Slonim, Nir Friedman, Naftali Tishby

Abstract: The information bottleneck method is an unsupervised model independent data organization technique. Given a joint distribution p(A, B), this method constructs a new variable T that extracts partitions, or clusters, over the values of A that are informative about B. In a recent paper, we introduced a general principled framework for multivariate extensions of the information bottleneck method that allows us to consider multiple systems of data partitions that are inter-related. In this paper, we present a new family of simple agglomerative algorithms to construct such systems of inter-related clusters. We analyze the behavior of these algorithms and apply them to several real-life datasets.

3 0.68916881 143 nips-2001-PAC Generalization Bounds for Co-training

Author: Sanjoy Dasgupta, Michael L. Littman, David A. McAllester

Abstract: The rule-based bootstrapping introduced by Yarowsky, and its cotraining variant by Blum and Mitchell, have met with considerable empirical success. Earlier work on the theory of co-training has been only loosely related to empirically useful co-training algorithms. Here we give a new PAC-style bound on generalization error which justifies both the use of confidences — partial rules and partial labeling of the unlabeled data — and the use of an agreement-based objective function as suggested by Collins and Singer. Our bounds apply to the multiclass case, i.e., where instances are to be assigned one of several labels.

4 0.54282856 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning

Author: Ran El-Yaniv, Oren Souroujon

Abstract: We present a powerful meta-clustering technique called Iterative Double Clustering (IDC). The IDC method is a natural extension of the recent Double Clustering (DC) method of Slonim and Tishby that exhibited impressive performance on text categorization tasks [12]. Using synthetically generated data we empirically find that whenever the DC procedure is successful in recovering some of the structure hidden in the data, the extended IDC procedure can incrementally compute a significantly more accurate classification. IDC is especially advantageous when the data exhibits high attribute noise. Our simulation results also show the effectiveness of IDC in text categorization problems. Surprisingly, this unsupervised procedure can be competitive with a (supervised) SVM trained with a small training set. Finally, we propose a simple and natural extension of IDC for semi-supervised and transductive learning where we are given both labeled and unlabeled examples. 1

5 0.53091389 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine

Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama

Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1

6 0.53058374 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior

7 0.5302977 8 nips-2001-A General Greedy Approximation Algorithm with Applications

8 0.53004092 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines

9 0.5293076 13 nips-2001-A Natural Policy Gradient

10 0.52886832 60 nips-2001-Discriminative Direction for Kernel Classifiers

11 0.52456003 56 nips-2001-Convolution Kernels for Natural Language

12 0.52440953 185 nips-2001-The Method of Quantum Clustering

13 0.52381492 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade

14 0.5229848 157 nips-2001-Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning

15 0.52291512 102 nips-2001-KLD-Sampling: Adaptive Particle Filters

16 0.52289093 22 nips-2001-A kernel method for multi-labelled classification

17 0.52166992 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes

18 0.52009499 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines

19 0.51979321 46 nips-2001-Categorization by Learning and Combining Object Parts

20 0.51976395 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms