nips nips2001 nips2001-167 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Florence D'alché-buc, Yves Grandvalet, Christophe Ambroise
Abstract: In many discrimination problems a large amount of data is available but only a few of them are labeled. This provides a strong motivation to improve or develop methods for semi-supervised learning. In this paper, boosting is generalized to this task within the optimization framework of MarginBoost . We extend the margin definition to unlabeled data and develop the gradient descent algorithm that corresponds to the resulting margin cost function. This meta-learning scheme can be applied to any base classifier able to benefit from unlabeled data. We propose here to apply it to mixture models trained with an Expectation-Maximization algorithm. Promising results are presented on benchmarks with different rates of labeled data. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In many discrimination problems a large amount of data is available but only a few of them are labeled. [sent-13, score-0.05]
2 In this paper, boosting is generalized to this task within the optimization framework of MarginBoost . [sent-15, score-0.27]
3 We extend the margin definition to unlabeled data and develop the gradient descent algorithm that corresponds to the resulting margin cost function. [sent-16, score-0.917]
4 This meta-learning scheme can be applied to any base classifier able to benefit from unlabeled data. [sent-17, score-0.66]
5 We propose here to apply it to mixture models trained with an Expectation-Maximization algorithm. [sent-18, score-0.068]
6 Promising results are presented on benchmarks with different rates of labeled data. [sent-19, score-0.209]
7 1 Introduction In semi-supervised classification tasks, a concept is to be learnt using both labeled and unlabeled examples. [sent-20, score-0.428]
8 Such problems arise frequently in data-mining where the cost of the labeling process can be prohibitive because it requires human help as in video-indexing, text-categorization [12] and medical diagnosis. [sent-21, score-0.067]
9 While some works have proposed different methods [16] to solve this task (mixture models [12], [1], SVMs [3], co-trained machines [5]), no extension has been developed so far for ensemble methods such as boosting [7, 6]. [sent-22, score-0.406]
10 Boosting consists of sequentially building a linear combination of base classifiers that focus on the difficult examples. [sent-23, score-0.271]
11 For AdaBoost and extensions such as MarginBoost [10], this stage-wise procedure corresponds to a gradient descent of a cost functional based on a decreasing function of the margin, in the space of linear combinations of base classifiers. [sent-24, score-0.464]
12 We propose to generalize boosting to semi-supervised learning within the framework of optimization. [sent-25, score-0.27]
13 We extend the margin notion to unlabeled data, derive the corresponding criterion to be maximized, and propose the resulting algorithm called Semi-Supervised MarginBoost (SSMBoost). [sent-26, score-0.514]
14 This new method improves on our previous work [9], based on a direct plug-in extension of AdaBoost, in the sense that all the ingredients of the gradient algorithm, such as the gradient direction and the stopping rule, are defined from the expression of the new cost function. [sent-27, score-0.398]
15 Moreover, while the algorithm has been tested using mixture models [1], SSMBoost is designed to combine any base classifiers that can deal with both labeled and unlabeled data. [sent-28, score-0.744]
16 Then, in section 3, the SSMBoost algorithm is presented. [sent-30, score-0.03]
17 We have chosen the MarginBoost algorithm, a variant of a more general algorithm called AnyBoost [10], which generalizes AdaBoost and formally justifies its interpretation in terms of margin. [sent-34, score-0.03]
18 MarginBoost, described in Fig. 1, minimizes the cost functional $C$ defined for any scalar decreasing function $c$ of the margin $\rho$: $C(g_t) = \sum_{i=1}^{l} c(\rho(g_t(x_i), y_i))$ (2). Instead of taking exactly $h_{t+1} = -\nabla C(g_t)$, which does not ensure that the resulting function $g_{t+1}$ remains a combination of base classifiers, the new direction $h_{t+1} \in \mathcal{H}$ is chosen so as to maximize its inner product with $-\nabla C(g_t)$. [sent-38, score-0.288]
19 The equivalent weighted cost function to be maximized can thus be expressed as: $J = \sum_{i \in S} w_t(i) \, y_i \, h_{t+1}(x_i)$ (3). [sent-41, score-0.096]
20 3 Generalizing MarginBoost to semi-supervised classification [sent-42, score-0.039]
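To make equations (2) and (3) concrete, here is a minimal sketch (not from the paper; NumPy assumed, $c(\rho) = \exp(-\rho)$, labels in {-1, +1}) of the margin cost and of the weighted edge that the next base classifier should maximize:

    import numpy as np

    def margin_cost(g, y):
        # Equation (2): C(g_t) = sum_i c(rho_i) with rho_i = y_i * g_t(x_i) and c(rho) = exp(-rho)
        return np.sum(np.exp(-(y * g)))

    def weighted_edge(h, y, w):
        # Equation (3): J = sum_i w_t(i) * y_i * h_{t+1}(x_i), to be maximized by the base learner
        return np.sum(w * y * h)

Here g and h hold the outputs of the current combination and of the candidate base classifier on the training sample, and w holds the current weights.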
21 3.1 Margin Extension. For labeled data, the margin measures the quality of the classifier output. [sent-43, score-0.429]
22 When no label is observed, the usual margin cannot be calculated and has to be estimated. [sent-44, score-0.22]
23 A first estimate could be derived from the expected margin $E_y[\rho_L(g_t(x), y)]$. [sent-45, score-0.188]
24 We can use the output of the classifier, $(g_t(x) + 1)/2$, as an estimate of the posterior probability $P(Y = +1 \mid x)$. [sent-46, score-0.148]
25 This leads to the following margin $\rho_u$, which depends on the input and is linked with the response of the classifier. (Footnotes: $\alpha_t > 0$ and the L1 norm is used for normalization; the inner product is $\langle f, g \rangle = \sum_{i \in S} f(x_i) g(x_i)$, with $|\alpha_t| = \sum_{\tau=1}^{t} \alpha_\tau$.) The MarginBoost algorithm of Figure 1 is initialized with $w_0(i) = 1/l$, $i = 1, \ldots, l$. [sent-47, score-0.188]
26 1. Learn a gradient direction $h_{t+1} \in \mathcal{H}$ with a high value of $J = \sum_{i \in S} w_t(i) \, y_i \, h_{t+1}(x_i)$. [sent-55, score-0.103]
27 2. Apply the stopping rule: if $J \le \sum_{i \in S} w_t(i) \, y_i \, g_t(x_i)$, then return $g_t$, else go on. [sent-56, score-0.345]
28 3. Choose a step-length $\alpha_{t+1}$ for the obtained direction by a line search or by fixing it as a constant $\epsilon$. [sent-58, score-0.029]
29 4. Add the new direction to obtain $g_{t+1} = (|\alpha_t| \, g_t + \alpha_{t+1} h_{t+1}) / |\alpha_{t+1}|$. [sent-59, score-0.029]
30 5. Fix the weight distribution: $w_{t+1}(i) = c'(\rho(g_{t+1}(x_i), y_i)) \, / \, \sum_{j \in S} c'(\rho(g_{t+1}(x_j), y_j))$. Figure 1: MarginBoost algorithm (with L1 normalization of the combination coefficients). [sent-60, score-0.218]
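As an illustration of the five steps of Figure 1, the sketch below implements the labeled-data loop under the same assumptions (c(rho) = exp(-rho), a fixed step length, and a hypothetical fit_base_classifier routine that returns {-1, +1} predictions on the sample); it is a sketch of the procedure, not the authors' implementation:

    import numpy as np

    def marginboost(X, y, fit_base_classifier, T=100, step=0.1):
        l = len(y)
        w = np.full(l, 1.0 / l)                     # initialization: w_0(i) = 1/l
        g = np.zeros(l)                             # current combination g_t evaluated on the sample
        classifiers, alphas = [], []
        for t in range(T):
            h = fit_base_classifier(X, y, w)        # step 1: direction with a high edge J = sum_i w_i y_i h(x_i)
            if np.sum(w * y * h) <= np.sum(w * y * g):
                break                               # step 2: stopping rule, keep g_t
            alpha = step                            # step 3: constant step length (a line search also works)
            prev_norm = sum(alphas)                 # |alpha_t|, the L1 norm of the coefficients so far
            classifiers.append(h)
            alphas.append(alpha)
            g = (prev_norm * g + alpha * h) / (prev_norm + alpha)   # step 4: g_{t+1}
            w = np.exp(-(y * g))                    # step 5: w_{t+1} proportional to |c'(rho)| = exp(-rho)
            w = w / w.sum()
        return classifiers, alphas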
31 Another way of defining the extended margin is to use directly the maximum a posteriori estimate of the true margin. This MAP estimate depends on the sign of the classifier output and provides the margin definition $\rho_u^s$ (5). [sent-61, score-0.336]
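To make the two extended margins concrete: plugging the posterior estimate (g(x)+1)/2 into the expected margin gives g(x)^2 (the "squared margin"), while the MAP estimate gives |g(x)|; this reading is consistent with the derivatives given in equations (11) and (12) below. A small sketch (our notation, NumPy assumed):

    import numpy as np

    def margin_labeled(g, y):
        return y * g            # usual margin rho_L for labeled examples

    def margin_unlabeled_expected(g):
        return g ** 2           # plug-in expected margin: (2*P(Y=1|x) - 1) * g(x) = g(x)^2

    def margin_unlabeled_map(g):
        return np.abs(g)        # MAP estimate: the pseudo-label sign(g(x)) gives |g(x)|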
32 3.2 Semi-Supervised MarginBoost: generalization of MarginBoost to deal with unlabeled data. The generalization of the margin can be used to define an appropriate cost functional for the semi-supervised learning task. [sent-62, score-0.584]
33 with $w_{t+1}(i) \propto c'(\rho_u(g_t(x_i)))$ if $i \in U$, normalized by $|W_t| = \sum_{i \in S} w_t(i)$ (9). This expression of $J$ comes directly from differential calculus and the chosen inner product: $\nabla C(g_t)(x) = y_i \, c'(\rho_L(g_t(x_i), y_i))$ if $x = x_i$ and $i \in L$, $\nabla C(g_t)(x) = c'(\rho_u(g_t(x_i))) \, \partial \rho_u(g_t(x_i)) / \partial g_t(x_i)$ if $x = x_i$ and $i \in U$, and $0$ otherwise (10). [sent-69, score-0.345]
34 Implementation of SSMBoost with the margins $\rho_u^s$ and $\rho_u^g$ requires their derivatives. [sent-70, score-0.08]
35 The value of the subderivative corresponds here to the average value of the right and left derivatives. [sent-73, score-0.033]
36 $\partial \rho_u^s(g_t(x_i)) = \mathrm{sign}(g(x_i)) \, \partial g_t(x_i)$ if $g(x_i) \neq 0$ and $0$ if $g(x_i) = 0$ (11). And, for the "squared margin" $\rho_u^g$, we have $\partial \rho_u^g(g_t(x_i)) = 2 g(x_i) \, \partial g_t(x_i)$ (12). This completes the set of ingredients that must be incorporated into the algorithm of Fig. [sent-74, score-0.181]
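Combining equations (9) to (12), here is a hedged sketch of how those ingredients translate into weights and pseudo-labels for the semi-supervised case (again with c(rho) = exp(-rho) and the squared margin for the unlabeled part; labeled_mask marks the examples with observed labels, and y holds a placeholder such as 0 where the label is missing):

    import numpy as np

    def ssmboost_weights(g, y, labeled_mask):
        # labeled examples: weight proportional to |c'(rho_L)| = exp(-y*g), target is the true label
        w_lab = np.exp(-(y * g))
        target_lab = y
        # unlabeled examples: weight proportional to |c'(rho_u) * d rho_u / d g| = exp(-g^2) * |2g|,
        # and the gradient direction makes sign(g) act as the pseudo-label (equations 10 and 12)
        w_unlab = np.exp(-(g ** 2)) * np.abs(2.0 * g)
        target_unlab = np.sign(g)
        w = np.where(labeled_mask, w_lab, w_unlab)
        target = np.where(labeled_mask, target_lab, target_unlab)
        return w / w.sum(), target      # normalization by |W_t| as in equation (9)

These weights and targets can then replace w_t(i) and y_i in step 1 of the Figure 1 loop.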
37 4 Base Classifier. The base classifier should be able to make use of the unlabeled data provided by the boosting algorithm. [sent-76, score-0.93]
38 Hierarchical mixtures provide flexible discrimination tools, where each conditional distribution $f(x \mid Y = k)$ is modelled by a mixture of components [4]. [sent-78, score-0.147]
39 The high-level description can also be expressed as a low-level mixture of components, as shown here for binary classification: $f(x; \phi) = \sum_{k=1}^{K_1} p_{k1} f_{k1}(x; \theta_{k1}) + \sum_{k=1}^{K_2} p_{k2} f_{k2}(x; \theta_{k2})$ (14). [sent-81, score-0.068]
40 With this setting, the EM algorithm is used to maximize the log-likelihood with respect to $\phi$, considering that the incomplete data is $\{x_i, y_i\}_{i=1}^{l}$ and the missing data is the component label $c_{ik}$, $k = 1, \ldots, K_1 + K_2$. [sent-82, score-0.146]
41 An original implementation of EM based on the concept of possible labels [1] is considered here. [sent-86, score-0.072]
42 It is well adapted to hierarchical mixtures, where the class label Y provides a subset of possible components. [sent-87, score-0.075]
43 When $Y = 1$ the first $K_1$ modes are possible, when $Y = -1$ the last $K_2$ modes are possible, and when an example is unlabeled, all modes are possible. [sent-88, score-0.174]
44 A binary vector $z_i \in \{0,1\}^{K_1 + K_2}$ indicates the components from which feature vector $x_i$ may have been generated, in agreement with the assumed mixture model and the (absence of) label $y_i$. [sent-89, score-0.1]
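A small sketch of the possible-label vector z_i described above, assuming labels coded as +1, -1 and 0 for unlabeled examples (our coding convention, not the paper's):

    import numpy as np

    def possible_labels(y, K1, K2):
        z = np.zeros((len(y), K1 + K2), dtype=int)
        z[y == 1, :K1] = 1      # Y = +1: only the first K1 modes are possible
        z[y == -1, K1:] = 1     # Y = -1: only the last K2 modes are possible
        z[y == 0, :] = 1        # unlabeled example: all modes are possible
        return z

    def e_step(densities, priors, z):
        # densities: (n, K1+K2) component likelihoods f_k(x_i); priors: mixing proportions p_k
        resp = densities * priors * z          # zero out components incompatible with the label
        return resp / resp.sum(axis=1, keepdims=True)

In the E-step, the usual component responsibilities can thus be multiplied elementwise by z_i and renormalized, so that components incompatible with the observed label receive zero responsibility.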
45 Assuming that each mode $k$ follows a normal distribution with mean $\mu_k$ and covariance $\Sigma_k$, the updated parameter set $\phi^{q+1} = \{\mu_k^{q+1}, \Sigma_k^{q+1}, p_k^{q+1}\}_{k=1}^{K_1+K_2}$ is given by equations (17) and (18). [sent-96, score-0.027]
46 5 Experimental results. Tests of the algorithm are performed on three benchmarks of the boosting literature: twonorm and ringnorm [6] and banana [13]. [sent-98, score-0.776]
47 Information about these datasets and the results obtained in discrimination are available at www. [sent-99, score-0.05]
48 We first study the behavior of SSMBoost according to the evolution of the test error with increasing rates of unlabeled data (table 1). [sent-103, score-0.461]
49 We consider five different settings where 0%, 50%, 75%, 90% and 95% of labels are missing. [sent-104, score-0.072]
50 SSMBoost is tested for the margins $\rho_u^s$ and $\rho_u^g$ with $c(x) = \exp(-x)$. [sent-105, score-0.08]
51 SSMBoost and AdaBoost are trained identically, the only difference being that AdaBoost does not use the examples with missing labels. [sent-107, score-0.084]
52 Both algorithms are run for T = 100 boosting steps, without special care of overfitting. [sent-108, score-0.27]
53 We report mean error rates together with the lower and upper quartiles in table 1. [sent-110, score-0.113]
54 For the sake of space, we did not display the results obtained without missing labels: in this case, AdaBoost and SSMBoost behave nearly identically, and better than EM only for Banana. [sent-111, score-0.123]
55 Table 1 (numerical entries lost in extraction): mean test error rates, with lower and upper quartiles, of base (EM), AdaBoost, SSMBoost $\rho_u^s$ and SSMBoost $\rho_u^g$ on Twonorm, Ringnorm and Banana, for 50%, 75%, 90% and 95% of missing labels. [sent-113, score-0.056]
57 One possible explanation is that the discrimination frontiers involved in the banana problem are so complex that the labels bring crucial information, and thus adding unlabeled data does not help in such a case. [sent-258, score-0.588]
58 Nevertheless, at rate 95%, which is the most realistic situation, the margin $\rho_u$ obtains the minimal error rate for each of the three problems. [sent-259, score-0.291]
59 It shows that it is worth boosting and using unlabeled data. [sent-260, score-0.595]
60 As there is no great difference between the two proposed margins, we conducted further experiments using only $\rho_u$. Second, in order to study the relation between the presence of noise in the dataset and the ability of SSMBoost to enhance generalization performance, we draw in Fig. [sent-261, score-0.027]
61 2, the test errors obtained for problems with different values of Bayes error when varying the rate of labeled examples. [sent-262, score-0.207]
62 We see that even for difficult tasks (very noisy problems), the degradation in performance for large subsets of unlabeled data is still low. [sent-263, score-0.296]
63 This reflects some consistency in the behavior of our algorithm. [sent-264, score-0.029]
64 Third, we test the sensitivity of SSMBoost to overfitting. [sent-265, score-0.038]
65 Overfitting can usually be avoided by techniques such as early stopping, softening of the margin ([13], [14]), or using an adequate margin cost function such as $1 - \tanh(\rho)$ instead of $\exp(-\rho)$ [10]. [sent-266, score-0.376]
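For reference, a small sketch of the two margin cost functions mentioned here (the bounded 1 - tanh(rho) reweights badly classified examples much less aggressively than exp(-rho), which is why it is less prone to overfitting on noisy data):

    import numpy as np

    def cost_exp(rho):
        return np.exp(-rho)        # unbounded: weight |c'(rho)| = exp(-rho) explodes for very negative margins

    def cost_tanh(rho):
        return 1.0 - np.tanh(rho)  # bounded in (0, 2): weight |c'(rho)| = 1 - tanh(rho)^2 stays <= 1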
66 Here we keep using $c(\rho) = \exp(-\rho)$ and run SSMBoost for a maximal number of steps $T = 1000$ with 95% of unlabeled data. [sent-267, score-0.296]
67 Of course, this does not correspond to a realistic use of boosting in practice, but it allows us to check whether the algorithm behaves consistently as the number of gradient steps grows. [sent-268, score-0.374]
68 It is remarkable that no overfitting is observed and in the Twonorm case (see Fig. [sent-269, score-0.063]
69 We also observe that the standard deviation of the error is reduced at the end of the process. [sent-271, score-0.049]
70 A massive presence of unlabeled data thus has a regularizing effect. [sent-275, score-0.338]
71 Figure 2: Consistency of the SSMBoost behavior: evolution of the test error versus the rate of missing labels (%), for various Bayes error levels (twonorm). [sent-279, score-0.516]
72 [Figure: curves of the mean test error +/- one standard deviation; numerical axis values lost in extraction.] [sent-280, score-0.112]
73 6 Conclusion. The MarginBoost algorithm has been extended to deal with both labeled and unlabeled data. [sent-287, score-0.419]
74 Results obtained on three classical benchmarks of the boosting literature show that it is worth using the additional information conveyed by the patterns alone. [sent-288, score-0.378]
75 No overfitting was observed when running SSMBoost on the benchmarks with 95% of the labels missing: this suggests that the unlabeled data play a regularizing role in the ensemble classifier during the boosting process. [sent-289, score-1.038]
76 After applying this method to a large real dataset, such as those used in text categorization, our future work on this theme will concern the use of the extended margin cost function in the base classifiers themselves, such as multilayered perceptrons or decision trees. [sent-290, score-0.526]
77 Another approach could also be derived from the more general framework of AnyBoost, which optimizes any differentiable cost function. [sent-291, score-0.121]
78 In Proceedings of the 1998 Conference on Computational Learning Theory, July 1998. [sent-321, score-0.036]
79 In Machine Learning: Proceedings of the Thirteenth International Conference, pages 148-156. [sent-331, score-0.036]
80 Text classification from labeled and unlabeled documents using EM. [sent-364, score-0.428]
wordName wordTfidf (topN-words)
[('marginboost', 0.368), ('unlabeled', 0.296), ('gt', 0.293), ('boosting', 0.27), ('xi', 0.229), ('base', 0.216), ('adaboost', 0.215), ('margin', 0.188), ('banana', 0.17), ('classifier', 0.148), ('twonorm', 0.142), ('pu', 0.12), ('wt', 0.116), ('compiegne', 0.113), ('labeled', 0.093), ('pk', 0.09), ('em', 0.087), ('cedex', 0.085), ('cnrs', 0.085), ('grandvalet', 0.085), ('iwt', 0.085), ('ringnorm', 0.085), ('missing', 0.084), ('ok', 0.084), ('margins', 0.08), ('benchmarks', 0.079), ('universite', 0.074), ('descent', 0.074), ('gradient', 0.074), ('labels', 0.072), ('ht', 0.072), ('mixture', 0.068), ('cost', 0.067), ('jf', 0.063), ('overfitting', 0.063), ('modes', 0.058), ('ambroise', 0.057), ('apu', 0.057), ('heudiasyc', 0.057), ('hth', 0.057), ('iel', 0.057), ('lwt', 0.057), ('pus', 0.057), ('riitsch', 0.057), ('technologie', 0.057), ('umr', 0.057), ('zdi', 0.057), ('france', 0.056), ('pg', 0.056), ('classifiers', 0.055), ('ps', 0.054), ('kl', 0.053), ('stopping', 0.052), ('discrimination', 0.05), ('error', 0.049), ('agt', 0.049), ('christophe', 0.049), ('ilk', 0.045), ('ies', 0.045), ('uik', 0.045), ('ingredients', 0.045), ('hierarchical', 0.043), ('onoda', 0.042), ('regularizing', 0.042), ('evolution', 0.041), ('ensemble', 0.041), ('bayes', 0.04), ('identically', 0.039), ('classification', 0.039), ('test', 0.038), ('std', 0.037), ('rates', 0.037), ('th', 0.036), ('pl', 0.036), ('zi', 0.034), ('sub', 0.033), ('functional', 0.033), ('bp', 0.032), ('label', 0.032), ('falls', 0.031), ('july', 0.03), ('algorithm', 0.03), ('mixtures', 0.029), ('worth', 0.029), ('consistency', 0.029), ('maximized', 0.029), ('direction', 0.029), ('freund', 0.028), ('rate', 0.027), ('extension', 0.027), ('conducted', 0.027), ('differential', 0.027), ('mean', 0.027), ('annals', 0.026), ('oo', 0.025), ('jp', 0.025), ('calculus', 0.025), ('deals', 0.025), ('lor', 0.025), ('je', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 167 nips-2001-Semi-supervised MarginBoost
Author: Florence D'alché-buc, Yves Grandvalet, Christophe Ambroise
Abstract: In many discrimination problems a large amount of data is available but only a few of them are labeled. This provides a strong motivation to improve or develop methods for semi-supervised learning. In this paper, boosting is generalized to this task within the optimization framework of MarginBoost . We extend the margin definition to unlabeled data and develop the gradient descent algorithm that corresponds to the resulting margin cost function. This meta-learning scheme can be applied to any base classifier able to benefit from unlabeled data. We propose here to apply it to mixture models trained with an Expectation-Maximization algorithm. Promising results are presented on benchmarks with different rates of labeled data. 1
2 0.19734672 137 nips-2001-On the Convergence of Leveraging
Author: Gunnar Rätsch, Sebastian Mika, Manfred K. Warmuth
Abstract: We give an unified convergence analysis of ensemble learning methods including e.g. AdaBoost, Logistic Regression and the Least-SquareBoost algorithm for regression. These methods have in common that they iteratively call a base learning algorithm which returns hypotheses that are then linearly combined. We show that these methods are related to the Gauss-Southwell method known from numerical optimization and state non-asymptotical convergence results for all these methods. Our analysis includes -norm regularized cost functions leading to a clean and general way to regularize ensemble learning.
3 0.1821344 144 nips-2001-Partially labeled classification with Markov random walks
Author: Martin Szummer, Tommi Jaakkola
Abstract: To classify a large number of unlabeled examples we combine a limited number of labeled examples with a Markov random walk representation over the unlabeled examples. The random walk representation exploits any low dimensional structure in the data in a robust, probabilistic manner. We develop and compare several estimation criteria/algorithms suited to this representation. This includes in particular multi-way classification with an average margin criterion which permits a closed form solution. The time scale of the random walk regularizes the representation and can be set through a margin-based criterion favoring unambiguous classification. We also extend this basic regularization by adapting time scales for individual examples. We demonstrate the approach on synthetic examples and on text classification problems.
4 0.16726798 45 nips-2001-Boosting and Maximum Likelihood for Exponential Models
Author: Guy Lebanon, John D. Lafferty
Abstract: We derive an equivalence between AdaBoost and the dual of a convex optimization problem, showing that the only difference between minimizing the exponential loss used by AdaBoost and maximum likelihood for exponential models is that the latter requires the model to be normalized to form a conditional probability distribution over labels. In addition to establishing a simple and easily understood connection between the two methods, this framework enables us to derive new regularization procedures for boosting that directly correspond to penalized maximum likelihood. Experiments on UCI datasets support our theoretical analysis and give additional insight into the relationship between boosting and logistic regression.
5 0.15264194 8 nips-2001-A General Greedy Approximation Algorithm with Applications
Author: T. Zhang
Abstract: Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases.
6 0.14964007 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
8 0.12204999 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines
9 0.12125093 143 nips-2001-PAC Generalization Bounds for Co-training
10 0.10499822 58 nips-2001-Covariance Kernels from Bayesian Generative Models
11 0.10325724 25 nips-2001-Active Learning in the Drug Discovery Process
12 0.084144339 133 nips-2001-On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes
13 0.08224842 139 nips-2001-Online Learning with Kernels
14 0.080659457 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
15 0.079210825 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
16 0.076755986 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines
17 0.074503481 193 nips-2001-Unsupervised Learning of Human Motion Models
18 0.073894076 31 nips-2001-Algorithmic Luckiness
19 0.073488839 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
20 0.067115769 13 nips-2001-A Natural Policy Gradient
topicId topicWeight
[(0, -0.217), (1, 0.117), (2, -0.009), (3, 0.142), (4, 0.082), (5, -0.054), (6, 0.019), (7, -0.052), (8, -0.019), (9, -0.051), (10, 0.122), (11, 0.096), (12, 0.001), (13, 0.014), (14, 0.138), (15, -0.122), (16, -0.013), (17, -0.121), (18, 0.007), (19, 0.195), (20, -0.048), (21, 0.097), (22, -0.341), (23, 0.111), (24, 0.109), (25, -0.002), (26, 0.116), (27, -0.146), (28, -0.055), (29, 0.041), (30, -0.055), (31, 0.043), (32, -0.064), (33, -0.108), (34, -0.153), (35, -0.013), (36, -0.106), (37, -0.036), (38, -0.003), (39, -0.098), (40, 0.014), (41, -0.03), (42, 0.068), (43, 0.011), (44, -0.03), (45, 0.017), (46, 0.069), (47, -0.032), (48, 0.093), (49, -0.058)]
simIndex simValue paperId paperTitle
same-paper 1 0.95841593 167 nips-2001-Semi-supervised MarginBoost
Author: Florence D'alché-buc, Yves Grandvalet, Christophe Ambroise
Abstract: In many discrimination problems a large amount of data is available but only a few of them are labeled. This provides a strong motivation to improve or develop methods for semi-supervised learning. In this paper, boosting is generalized to this task within the optimization framework of MarginBoost . We extend the margin definition to unlabeled data and develop the gradient descent algorithm that corresponds to the resulting margin cost function. This meta-learning scheme can be applied to any base classifier able to benefit from unlabeled data. We propose here to apply it to mixture models trained with an Expectation-Maximization algorithm. Promising results are presented on benchmarks with different rates of labeled data. 1
2 0.5831269 45 nips-2001-Boosting and Maximum Likelihood for Exponential Models
Author: Guy Lebanon, John D. Lafferty
Abstract: We derive an equivalence between AdaBoost and the dual of a convex optimization problem, showing that the only difference between minimizing the exponential loss used by AdaBoost and maximum likelihood for exponential models is that the latter requires the model to be normalized to form a conditional probability distribution over labels. In addition to establishing a simple and easily understood connection between the two methods, this framework enables us to derive new regularization procedures for boosting that directly correspond to penalized maximum likelihood. Experiments on UCI datasets support our theoretical analysis and give additional insight into the relationship between boosting and logistic regression.
3 0.50183755 137 nips-2001-On the Convergence of Leveraging
Author: Gunnar Rätsch, Sebastian Mika, Manfred K. Warmuth
Abstract: We give an unified convergence analysis of ensemble learning methods including e.g. AdaBoost, Logistic Regression and the Least-SquareBoost algorithm for regression. These methods have in common that they iteratively call a base learning algorithm which returns hypotheses that are then linearly combined. We show that these methods are related to the Gauss-Southwell method known from numerical optimization and state non-asymptotical convergence results for all these methods. Our analysis includes -norm regularized cost functions leading to a clean and general way to regularize ensemble learning.
4 0.47931665 117 nips-2001-MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation
Author: Anand Rangarajan, Alan L. Yuille
Abstract: Bayesian belief propagation in graphical models has been recently shown to have very close ties to inference methods based in statistical physics. After Yedidia et al. demonstrated that belief propagation fixed points correspond to extrema of the so-called Bethe free energy, Yuille derived a double loop algorithm that is guaranteed to converge to a local minimum of the Bethe free energy. Yuille’s algorithm is based on a certain decomposition of the Bethe free energy and he mentions that other decompositions are possible and may even be fruitful. In the present work, we begin with the Bethe free energy and show that it has a principled interpretation as pairwise mutual information minimization and marginal entropy maximization (MIME). Next, we construct a family of free energy functions from a spectrum of decompositions of the original Bethe free energy. For each free energy in this family, we develop a new algorithm that is guaranteed to converge to a local minimum. Preliminary computer simulations are in agreement with this theoretical development. 1
5 0.47309396 25 nips-2001-Active Learning in the Drug Discovery Process
Author: Manfred K. Warmuth, Gunnar Rätsch, Michael Mathieson, Jun Liao, Christian Lemmen
Abstract: We investigate the following data mining problem from Computational Chemistry: From a large data set of compounds, find those that bind to a target molecule in as few iterations of biological testing as possible. In each iteration a comparatively small batch of compounds is screened for binding to the target. We apply active learning techniques for selecting the successive batches. One selection strategy picks unlabeled examples closest to the maximum margin hyperplane. Another produces many weight vectors by running perceptrons over multiple permutations of the data. Each weight vector votes with its prediction and we pick the unlabeled examples for which the prediction is most evenly split between and . For a third selection strategy note that each unlabeled example bisects the version space of consistent weight vectors. We estimate the volume on both sides of the split by bouncing a billiard through the version space and select unlabeled examples that cause the most even split of the version space. We demonstrate that on two data sets provided by DuPont Pharmaceuticals that all three selection strategies perform comparably well and are much better than selecting random batches for testing.
6 0.45535636 144 nips-2001-Partially labeled classification with Markov random walks
7 0.44762745 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines
8 0.44199741 64 nips-2001-EM-DD: An Improved Multiple-Instance Learning Technique
9 0.42879474 133 nips-2001-On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes
10 0.37437981 31 nips-2001-Algorithmic Luckiness
11 0.34586421 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
12 0.33056003 193 nips-2001-Unsupervised Learning of Human Motion Models
13 0.32994229 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
14 0.32819322 143 nips-2001-PAC Generalization Bounds for Co-training
15 0.29343355 8 nips-2001-A General Greedy Approximation Algorithm with Applications
16 0.28037497 180 nips-2001-The Concave-Convex Procedure (CCCP)
17 0.27542686 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
18 0.27446988 70 nips-2001-Estimating Car Insurance Premia: a Case Study in High-Dimensional Data Inference
19 0.27158663 62 nips-2001-Duality, Geometry, and Support Vector Regression
20 0.27115577 114 nips-2001-Learning from Infinite Data in Finite Time
topicId topicWeight
[(1, 0.32), (14, 0.042), (17, 0.023), (19, 0.028), (27, 0.155), (30, 0.056), (36, 0.012), (38, 0.013), (59, 0.029), (72, 0.064), (79, 0.03), (83, 0.012), (91, 0.143)]
simIndex simValue paperId paperTitle
same-paper 1 0.82802629 167 nips-2001-Semi-supervised MarginBoost
Author: Florence D'alché-buc, Yves Grandvalet, Christophe Ambroise
Abstract: In many discrimination problems a large amount of data is available but only a few of them are labeled. This provides a strong motivation to improve or develop methods for semi-supervised learning. In this paper, boosting is generalized to this task within the optimization framework of MarginBoost . We extend the margin definition to unlabeled data and develop the gradient descent algorithm that corresponds to the resulting margin cost function. This meta-learning scheme can be applied to any base classifier able to benefit from unlabeled data. We propose here to apply it to mixture models trained with an Expectation-Maximization algorithm. Promising results are presented on benchmarks with different rates of labeled data. 1
2 0.58920527 8 nips-2001-A General Greedy Approximation Algorithm with Applications
Author: T. Zhang
Abstract: Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases.
3 0.58591396 13 nips-2001-A Natural Policy Gradient
Author: Sham M. Kakade
Abstract: We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris. 1
4 0.5837118 135 nips-2001-On Spectral Clustering: Analysis and an algorithm
Author: Andrew Y. Ng, Michael I. Jordan, Yair Weiss
Abstract: Despite many empirical successes of spectral clustering methods - algorithms that cluster points using eigenvectors of matrices derived from the data - there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems. 1
5 0.58369505 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding
Author: Marzia Polito, Pietro Perona
Abstract: Locally Linear Embedding (LLE) is an elegant nonlinear dimensionality-reduction technique recently introduced by Roweis and Saul [2]. It fails when the data is divided into separate groups. We study a variant of LLE that can simultaneously group the data and calculate local embedding of each group. An estimate for the upper bound on the intrinsic dimension of the data set is obtained automatically. 1
6 0.58263254 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources
7 0.57887352 84 nips-2001-Global Coordination of Local Linear Models
9 0.5768953 190 nips-2001-Thin Junction Trees
10 0.57664984 89 nips-2001-Grouping with Bias
11 0.576114 58 nips-2001-Covariance Kernels from Bayesian Generative Models
12 0.57528216 143 nips-2001-PAC Generalization Bounds for Co-training
13 0.57501841 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
14 0.57431078 44 nips-2001-Blind Source Separation via Multinode Sparse Representation
15 0.57381445 197 nips-2001-Why Neuronal Dynamics Should Control Synaptic Learning Rules
16 0.57374537 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
17 0.57355821 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
18 0.57286394 57 nips-2001-Correlation Codes in Neuronal Populations
19 0.5720942 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
20 0.57105881 150 nips-2001-Probabilistic Inference of Hand Motion from Neural Activity in Motor Cortex