nips nips2006 nips2006-150 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Corinna Cortes, Mehryar Mohri
Abstract: In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closed-form solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. [sent-4, score-0.364]
2 A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. [sent-5, score-0.889]
3 This paper presents a study of regression problems in that setting. [sent-6, score-0.22]
4 It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. [sent-7, score-1.145]
5 It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closed-form solution and deals efficiently with large amounts of unlabeled data. [sent-8, score-1.394]
6 The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. [sent-9, score-0.439]
7 Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. [sent-10, score-0.474]
8 The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets. [sent-11, score-0.715]
9 1 Introduction In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. [sent-12, score-0.364]
10 Semi-supervised learning and transductive inference leverage unlabeled data to achieve better predictions and are thus particularly relevant to modern applications. [sent-14, score-0.917]
11 Semi-supervised learning consists of using both labeled and unlabeled data to find a hypothesis that accurately labels unseen examples. [sent-15, score-0.511]
12 Transductive inference uses the same information but only aims at predicting the labels of the known unlabeled examples. [sent-16, score-0.381]
13 This paper deals with regression problems in the transductive setting, which arise in a variety of contexts. [sent-17, score-0.741]
14 The problem of transductive inference was originally formulated and analyzed by Vapnik [1982], who described it as a simpler task than the traditional induction treated in machine learning. [sent-19, score-0.196]
15 A number of recent publications have dealt with the topic of transductive inference [Vapnik, 1998, Joachims, 1999, Bennett and Demiriz, 1998, Chapelle et al. [sent-20, score-0.713]
16 We give new error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik [1998] when applied to classification. [sent-32, score-1.049]
17 Our results also include explicit VC-dimension bounds for transductive regression. [sent-33, score-0.69]
18 This contrasts with the original regression bound given by Vapnik [1998], which assumes a specific condition of global regularity on the class of functions and is based on a complicated and implicit function of the sample sizes and the confidence parameter. [sent-34, score-0.265]
19 We also present a new algorithm for transductive regression inspired by our bound which first exploits the position of unlabeled points to locally estimate their labels, and then uses a global optimization to ensure robust predictions. [sent-36, score-1.174]
20 We show that our algorithm admits both a primal and a kernelized closed-form solution. [sent-37, score-0.161]
21 Existing algorithms for the transductive setting require the inversion of a matrix whose dimension is either the total number of unlabeled and labeled examples [Belkin et al. [sent-38, score-1.118]
22 , 2004], or the total number of unlabeled examples [Chapelle et al. [sent-39, score-0.399]
23 This may be prohibitive for many real-world applications with very large amounts of unlabeled examples. [sent-41, score-0.317]
24 Similarly, when the number of training points m is small compared to the number of unlabeled points, using an empirical kernel map, our algorithm requires only constructing and inverting an m × m-matrix. [sent-44, score-0.394]
25 Our study also includes the results of our experiments with several publicly available regression data sets with up to 20,000 unlabeled examples, limited only by the size of the data sets. [sent-45, score-0.474]
26 [1999], which are among the very few algorithms described in the literature dealing specifically with the problem of transductive regression. [sent-48, score-0.602]
27 Section 2 describes in more detail the transductive regression setting we are studying. [sent-51, score-0.715]
28 New generalization error bounds for transductive regression are presented in Section 3. [sent-52, score-0.847]
29 Section 4 describes and analyzes both the primal and dual versions of our algorithm and the experimental results of our study are reported in Section 5. [sent-53, score-0.196]
30 The remaining u unlabeled examples are $x_{m+1}, \ldots, x_{m+u}$. [sent-59, score-0.256]
31 It differs from the standard (induction) regression estimation problem by the fact that the learning algorithm is given the unlabeled test examples beforehand. [sent-69, score-0.496]
32 In what follows, we consider a hypothesis space H of real-valued functions for regression estimation. [sent-71, score-0.24]
33 For a hypothesis h ∈ H, we denote by $R_0(h)$ its mean squared error on the full sample, by $\hat{R}(h)$ its error on the training data, and by $\bar{R}(h)$ the error of h on the test examples:
$$R_0(h) = \frac{1}{m+u}\sum_{i=1}^{m+u}\big(h(x_i)-y_i\big)^2, \qquad \hat{R}(h) = \frac{1}{m}\sum_{i=1}^{m}\big(h(x_i)-y_i\big)^2, \qquad \bar{R}(h) = \frac{1}{u}\sum_{i=m+1}^{m+u}\big(h(x_i)-y_i\big)^2. \qquad (2)$$ [sent-72, score-0.261]
34 For convenience, we will sometimes denote by $y_x = y_i$ the label of a point $x = x_i \in X$. [sent-73, score-0.315]
35 3 Transductive Regression Generalization Error This section presents explicit generalization error bounds for transductive regression. [sent-74, score-0.793]
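As a concrete illustration of Eq. (2), here is a minimal numpy sketch (the function and array names are ours, not the paper's) computing the three errors for a given predictor:

```python
import numpy as np

def transduction_errors(h, X, y, m):
    """Compute the full-sample, training, and test mean squared errors of Eq. (2).

    X : (m+u, d) array of all points, the first m labeled, the last u unlabeled test points
    y : (m+u,) array of labels (test labels used here only for evaluation)
    h : callable mapping an (n, d) array to an (n,) array of predictions
    """
    sq_err = (h(X) - y) ** 2
    R0 = sq_err.mean()            # error over all m+u points
    R_train = sq_err[:m].mean()   # empirical error on the m training points
    R_test = sq_err[m:].mean()    # error on the u unlabeled test points
    return R0, R_train, R_test
```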
36 Vapnik [1998] introduced and analyzed the problem of transduction and presented transductive inference bounds for both classification and regression. [sent-75, score-0.8]
37 His regression bound, however, assumes a specific regularity condition on the hypothesis functions, leading in particular to a surprising bound in which zero error on the training data implies zero generalization error. [sent-76, score-0.544]
38 Instead, our bounds simply hold for general bounded loss functions and, when applied to classification, coincide with the tight classification bounds of Vapnik [1998]. [sent-82, score-0.304]
39 Our results also include explicit VC-dimension bounds for transductive regression. [sent-83, score-0.69]
40 To the best of our knowledge, these are the first general explicit bounds for transductive regression. [sent-84, score-0.69]
41 For any subset X′ ⊆ X, any non-negative real number t ≥ 0, and hypothesis h ∈ H, let Θ(h, t, X′) denote the fraction of the points $x_i \in X'$, i = 1, ..., m+u, for which $(h(x_i) - y_i)^2 > t$. [sent-91, score-0.171]
42 Thus, Θ(h, t, X ′ ) represents the error rate over the sample X ′ of the classifier that associates to a point x the value zero if (h(x) − yx )2 ≤ t, one otherwise. [sent-95, score-0.178]
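This quantity is simply a thresholded error rate; a one-line sketch (the function and argument names are ours):

```python
import numpy as np

def theta(h, X_prime, y_prime, t):
    """Fraction of points in X_prime whose squared error exceeds the threshold t."""
    return float(np.mean((h(X_prime) - y_prime) ** 2 > t))
```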
43 Theorem 1. Let δ > 0, let ε₀ > 0 be the minimum value of ε such that N(m + u) Γ(ε) ≤ δ, and assume that the loss function is bounded: for all h ∈ H and x ∈ X, $(h(x) - y_x)^2 \le B^2$, where $B \in \mathbb{R}_+$. [sent-98, score-0.172]
44 By definition of $R_0$ and the Lebesgue integral, for all h ∈ H, $$R_0(h) = \int_{X} \big(h(x) - y_x\big)^2 D(x)\, dx = \int_0^{\infty} \Pr_{x \sim D}\big[(h(x) - y_x)^2 > t\big]\, dt = \int_0^{B^2} \Theta(h, t, X)\, dt.$$ [sent-106, score-0.336]
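The middle step is the standard layer-cake identity for a non-negative random variable, stated here only for completeness under the boundedness assumption of Theorem 1 (a textbook fact, not a claim of the paper):
$$\mathbb{E}[Z] = \int_0^{\infty} \Pr[Z > t]\, dt = \int_0^{B^2} \Pr[Z > t]\, dt \qquad \text{whenever } 0 \le Z \le B^2,$$
applied with $Z = (h(x) - y_x)^2$ for $x \sim D$.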
45 Theorem 1 provides a general bound on the regression error within the transduction setting. [sent-115, score-0.4]
46 The resulting bound coincides with the tight classification bound given by Vapnik [1998]. [sent-117, score-0.212]
47 The following provides a general and explicit error bound for transductive regression directly expressed in terms of the empirical error, the number of equivalence classes N(m + u) or the VC-dimension d, and the sample sizes m and u. [sent-119, score-0.548]
48 Assume that the loss function is bounded: for all h ∈ H and x ∈ X, $(h(x) - y_x)^2 \le B^2$, where $B \in \mathbb{R}_+$. [sent-121, score-0.172]
49 The bound is explicit and can be readily used within the Structural Risk Minimization (SRM) framework, either by using the expression of α in terms of the VC-dimension, or the tighter expression with respect to the number of equivalence classes N . [sent-129, score-0.261]
50 4 Transductive Regression Algorithm This section presents an algorithm for the transductive regression problem. [sent-132, score-0.755]
51 Before presenting this algorithm, let us first emphasize that the algorithms introduced for transductive classification problems, e. [sent-133, score-0.565]
52 , transductive SVMs [Vapnik, 1998, Joachims, 1999], cannot be readily used for regression. [sent-135, score-0.565]
53 These algorithms typically select the hypothesis h, out of a hypothesis space H, that minimizes the following objective: [sent-136, score-0.180]
54 $$\min_{y^{*}_{m+i},\, i=1,\ldots,u} \;\; \Omega(h) + C\,\frac{1}{m}\sum_{i=1}^{m} L\big(h(x_i), y_i\big) + C'\,\frac{1}{u}\sum_{i=1}^{u} L\big(h(x_{m+i}), y^{*}_{m+i}\big), \qquad (16)$$ where Ω(h) is a capacity measure term, L is the loss function used, C ≥ 0 and C′ ≥ 0 are regularization parameters, and where the minimum is taken over all possible labels $y^{*}_{m+1}, \ldots, y^{*}_{m+u}$. [sent-139, score-0.196]
55 In regression, this scheme would lead to a trivial solution not exploiting the transduction setting. [sent-143, score-0.18]
56 Indeed, let h₀ be the hypothesis minimizing the first two terms, that is, the solution of the induction problem. [sent-144, score-0.164]
57 The main idea behind the design of our algorithm is to exploit the additional information provided in transduction, that is the position of the unlabeled examples. [sent-151, score-0.256]
58 The first stage is based on the position of unlabeled points. [sent-153, score-0.293]
59 For each unlabeled point $x_i$, i = m+1, ..., m+u, a local label estimate $\bar{y}_i$ is determined using the labeled points in the neighborhood of $x_i$. [sent-157, score-0.359]
60 In the second stage, a global hypothesis h is found that best matches all labels: those of the training data and the estimated labels $\bar{y}_i$. [sent-158, score-0.332]
61 A global estimate of all labels is needed to make predictions less vulnerable to noise. [sent-161, score-0.151]
62 This defines the neighborhood of the image of each unlabeled point. [sent-165, score-0.321]
63 Labeled points x ∈ Xm whose images Φ(x) fall within the neighborhood of Φ(x′), x′ ∈ Xu, help determine an estimated label of x′. [sent-167, score-0.160]
64 With a very large radius r, the labels of all training examples contribute to the definition of the local estimates. [sent-168, score-0.249]
65 When no such labeled point exists in the neighborhood of x′ ∈ Xu , which depends on the radius r selected, x′ is disregarded in both training stages of the algorithm. [sent-170, score-0.283]
66 One simple way consists of defining it as the weighted average of the neighborhood labels yx , where the weights may be defined as the inverse of distances of Φ(x) to Φ(x′ ), or as similarity measures K(x, x′ ) when a positive definite kernel K is associated to Φ. [sent-172, score-0.35]
67 Thus, when the set of labeled points with images in the neighborhood of Φ(x′) is not empty, i.e., $I = \{i \in [1, m] : \Phi(x_i) \in B(\Phi(x'), r)\} \ne \emptyset$, the estimated label $\bar{y}_{x'}$ of x′ ∈ Xu can be given by:
$$\bar{y}_{x'} = \frac{\sum_{i \in I} w_i\, y_i}{\sum_{i \in I} w_i} \quad \text{with} \quad w_i = \|\Phi(x') - \Phi(x_i)\|^{-1} \;\; (\text{for } \|\Phi(x') - \Phi(x_i)\| \le r) \quad \text{or} \quad w_i = K(x', x_i). \qquad (17)$$ [sent-173, score-0.487]
68 The estimated labels can also be obtained as the solution of a local linear or kernel ridge regression, which is what we used in most of our experiments. [sent-174, score-0.272]
69 In practice, with a relatively small radius r, the computation of an estimated label $\bar{y}_i$ depends only on a limited number of labeled points and their labels, and is quite efficient. [sent-175, score-0.275]
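A minimal sketch of this first stage, assuming a Gaussian kernel is used for the similarity weights and that distances are measured in the input space rather than the feature space Φ; the function and parameter names are ours, and the local kernel ridge regression variant reported in most of the paper's experiments is omitted for brevity:

```python
import numpy as np

def local_label_estimates(X_lab, y_lab, X_unlab, r, gamma=1.0):
    """Estimate a label for each unlabeled point from labeled points within radius r.

    Uses the kernel-weighted average of Eq. (17); points with an empty
    neighborhood are marked with NaN and are disregarded in training.
    """
    y_bar = np.full(len(X_unlab), np.nan)
    for j, x in enumerate(X_unlab):
        d = np.linalg.norm(X_lab - x, axis=1)           # distances to labeled points
        in_ball = d <= r                                # neighborhood B(x, r)
        if not np.any(in_ball):
            continue                                    # disregarded: no labeled neighbor
        w = np.exp(-gamma * d[in_ball] ** 2)            # Gaussian similarity weights K(x, x_i)
        y_bar[j] = np.dot(w, y_lab[in_ball]) / w.sum()  # weighted average of neighborhood labels
    return y_bar
```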
70 2 Global Optimization The second stage of our algorithm consists of selecting a hypothesis h that best fits the labels of the training points and the estimated labels provided in the first stage. [sent-177, score-0.416]
71 As suggested by Corollary 1, hypothesis spaces with a smaller number of equivalence classes guarantee a better generalization error. [sent-178, score-0.19]
72 This leads us to consider the following objective function: $$G = \|w\|^2 + C \sum_{i=1}^{m} \big(h(x_i) - y_i\big)^2 + C' \sum_{i=m+1}^{m+u} \big(h(x_i) - \bar{y}_i\big)^2, \qquad (18)$$ where h is a linear function with weight vector w ∈ F: ∀x ∈ X, h(x) = w · Φ(x), and where C ≥ 0 and C′ ≥ 0 are regularization parameters. [sent-180, score-0.154]
73 The third term, which restricts the estimate error, can be viewed as imposing a smaller number of equivalence classes on the hypothesis space as suggested by the error bound of Corollary 1. [sent-182, score-0.299]
74 The constraint explicitly exploits knowledge about the location of all the test points, and limits the range of the hypothesis at these locations, thereby reducing the number of equivalence classes. [sent-183, score-0.227]
75 Our algorithm can be viewed as a generalization of (kernel) ridge regression to the transductive setting. [sent-184, score-0.811]
76 This gives a closed-form solution (21) in the primal space based on the inversion of a matrix in $\mathbb{R}^{N \times N}$. [sent-200, score-0.197]
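Setting the gradient of G in Eq. (18) to zero gives $w = (I_N + C\,\Phi_m^\top \Phi_m + C'\,\Phi_u^\top \Phi_u)^{-1}(C\,\Phi_m^\top y + C'\,\Phi_u^\top \bar{y})$, which involves only an N × N inversion. The sketch below implements this derivation directly; it is our reading of the primal solution with hypothetical names, not necessarily the paper's exact expression (21).

```python
import numpy as np

def transductive_ridge_primal(Phi_lab, y_lab, Phi_unlab, y_bar, C, C_prime):
    """Closed-form minimizer of the objective G in Eq. (18).

    Requires inverting only an N x N matrix, where N is the feature dimension.
    Unlabeled points with NaN estimates (disregarded points) are dropped first.
    """
    keep = ~np.isnan(y_bar)
    Pu, yb = Phi_unlab[keep], y_bar[keep]
    N = Phi_lab.shape[1]
    A = np.eye(N) + C * Phi_lab.T @ Phi_lab + C_prime * Pu.T @ Pu
    b = C * Phi_lab.T @ y_lab + C_prime * Pu.T @ yb
    w = np.linalg.solve(A, b)   # solve (I + C Φm^T Φm + C' Φu^T Φu) w = b
    return w                    # predictions: h(x) = w · Φ(x)
```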
77 , 2004], or even by the regression technique described by [Chapelle et al. [sent-214, score-0.242]
78 , 1999], even though it is based on a primal method (we have derived a dual version of that method as well, see Section 5), since it requires, among other things, the inversion of a matrix in $\mathbb{R}^{u \times u}$. [sent-215, score-0.224]
79 [Table of results: number of unlabeled points per data set (from 25 up to 20,000) and relative improvement in MSE (%) for our algorithm and Chapelle et al.] [sent-228, score-0.151]
80 The number of unlabeled examples was u = 25 for the Boston Housing data set and varied from u = 500 to the maximum of 20,000 examples for the California Housing data set. [sent-285, score-0.358]
81 As already pointed out in the description of the local estimates, in practice some unlabeled points are disregarded in the training phases because no labeled point falls in their neighborhood. [sent-292, score-0.442]
82 Thus, instead of u, a smaller number of unlabeled examples u′ ≤ u determines the computational cost. [sent-293, score-0.307]
83 5 Experimental Results This section reports the results of our experiments with the transductive regression algorithm just presented, on several data sets. [sent-294, score-0.715]
84 [2004], which are among the very few algorithms described in the literature dealing specifically with the problem of transductive regression. [sent-297, score-0.602]
85 Our comparisons were made using several publicly available regression data sets: Boston Housing, kin-32fh (a data set in the Kinematics family with high unpredictability or noise), California Housing, and Elevators [Torgo, 2006]. [sent-301, score-0.188]
86 For the Boston Housing data set, we used the same partitioning of the training and test sets as in [Chapelle et al. [sent-302, score-0.173]
87 For the kin-32fh, California Housing, and Elevators data sets, 25 training examples were used with varying (large) amounts of test examples: 2,500 and 8,000 for kin-32fh; from 500 up to 20,000 for California Housing; and from 500 to 15,000 for Elevators. [sent-305, score-0.169]
88 To measure the improvement produced by the transductive inference algorithms, we used kernel ridge regression as a baseline. [sent-308, score-0.88]
89 Alternatively, the parameters could be selected using the explicit VC-dimension generalization bound of Corollary 1. [sent-315, score-0.173]
90 For our algorithm, we experimented both with the dual solution using Gaussian kernels, and the primal solution with an empirical Gaussian kernel map as described in Section 4. [sent-319, score-0.321]
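A sketch of one standard construction of an empirical Gaussian kernel map, which represents each point by its kernel evaluations against the m training points so that the primal solution above requires only an m × m inversion; the names and the exact parameterization are our assumptions, not necessarily the map used by the authors:

```python
import numpy as np

def empirical_gaussian_kernel_map(X, X_train, gamma=1.0):
    """Map each row of X to (K(x, x_1), ..., K(x, x_m)) with a Gaussian kernel."""
    # squared Euclidean distances between every x in X and every training point
    sq_dists = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)   # shape (len(X), m): empirical feature vectors
```

With this map the feature dimension N equals m, consistent with the earlier remark that only an m × m matrix needs to be constructed and inverted when m is small relative to the number of unlabeled points.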
91 The results obtained were very similar; however, the primal method was dramatically faster since it required the inversion of relatively small-dimensional matrices even for a large number of unlabeled examples. [sent-322, score-0.321]
92 We note that, while positive classification results have been previously reported for this algorithm, no transductive regression experimental result seems to have been published for it. [sent-331, score-0.715]
93 Our algorithm achieved a significant improvement of the MSE in all data sets and for different amounts of unlabeled data and was shown to be practical for large data sets of 20,000 test examples. [sent-334, score-0.362]
94 This matches many real-world situations where the amount of unlabeled data is orders of magnitude larger than that of labeled data. [sent-335, score-0.326]
95 6 Conclusion We presented a general study of transductive regression. [sent-336, score-0.595]
96 We gave new and general explicit error bounds for transductive regression and described a simple and general algorithm inspired by our bound that can scale to relatively large data sets. [sent-337, score-0.985]
97 The results of experiments show that our algorithm achieves a smaller error in several tasks compared to other previously published algorithms for transductive regression. [sent-338, score-0.595]
98 The problem of transductive regression arises in a variety of learning contexts, in particular for learning node labels of very large graphs such as the web graph. [sent-339, score-0.81]
99 We hope that our study will be useful for dealing with these and other similar transduction regression problems. [sent-341, score-0.353]
100 Learning from labeled and unlabeled data on a directed graph. [sent-404, score-0.326]
wordName wordTfidf (topN-words)
[('transductive', 0.565), ('unlabeled', 0.256), ('vapnik', 0.24), ('housing', 0.194), ('mx', 0.194), ('xm', 0.189), ('mohri', 0.156), ('regression', 0.15), ('yx', 0.148), ('chapelle', 0.142), ('belkin', 0.139), ('transduction', 0.136), ('cortes', 0.124), ('xu', 0.115), ('labels', 0.095), ('ym', 0.095), ('primal', 0.095), ('ib', 0.093), ('et', 0.092), ('hypothesis', 0.09), ('ru', 0.088), ('bound', 0.084), ('im', 0.081), ('yi', 0.077), ('boston', 0.073), ('dual', 0.071), ('labeled', 0.07), ('bounds', 0.069), ('elevators', 0.067), ('equivalence', 0.067), ('coincide', 0.066), ('neighborhood', 0.065), ('corollary', 0.064), ('ridge', 0.063), ('radius', 0.061), ('corinna', 0.058), ('inversion', 0.058), ('mse', 0.057), ('explicit', 0.056), ('vladimir', 0.053), ('mu', 0.053), ('xi', 0.052), ('examples', 0.051), ('derbeko', 0.045), ('disregarded', 0.045), ('mehryar', 0.045), ('southey', 0.045), ('classi', 0.044), ('solution', 0.044), ('tight', 0.044), ('training', 0.042), ('kernel', 0.042), ('schuurmans', 0.041), ('presents', 0.04), ('dt', 0.04), ('test', 0.039), ('kk', 0.039), ('modern', 0.038), ('publicly', 0.038), ('label', 0.038), ('amounts', 0.037), ('dealing', 0.037), ('stage', 0.037), ('bennett', 0.035), ('corduneanu', 0.035), ('california', 0.035), ('admits', 0.035), ('courant', 0.033), ('generalization', 0.033), ('wi', 0.032), ('bounded', 0.032), ('exploits', 0.031), ('graepel', 0.031), ('kernelized', 0.031), ('regularity', 0.031), ('inspired', 0.031), ('rn', 0.031), ('induction', 0.03), ('study', 0.03), ('inference', 0.03), ('improvement', 0.03), ('error', 0.03), ('points', 0.029), ('gram', 0.028), ('estimate', 0.028), ('predictions', 0.028), ('expression', 0.027), ('xx', 0.027), ('joachims', 0.027), ('lanckriet', 0.027), ('theorem', 0.027), ('rewritten', 0.026), ('dealt', 0.026), ('deals', 0.026), ('dimension', 0.026), ('integers', 0.025), ('tk', 0.025), ('empirical', 0.025), ('prohibitive', 0.024), ('loss', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 150 nips-2006-On Transductive Regression
Author: Corinna Cortes, Mehryar Mohri
Abstract: In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closedform solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
2 0.21566054 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
3 0.15397747 104 nips-2006-Large-Scale Sparsified Manifold Regularization
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
4 0.11231327 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
Author: Chi-hoon Lee, Shaojun Wang, Feng Jiao, Dale Schuurmans, Russell Greiner
Abstract: We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional loglikelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case.
5 0.097916864 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
Author: Wenye Li, Kin-hong Lee, Kwong-sak Leung
Abstract: Kernel-based regularized learning seeks a model in a hypothesis space by minimizing the empirical error and the model’s complexity. Based on the representer theorem, the solution consists of a linear combination of translates of a kernel. This paper investigates a generalized form of representer theorem for kernel-based learning. After mapping predefined features and translates of a kernel simultaneously onto a hypothesis space by a specific way of constructing kernels, we proposed a new algorithm by utilizing a generalized regularizer which leaves part of the space unregularized. Using a squared-loss function in calculating the empirical error, a simple convex solution is obtained which combines predefined features with translates of the kernel. Empirical evaluations have confirmed the effectiveness of the algorithm for supervised learning tasks.
6 0.090666115 65 nips-2006-Denoising and Dimension Reduction in Feature Space
7 0.088258766 33 nips-2006-Analysis of Representations for Domain Adaptation
8 0.083047561 186 nips-2006-Support Vector Machines on a Budget
9 0.081956975 62 nips-2006-Correcting Sample Selection Bias by Unlabeled Data
10 0.08046826 116 nips-2006-Learning from Multiple Sources
11 0.077331074 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
12 0.076788515 169 nips-2006-Relational Learning with Gaussian Processes
13 0.072988495 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
14 0.071451619 163 nips-2006-Prediction on a Graph with a Perceptron
15 0.071328931 117 nips-2006-Learning on Graph with Laplacian Regularization
16 0.069801778 193 nips-2006-Tighter PAC-Bayes Bounds
17 0.06956058 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
18 0.069422282 151 nips-2006-On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts
19 0.06933409 123 nips-2006-Learning with Hypergraphs: Clustering, Classification, and Embedding
20 0.067869887 61 nips-2006-Convex Repeated Games and Fenchel Duality
topicId topicWeight
[(0, -0.223), (1, 0.092), (2, -0.051), (3, 0.049), (4, -0.031), (5, 0.198), (6, -0.119), (7, 0.052), (8, 0.028), (9, -0.017), (10, -0.077), (11, -0.171), (12, -0.017), (13, 0.001), (14, 0.032), (15, 0.041), (16, 0.105), (17, -0.094), (18, -0.102), (19, -0.007), (20, -0.085), (21, -0.186), (22, 0.035), (23, 0.111), (24, 0.023), (25, 0.105), (26, 0.066), (27, 0.065), (28, -0.065), (29, -0.038), (30, -0.077), (31, -0.03), (32, -0.009), (33, 0.019), (34, -0.004), (35, 0.065), (36, -0.059), (37, -0.012), (38, -0.034), (39, 0.076), (40, -0.062), (41, -0.096), (42, -0.007), (43, -0.054), (44, -0.045), (45, -0.139), (46, 0.06), (47, -0.013), (48, -0.022), (49, -0.152)]
simIndex simValue paperId paperTitle
same-paper 1 0.92405576 150 nips-2006-On Transductive Regression
Author: Corinna Cortes, Mehryar Mohri
Abstract: In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closedform solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
2 0.87654459 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
3 0.81705719 104 nips-2006-Large-Scale Sparsified Manifold Regularization
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
4 0.64696246 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
Author: Chi-hoon Lee, Shaojun Wang, Feng Jiao, Dale Schuurmans, Russell Greiner
Abstract: We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional loglikelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case.
5 0.58655906 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
Author: Xinhua Zhang, Wee S. Lee
Abstract: Semi-supervised learning algorithms have been successfully applied in many applications with scarce labeled data, by utilizing the unlabeled data. One important category is graph based semi-supervised learning algorithms, for which the performance depends considerably on the quality of the graph, or its hyperparameters. In this paper, we deal with the less explored problem of learning the graphs. We propose a graph learning method for the harmonic energy minimization method; this is done by minimizing the leave-one-out prediction error on labeled data points. We use a gradient based method and designed an efficient algorithm which significantly accelerates the calculation of the gradient by applying the matrix inversion lemma and using careful pre-computation. Experimental results show that the graph learning method is effective in improving the performance of the classification algorithm. 1
6 0.54215747 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
7 0.47772756 101 nips-2006-Isotonic Conditional Random Fields and Local Sentiment Flow
8 0.45954692 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
9 0.44491881 129 nips-2006-Map-Reduce for Machine Learning on Multicore
10 0.42310333 33 nips-2006-Analysis of Representations for Domain Adaptation
11 0.42224944 186 nips-2006-Support Vector Machines on a Budget
12 0.41429082 62 nips-2006-Correcting Sample Selection Bias by Unlabeled Data
13 0.4128325 193 nips-2006-Tighter PAC-Bayes Bounds
14 0.40680808 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
15 0.39827082 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
16 0.39147067 156 nips-2006-Ordinal Regression by Extended Binary Classification
17 0.38907111 169 nips-2006-Relational Learning with Gaussian Processes
18 0.385892 5 nips-2006-A Kernel Method for the Two-Sample-Problem
19 0.36369333 109 nips-2006-Learnability and the doubling dimension
20 0.35839459 82 nips-2006-Gaussian and Wishart Hyperkernels
topicId topicWeight
[(1, 0.119), (3, 0.026), (7, 0.088), (9, 0.035), (22, 0.092), (44, 0.07), (57, 0.049), (64, 0.325), (65, 0.071), (69, 0.029)]
simIndex simValue paperId paperTitle
1 0.84479505 14 nips-2006-A Small World Threshold for Economic Network Formation
Author: Eyal Even-dar, Michael Kearns
Abstract: We introduce a game-theoretic model for network formation inspired by earlier stochastic models that mix localized and long-distance connectivity. In this model, players may purchase edges at distance d at a cost of dα , and wish to minimize the sum of their edge purchases and their average distance to other players. In this model, we show there is a striking “small world” threshold phenomenon: in two dimensions, if α < 2 then every Nash equilibrium results in a network of constant diameter (independent of network size), and if α > 2 then every Nash equilibrium results in a network whose diameter grows as a root of the network size, and thus is unbounded. We contrast our results with those of Kleinberg [8] in a stochastic model, and empirically investigate the “navigability” of equilibrium networks. Our theoretical results all generalize to higher dimensions. 1
same-paper 2 0.77916592 150 nips-2006-On Transductive Regression
Author: Corinna Cortes, Mehryar Mohri
Abstract: In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closedform solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
3 0.74379236 75 nips-2006-Efficient sparse coding algorithms
Author: Honglak Lee, Alexis Battle, Rajat Raina, Andrew Y. Ng
Abstract: Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that capture higher-level features in the data. However, finding sparse codes remains a very difficult computational problem. In this paper, we present efficient sparse coding algorithms that are based on iteratively solving two convex optimization problems: an L1 -regularized least squares problem and an L2 -constrained least squares problem. We propose novel algorithms to solve both of these optimization problems. Our algorithms result in a significant speedup for sparse coding, allowing us to learn larger sparse codes than possible with previously described algorithms. We apply these algorithms to natural images and demonstrate that the inferred sparse codes exhibit end-stopping and non-classical receptive field surround suppression and, therefore, may provide a partial explanation for these two phenomena in V1 neurons. 1
4 0.55558032 163 nips-2006-Prediction on a Graph with a Perceptron
Author: Mark Herbster, Massimiliano Pontil
Abstract: We study the problem of online prediction of a noisy labeling of a graph with the perceptron. We address both label noise and concept noise. Graph learning is framed as an instance of prediction on a finite set. To treat label noise we show that the hinge loss bounds derived by Gentile [1] for online perceptron learning can be transformed to relative mistake bounds with an optimal leading constant when applied to prediction on a finite set. These bounds depend crucially on the norm of the learned concept. Often the norm of a concept can vary dramatically with only small perturbations in a labeling. We analyze a simple transformation that stabilizes the norm under perturbations. We derive an upper bound that depends only on natural properties of the graph – the graph diameter and the cut size of a partitioning of the graph – which are only indirectly dependent on the size of the graph. The impossibility of such bounds for the graph geodesic nearest neighbors algorithm will be demonstrated. 1
5 0.53563344 160 nips-2006-Part-based Probabilistic Point Matching using Equivalence Constraints
Author: Graham Mcneill, Sethu Vijayakumar
Abstract: Correspondence algorithms typically struggle with shapes that display part-based variation. We present a probabilistic approach that matches shapes using independent part transformations, where the parts themselves are learnt during matching. Ideas from semi-supervised learning are used to bias the algorithm towards finding ‘perceptually valid’ part structures. Shapes are represented by unlabeled point sets of arbitrary size and a background component is used to handle occlusion, local dissimilarity and clutter. Thus, unlike many shape matching techniques, our approach can be applied to shapes extracted from real images. Model parameters are estimated using an EM algorithm that alternates between finding a soft correspondence and computing the optimal part transformations using Procrustes analysis.
6 0.53435922 20 nips-2006-Active learning for misspecified generalized linear models
7 0.53294528 61 nips-2006-Convex Repeated Games and Fenchel Duality
8 0.53261858 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
9 0.52834463 65 nips-2006-Denoising and Dimension Reduction in Feature Space
10 0.52468234 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
11 0.52354658 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
12 0.52286828 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
13 0.52035248 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
14 0.519961 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
15 0.51925945 117 nips-2006-Learning on Graph with Laplacian Regularization
16 0.51858377 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections
17 0.51823825 95 nips-2006-Implicit Surfaces with Globally Regularised and Compactly Supported Basis Functions
18 0.51800507 203 nips-2006-implicit Online Learning with Kernels
19 0.51767081 5 nips-2006-A Kernel Method for the Two-Sample-Problem
20 0.51663929 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model