nips nips2006 nips2006-48 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. [sent-8, score-0.486]
2 To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. [sent-10, score-0.572]
3 Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. [sent-11, score-0.326]
4 1 Introduction. A major line of research on extending SVMs to handle partially labeled datasets is based on the following idea: solve the standard SVM problem while treating the unknown labels as additional optimization variables. [sent-13, score-0.243]
5 By maximizing the margin in the presence of unlabeled data, one learns a decision boundary that traverses through low data-density regions while respecting labels in the input space. [sent-14, score-0.365]
6 Several experimental studies have established that S3 VM implementations show varying degrees of empirical success. [sent-21, score-0.149]
7 The following questions motivate this paper: How well do current S3 VM implementations approximate the exact, globally optimal solution of the non-convex problem associated with S3 VMs? [sent-23, score-0.291]
8 This is partly due to the lack of simple implementations that practitioners can use to benchmark new algorithms against the global solution, even on small-sized problems. [sent-26, score-0.234]
9 Our contribution in this paper is to outline a class of Branch and Bound algorithms that are guaranteed to provide the globally optimal solution for S3 VMs. [sent-29, score-0.108]
10 Branch and bound techniques have previously been noted in the context of S3 VM in [16], but no details were presented there. [sent-30, score-0.157]
11 We implement and evaluate a branch and bound strategy that can serve as an upper baseline for S3 VM algorithms. [sent-31, score-0.607]
12 This strategy is not practical for typical semi-supervised settings where large amounts of unlabeled data are available. [sent-32, score-0.427]
13 Empirical results on some semi-supervised tasks presented in section 7 show that the exact solution found by branch and bound has excellent generalization performance, while other S3 VM implementations perform poorly. [sent-34, score-0.71]
14 The training set consists of l labeled examples $\{(x_i, y_i)\}_{i=1}^{l}$, $y_i = \pm 1$, and of u unlabeled examples $\{x_i\}_{i=l+1}^{n}$, with $n = l + u$. [sent-39, score-0.718]
15 In the linear case, the following objective function is minimized over both the hyperplane parameters (w, b) and the label vector $y_u := [y_{l+1} \ldots y_n]^\top$: [sent-40, score-0.517]
16 $\min_{w, b, y_u, \xi_i \ge 0} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i^{p} + C^{*} \sum_{i=l+1}^{n} \xi_i^{p}$ (1), under the constraints $y_i (w \cdot x_i + b) \ge 1 - \xi_i$, $1 \le i \le n$. [sent-43, score-0.155]
17 The last one takes into account the unlabeled points and can be seen as an implementation of the cluster assumption [11] or low density separation assumption [6]; indeed, it drives the outputs of the unlabeled points away from 0 (see figure 1). [sent-48, score-0.771]
18 Figure 1: With p = 2 in (1), the loss of a point with label y and signed output t is $\max(0, 1 - yt)^2$. [sent-57, score-0.149]
19 For an unlabeled point, this is $\min_y \max(0, 1 - yt)^2 = \max(0, 1 - |t|)^2$. [sent-58, score-0.333]
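To make the p = 2 case concrete, here is a minimal numpy sketch of the objective in (1) for a fixed (w, b): labeled points pay the squared hinge, unlabeled points pay the effective loss max(0, 1 - |t|)^2 shown above. The function and argument names are ours, not taken from the paper.

```python
import numpy as np

def s3vm_objective(w, b, X_lab, y_lab, X_unlab, C=1.0, C_star=1.0):
    """Squared-hinge S3VM objective (p = 2): labeled points use the usual
    hinge on y*(w.x + b); unlabeled points use max(0, 1 - |w.x + b|)^2,
    i.e. the hinge loss minimized over the unknown label."""
    t_lab = X_lab @ w + b
    t_unlab = X_unlab @ w + b
    loss_lab = np.maximum(0.0, 1.0 - y_lab * t_lab) ** 2
    loss_unlab = np.maximum(0.0, 1.0 - np.abs(t_unlab)) ** 2
    return 0.5 * w @ w + C * loss_lab.sum() + C_star * loss_unlab.sum()
```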
20 The labels of the unlabeled points are further required to satisfy a balance constraint, $\frac{1}{u} \sum_{i=l+1}^{n} \frac{1 + y_i}{2} = r$ (2). This constraint is necessary to avoid unbalanced solutions and has also been used in the original implementation [9]. [sent-62, score-0.123]
21 Ideally, the parameter r should be set to the ratio of positive points in the unlabeled set. [sent-63, score-0.36]
22 Since it is unknown, r is usually estimated through the class ratio on the labeled set. [sent-64, score-0.153]
23 For the sake of simplicity, in the rest of the paper, we set r to the true ratio of positive points in the unlabeled set. [sent-66, score-0.387]
24 Let us call I the objective function to be minimized: $I(w, b, y_u) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \max(0, 1 - y_i (w \cdot x_i + b))^2$ (3). [sent-67, score-0.583]
25 There are two main strategies to minimize I: (1) for a given fixed w and b, the optimal $y_u$ is simply given by the signs of $w \cdot x_i + b$. [sent-68, score-0.371]
26 (2) For a given yu , the optimization on w and b is a standard SVM training. [sent-71, score-0.309]
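These two steps are the basis of the usual alternating local scheme that most S3VM implementations approximate (and which only reaches a local minimum, unlike the branch and bound method of this paper). A small illustrative sketch in Python/scipy, assuming a linear kernel and, for brevity, ignoring the balance constraint (2); all names are ours:

```python
import numpy as np
from scipy.optimize import minimize

def svm_objective(theta, X, y, C):
    # Primal L2-SVM (squared hinge) objective in (w, b); theta = [w, b].
    w, b = theta[:-1], theta[-1]
    margins = y * (X @ w + b)
    return 0.5 * w @ w + C * np.sum(np.maximum(0.0, 1.0 - margins) ** 2)

def train_svm(X, y, C=1.0):
    # The squared hinge is smooth and convex, so a quasi-Newton solver suffices here.
    theta0 = np.zeros(X.shape[1] + 1)
    res = minimize(svm_objective, theta0, args=(X, y, C), method="L-BFGS-B")
    return res.x[:-1], res.x[-1]

def alternate_s3vm(X_lab, y_lab, X_unlab, C=1.0, n_iter=10):
    # Alternate step (2), SVM training for fixed labels, with step (1), label update.
    X = np.vstack([X_lab, X_unlab])
    y_u = np.ones(len(X_unlab))                 # arbitrary initial labels
    for _ in range(n_iter):
        w, b = train_svm(X, np.concatenate([y_lab, y_u]), C)   # step (2)
        y_u = np.sign(X_unlab @ w + b)                          # step (1)
        y_u[y_u == 0] = 1.0
    return w, b, y_u
```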
27 The constraint (2) is implemented by setting $J(y_u) = +\infty$ for all vectors $y_u$ not satisfying it, where $J(y_u) := \min_{w,b} I(w, b, y_u)$. [sent-75, score-0.422]
28 3.1 Branch and bound basics. Suppose we want to minimize a function f over a space X, where X is usually discrete. [sent-77, score-0.18]
29 A branch and bound algorithm has two main ingredients. Branching: the region X is recursively split into smaller subregions. [sent-78, score-0.509]
30 This yields a tree structure where each node corresponds to a subregion. [sent-79, score-0.162]
31 Suppose that an upper bound (say a) on the best value of f over A is known, that a lower bound (say b) on the best value of f over B is known, and that a < b. Then B cannot contain the global minimum, and the entire subregion B can be pruned from the search. [sent-83, score-0.44]
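As an illustration of this pruning rule, here is a schematic depth-first branch and bound skeleton in Python for minimization; the callbacks (splitting, bounding, leaf evaluation) are placeholders to be supplied by the caller and are not part of the paper.

```python
def branch_and_bound(root, lower_bound, is_leaf, value, children):
    """Generic depth-first branch and bound for minimization.
    `children(region)` splits a region into subregions; `lower_bound(region)`
    bounds f from below on a region; `value(leaf)` evaluates f at a leaf.
    Returns the best leaf value found."""
    best = float("inf")

    def explore(region):
        nonlocal best
        if is_leaf(region):
            best = min(best, value(region))    # leaves provide upper bounds
            return
        for child in children(region):
            if lower_bound(child) >= best:     # the a < b rule: prune this child
                continue
            explore(child)

    explore(root)
    return best
```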
32 3.2 Branch and bound for S3 VM. The aim is to minimize (3) over all $2^u$ possible choices for the vector $y_u$,¹ which constitute the set X introduced above. [sent-87, score-0.489]
33 Any node corresponds to a partial labeling of the data set and its two children correspond to the labeling of some unlabeled point. [sent-89, score-0.541]
34 One can thus associate with any node a labeled set L containing both the original labeled examples and a subset S of unlabeled examples $\{(x_j, y_j)\}_{j \in S \subseteq [l+1..n]}$. [sent-90, score-0.774]
35 One can also associate an unlabeled set $U = [l+1..n] \setminus S$ corresponding to the subset of unlabeled points which have not been assigned a label yet. [sent-94, score-0.362] [sent-97, score-0.449]
37 The size of the subtree rooted at this node is thus $2^{|U|}$. [sent-98, score-0.172]
38 The root of the tree has only the original set of labeled examples associated with it, i.e. $S = \emptyset$. [sent-99, score-0.243]
39 The leaves in the tree correspond to a complete labeling of the dataset, i.e. $U = \emptyset$. [sent-101, score-0.154]
40 As for any branch and bound algorithm, we have to decide about the following choices. Branching: for a given node in the tree (i.e. a partial labeling of the unlabeled set), what should be its two children (i.e. which unlabeled point should be labeled next)? [sent-105, score-0.635] [sent-107, score-0.43]
42 Bounding: Which upper and lower bounds should be used? [sent-110, score-0.126]
43 Exploration: In which order will the search tree be examined? [sent-111, score-0.12]
44 Note that the tree is not built explicitly but on the fly as we explore it. [sent-113, score-0.138]
45 ¹ There are actually only $\binom{u}{ur}$ effective choices because of the constraint (2). [sent-114, score-0.134]
46 Concerning the upper bound, we decided to have the following simple strategy: for a leaf node, the upper bound is simply the value of the function; for a non-leaf node, there is no upper bound. [sent-115, score-0.552]
47 In other words, the upper bound is the best objective function found so far. [sent-116, score-0.355]
48 In the terminology of section 3.1, the set A is the leaf corresponding to the best solution found so far and the set B is the subtree that we are considering to explore. [sent-118, score-0.216]
49 Because of this choice for the upper bound, a natural way to explore the tree is a depth first search. [sent-119, score-0.217]
50 Indeed it is important to go to the leaves as often as possible in order to have a tight upper bound and thus perform aggressive pruning. [sent-120, score-0.261]
51 The choices of the lower bound and the branching strategy are presented next. [sent-121, score-0.355]
52 4 Lower bound. We consider a simple lower bound based on the following observation. [sent-122, score-0.361]
53 The minimum of the objective function (1) is smaller when C ∗ = 0 than when C ∗ > 0. [sent-123, score-0.119]
54 But C ∗ = 0 corresponds to a standard SVM, ignoring the unlabeled data. [sent-124, score-0.333]
55 We can therefore compute a lower bound at a given node by optimizing a standard SVM on the labeled set associated with this node. [sent-125, score-0.429]
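A direct way to obtain this bound is to train a plain L2-SVM on the node's labeled set and report its optimal objective value. The sketch below assumes a linear kernel and simply minimizes the squared-hinge primal with a generic solver; it is illustrative only and assumes the solver converges to the optimum.

```python
import numpy as np
from scipy.optimize import minimize

def node_lower_bound(X_lab, y_lab, C=1.0):
    """Lower bound at a node: the optimal value of a standard L2-SVM trained
    on the labeled set L associated with that node (i.e. the objective with
    C* = 0, which can only be smaller than the full S3VM objective)."""
    def obj(theta):
        w, b = theta[:-1], theta[-1]
        m = y_lab * (X_lab @ w + b)
        return 0.5 * w @ w + C * np.sum(np.maximum(0.0, 1.0 - m) ** 2)
    res = minimize(obj, np.zeros(X_lab.shape[1] + 1), method="L-BFGS-B")
    return res.fun
```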
56 It is based on the dual objective function of SVMs. [sent-127, score-0.156]
57 Let $D(\alpha, y_U)$ be the dual objective function, where $y_U$ corresponds to the labels of the unlabeled points which have not been assigned a label yet: $D(\alpha, y_U) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \left( K(x_i, x_j) + \frac{\delta_{ij}}{2C} \right)$ (4). [sent-128, score-0.787]
58 The dual feasibility conditions are $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$. [sent-129, score-0.153]
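In code, the dual value for a given alpha and labeling follows directly from this reconstruction of (4): the squared-hinge penalty appears as a ridge term added to the kernel matrix. A short numpy sketch with our own argument names:

```python
import numpy as np

def dual_objective(alpha, y, K, C):
    """Dual objective (4) for the L2-SVM: the squared-hinge penalty shows up
    as a delta_ij / (2C) ridge added to the kernel matrix K."""
    K_tilde = K + np.eye(len(y)) / (2.0 * C)
    return alpha.sum() - 0.5 * (alpha * y) @ K_tilde @ (alpha * y)
```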
59 Let Q(yU ) := D(α(yU ), yU ) and lb a lower bound on (or the value of) min Q(yU ), where the minimum is taken over all yU satisfying the balancing constraint (2). [sent-132, score-0.394]
60 Then lb is also a lower bound for the value of the objective function corresponding to that node. [sent-133, score-0.369]
61 The goal is thus to find a choice for α(yU ) such that a lower bound on Q can be computed efficiently. [sent-134, score-0.204]
62 The choice corresponding to the lower bound presented above is the following. [sent-135, score-0.204]
63 Train an SVM on the labeled points, obtain the vector α and complete it with zeros for the unlabeled points. [sent-136, score-0.512]
64 Then Q(yU ) is the same for all the possible labelings of the unlabeled points and the lower bound is the SVM objective function on the labeled points. [sent-137, score-0.836]
65 ² To lower bound Q, one can use results from the quadratic zero-one programming literature [10] or solve a constrained eigenvalue problem [8]. [sent-140, score-0.278]
66 Finally, note that unless $\sum_{i \in U} y_i = 0$, the constraint $\sum_i \alpha_i y_i = 0$ will not be satisfied. [sent-141, score-0.304]
67 One remedy is to train the supervised SVM with the constraint $\sum_{i \in L} \alpha_i y_i = -\gamma \sum_{i \in U} y_i = \gamma(n - 2ru + \sum_{i \in L} y_i)$ (because of (2)). [sent-142, score-0.42]
68 5 Branching. At a given node, some unlabeled points have already been assigned a label. [sent-144, score-0.36]
69 Since our strategy is to reach a good solution as soon as possible (see the last paragraph of section 3.2), we would like to branch on points whose labels can be assigned with confidence. [sent-146, score-0.107]
70 A simple possibility would be to branch on the unlabeled point which is nearest to a labeled point, using a reliable distance metric. [sent-148, score-0.852]
71 But we now present a more principled approach based on the analysis of the objective value. [sent-149, score-0.119]
72 We say that we are "confident" about a particular label of an unlabeled point when assigning the opposite label results in a large increase of the objective value: such a partial solution would then be unlikely to lead to the optimal one. [sent-150, score-0.763]
73 Recall from section 3.2 that a node is associated with a set L of currently labeled examples and a set U of unlabeled examples. [sent-153, score-0.558]
74 Let s(L) be the SVM objective function trained on the labeled set, $s(L) = \min_{w,b} \; \frac{1}{2}\|w\|^2 + C \sum_{(x_i, y_i) \in L} \max(0, 1 - y_i (w \cdot x_i + b))^2$ (6). [sent-154, score-0.427]
75 As discussed in the previous section, the lower bound is s(L). [sent-155, score-0.204]
76 Now our branching strategy consists in selecting the following point in U: $\arg\max_{x \in U,\, y = \pm 1} s(L \cup \{(x, y)\})$ (7). In other words, we want to find the unlabeled point $x^*$ and its label $y^*$ which would make the objective function increase as much as possible. [sent-156, score-0.785]
77 Then we branch on $x^*$, but start exploring the branch with the most likely label $-y^*$. [sent-157, score-0.748]
78 This strategy has an intuitive link with the "label propagation" idea [17]: an unlabeled point which is near a labeled point is likely to be of the same label; otherwise, the objective function would be large. [sent-158, score-0.737]
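A brute-force version of the branching rule (7) can be written directly from the definition; it retrains an SVM for every candidate (x, y) pair and is therefore only illustrative (the paper avoids this cost with the incremental computation of Proposition 1 below). The callback and variable names are ours; `svm_objective_on` could be the `node_lower_bound` routine sketched earlier.

```python
import numpy as np

def select_branching_point(svm_objective_on, L_X, L_y, U_X):
    """Branching rule (7): pick the unlabeled point x* and label y* whose
    addition to L increases the SVM objective s(L) the most, then explore
    the opposite (more plausible) label -y* first."""
    base = svm_objective_on(L_X, L_y)
    best_increase, best_idx, best_y = -np.inf, None, None
    for idx, x in enumerate(U_X):
        for y in (+1.0, -1.0):
            s = svm_objective_on(np.vstack([L_X, x]), np.append(L_y, y))
            if s - base > best_increase:
                best_increase, best_idx, best_y = s - base, idx, y
    return best_idx, -best_y   # branch on x*, explore label -y* first
```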
79 Proposition 1. Consider training an SVM on a labeled set L with quadratic penalization of the errors (cf. (6) or (4)). [sent-166, score-0.206]
80 It is based on the fact that $s(L) = \frac{1}{2}\, y_{sv}^\top K_{sv}^{-1}\, y_{sv}$ and relies on the block matrix inverse formula. [sent-170, score-0.104]
81 At the beginning, the upper bound can either be set to +∞ or to the objective value of a solution found by another algorithm. [sent-172, score-0.288]
82 Note that the SVM trainings are incremental: whenever we go down the tree, one point is added to the labeled set. [sent-173, score-0.23]
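The incremental step rests on the block matrix inverse formula: when one point is appended to the labeled set, the inverse of the bordered kernel matrix can be updated in O(m²) rather than recomputed from scratch in O(m³). A small numpy sketch of that update in our own notation; the handling of the bias term and of the 1/(2C) ridge is omitted for brevity.

```python
import numpy as np

def bordered_inverse(K_inv, k_new, k_self):
    """Given K_inv, the inverse of an m x m matrix K, return the inverse of
    the bordered matrix [[K, k_new], [k_new^T, k_self]] via the block-matrix
    inverse formula.  Cost is O(m^2) per added point."""
    u = K_inv @ k_new
    s = k_self - k_new @ u            # Schur complement (assumed positive)
    m = K_inv.shape[0]
    out = np.empty((m + 1, m + 1))
    out[:m, :m] = K_inv + np.outer(u, u) / s
    out[:m, m] = -u / s
    out[m, :m] = -u / s
    out[m, m] = 1.0 / s
    return out
```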
83 7 Experiments. We consider here two datasets where other S3 VM implementations are unable to achieve satisfactory test error rates. [sent-175, score-0.222]
84 This naturally raises the following question: is this weak performance due to the S3 VM objective itself, or to the local minima that these implementations get stuck in? Algorithm 1: Branch and bound for S3 VM (BB). [sent-176, score-0.157]
85 Function: (Y*, v) ← S3VM(Y, ub)   % recursive implementation. Input: Y, a partly labeled vector (0 for unlabeled); ub, an upper bound on the optimal objective value. [sent-177, score-0.601]
86 Output: Y*, the optimal fully labeled vector; v, the corresponding objective function value. [sent-178, score-0.272]
87 if Σᵢ max(0, Yᵢ) > ur  OR  n − Σᵢ max(0, −Yᵢ) < ur then
       return   % constraint (2) cannot be satisfied, so do not explore this subtree
   end if
   v ← SVM(Y)   % compute the SVM objective function on the labeled points. [sent-179, score-0.577]
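Putting the pieces together, here is a recursive sketch in the spirit of Algorithm 1, with our own variable names: the SVM evaluation and the branching rule are passed in as callbacks, and the balance check follows our reading of constraint (2) (with `n_pos_required` the total number of positive labels the complete labeling must contain).

```python
import numpy as np

def s3vm_bb(Y, svm_value, pick_branching_point, n_pos_required, best=(None, np.inf)):
    """Recursive branch and bound sketch.  Y holds +1/-1 for assigned labels
    and 0 for still-unassigned points; `svm_value(Y)` returns the SVM objective
    on the currently labeled points (the node's lower bound);
    `pick_branching_point(Y)` returns an index and the label to explore first.
    Returns (best_Y, best_value)."""
    n_pos, n_neg = int(np.sum(Y > 0)), int(np.sum(Y < 0))
    if n_pos > n_pos_required or len(Y) - n_neg < n_pos_required:
        return best                          # constraint (2) can no longer be met
    v = svm_value(Y)                         # lower bound at this node
    if v >= best[1]:
        return best                          # prune against the incumbent
    if n_pos + n_neg == len(Y):
        return (Y.copy(), v)                 # leaf: complete labeling, new incumbent
    i, y_first = pick_branching_point(Y)
    for y in (y_first, -y_first):            # depth-first, most plausible label first
        Y[i] = y
        best = s3vm_bb(Y, svm_value, pick_branching_point, n_pos_required, best)
        Y[i] = 0
    return best
```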
88 7.1 Two moons. The "two moons" dataset is now a standard benchmark for semi-supervised learning algorithms. [sent-182, score-0.193]
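The exact construction used in the paper is not reproduced here; as an illustration, a semi-supervised split of scikit-learn's `make_moons` with one labeled point per class can serve as a toy stand-in for the benchmark.

```python
import numpy as np
from sklearn.datasets import make_moons

# Two interleaving half-circles; only one labeled point per class, the rest unlabeled.
X, y = make_moons(n_samples=200, noise=0.05, random_state=0)
y = 2 * y - 1                                   # map {0, 1} -> {-1, +1}
rng = np.random.default_rng(0)
labeled = np.concatenate([rng.choice(np.where(y == c)[0], size=1, replace=False)
                          for c in (-1, 1)])
unlabeled = np.setdiff1d(np.arange(len(y)), labeled)
r = np.mean(y[unlabeled] == 1)                  # balance parameter for constraint (2)
```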
89 However, these methods have a constraint that the mean output on the unlabeled points should be equal to some constant. [sent-189, score-0.43]
90 Note that the test errors of the other S3 VM implementations could likely be improved by hyperparameter tuning, but they would still remain very high. [sent-192, score-0.204]
91 Table 1: Results on the two moons dataset (averaged over 100 random realizations); the table reports the test error (%) of S3 VM, cS3 VM, CCCP, SVMlight, DA, BB, and LapSVM. [sent-199, score-0.15]
92 We randomly selected 2 images per class to be in the labeled set and left the rest unlabeled. [sent-212, score-0.153]
93 8 Discussion and Conclusion. We implemented and evaluated one strategy amongst many in the class of branch and bound methods to find the globally optimal solution of S3 VMs. [sent-228, score-0.636]
94 This basic implementation can perhaps be made more efficient by choosing better bounding and branching schemes. [sent-231, score-0.196]
95 Also, by fixing the upper bound as the currently best objective value, pruning becomes more aggressive as better solutions are found. ² The reported test errors are somewhat irrelevant and should not be used for ranking the different algorithms. [sent-232, score-0.383]
96 It is conceivable that breadth-first search is equally or more effective in conjunction with alternative upper bounding schemes. [sent-235, score-0.158]
97 Finally, we note that a large family of well-tested branch and bound procedures from the zero-one quadratic programming literature can be immediately applied to the S3 VM problem for the special case of squared loss. [sent-238, score-0.521]
98 For these reasons, we believe that this implementation does not scale to large datasets, but should instead be considered a proof of concept: the S3 VM objective function is very well suited for semi-supervised learning, and more effort should be devoted to efficiently finding good local minima. [sent-242, score-0.194]
99 Computational aspects of a branch and bound algorithm for quadratic zero-one programming. [sent-306, score-0.498]
100 Learning from labeled and unlabeled data with label propagation. [sent-347, score-0.575]
wordName wordTfidf (topN-words)
[('vm', 0.515), ('unlabeled', 0.333), ('branch', 0.316), ('yu', 0.309), ('bound', 0.157), ('labeled', 0.153), ('implementations', 0.149), ('objective', 0.119), ('svm', 0.118), ('yi', 0.116), ('moons', 0.105), ('vms', 0.105), ('ub', 0.104), ('subtree', 0.1), ('branching', 0.096), ('tree', 0.09), ('label', 0.089), ('upper', 0.079), ('cccp', 0.079), ('ksv', 0.079), ('chapelle', 0.077), ('node', 0.072), ('constraint', 0.072), ('sv', 0.066), ('transductive', 0.066), ('leaf', 0.064), ('coil', 0.062), ('lapsvm', 0.062), ('svmlight', 0.062), ('ur', 0.062), ('realizations', 0.058), ('globally', 0.056), ('strategy', 0.055), ('sindhwani', 0.055), ('trainings', 0.052), ('ysv', 0.052), ('solution', 0.052), ('implementation', 0.051), ('bb', 0.05), ('bounding', 0.049), ('explore', 0.048), ('lower', 0.047), ('lb', 0.046), ('dataset', 0.045), ('concerning', 0.045), ('find', 0.043), ('max', 0.043), ('benchmark', 0.043), ('minima', 0.043), ('partly', 0.042), ('da', 0.041), ('satisfying', 0.041), ('labeling', 0.039), ('semisupervised', 0.039), ('xi', 0.039), ('dual', 0.037), ('excellent', 0.036), ('recursively', 0.036), ('keerthi', 0.035), ('yahoo', 0.035), ('signed', 0.035), ('continuation', 0.035), ('yj', 0.034), ('questions', 0.034), ('vapnik', 0.033), ('zien', 0.033), ('return', 0.033), ('datasets', 0.032), ('labels', 0.032), ('ective', 0.032), ('outperform', 0.032), ('partial', 0.032), ('support', 0.031), ('balancing', 0.031), ('annealing', 0.031), ('search', 0.03), ('non', 0.03), ('dent', 0.03), ('associate', 0.029), ('errors', 0.028), ('likely', 0.027), ('sake', 0.027), ('points', 0.027), ('zeros', 0.026), ('children', 0.026), ('incremental', 0.026), ('solve', 0.026), ('point', 0.025), ('chicago', 0.025), ('leaves', 0.025), ('quadratic', 0.025), ('believe', 0.024), ('say', 0.024), ('minimize', 0.023), ('programming', 0.023), ('yt', 0.023), ('hij', 0.023), ('sinz', 0.023), ('sx', 0.023), ('subregions', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000008 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
2 0.21566054 150 nips-2006-On Transductive Regression
Author: Corinna Cortes, Mehryar Mohri
Abstract: In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closedform solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
3 0.20450212 104 nips-2006-Large-Scale Sparsified Manifold Regularization
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
4 0.16314575 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
Author: Zhi-hua Zhou, Min-ling Zhang
Abstract: In this paper, we formalize multi-instance multi-label learning, where each training example is associated with not only multiple instances but also multiple class labels. Such a problem can occur in many real-world tasks, e.g. an image usually contains multiple patches each of which can be described by a feature vector, and the image can belong to multiple categories since its semantics can be recognized in different ways. We analyze the relationship between multi-instance multi-label learning and the learning frameworks of traditional supervised learning, multiinstance learning and multi-label learning. Then, we propose the M IML B OOST and M IML S VM algorithms which achieve good performance in an application to scene classification. 1
5 0.13810469 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
Author: Chi-hoon Lee, Shaojun Wang, Feng Jiao, Dale Schuurmans, Russell Greiner
Abstract: We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional loglikelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case.
6 0.12830035 163 nips-2006-Prediction on a Graph with a Perceptron
7 0.11614759 186 nips-2006-Support Vector Machines on a Budget
8 0.096859001 183 nips-2006-Stochastic Relational Models for Discriminative Link Prediction
9 0.090858258 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
10 0.089172639 172 nips-2006-Scalable Discriminative Learning for Natural Language Parsing and Translation
11 0.082559109 171 nips-2006-Sample Complexity of Policy Search with Known Dynamics
12 0.081915714 33 nips-2006-Analysis of Representations for Domain Adaptation
13 0.081264883 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
14 0.080444664 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
15 0.076553062 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
16 0.073037095 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
17 0.070205458 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections
18 0.06965135 193 nips-2006-Tighter PAC-Bayes Bounds
19 0.067834377 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
20 0.067704625 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization
topicId topicWeight
[(0, -0.225), (1, 0.089), (2, -0.053), (3, 0.015), (4, -0.003), (5, 0.184), (6, -0.13), (7, 0.049), (8, 0.019), (9, -0.105), (10, -0.098), (11, -0.23), (12, -0.095), (13, 0.008), (14, 0.017), (15, 0.091), (16, 0.159), (17, -0.059), (18, -0.084), (19, -0.027), (20, -0.124), (21, -0.202), (22, 0.043), (23, 0.214), (24, 0.023), (25, 0.167), (26, -0.057), (27, 0.018), (28, -0.137), (29, 0.027), (30, -0.097), (31, 0.04), (32, 0.077), (33, 0.107), (34, -0.04), (35, 0.075), (36, -0.069), (37, 0.027), (38, -0.023), (39, 0.106), (40, -0.04), (41, 0.017), (42, 0.08), (43, -0.135), (44, -0.09), (45, -0.096), (46, 0.037), (47, -0.009), (48, -0.021), (49, -0.075)]
simIndex simValue paperId paperTitle
same-paper 1 0.95573217 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
2 0.77403837 150 nips-2006-On Transductive Regression
Author: Corinna Cortes, Mehryar Mohri
Abstract: In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closedform solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
3 0.7710914 104 nips-2006-Large-Scale Sparsified Manifold Regularization
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
4 0.66011697 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
Author: Zhi-hua Zhou, Min-ling Zhang
Abstract: In this paper, we formalize multi-instance multi-label learning, where each training example is associated with not only multiple instances but also multiple class labels. Such a problem can occur in many real-world tasks, e.g. an image usually contains multiple patches each of which can be described by a feature vector, and the image can belong to multiple categories since its semantics can be recognized in different ways. We analyze the relationship between multi-instance multi-label learning and the learning frameworks of traditional supervised learning, multiinstance learning and multi-label learning. Then, we propose the M IML B OOST and M IML S VM algorithms which achieve good performance in an application to scene classification. 1
5 0.63004738 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
Author: Chi-hoon Lee, Shaojun Wang, Feng Jiao, Dale Schuurmans, Russell Greiner
Abstract: We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional loglikelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case.
6 0.57591945 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
7 0.41801706 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
8 0.38685709 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
9 0.3783668 186 nips-2006-Support Vector Machines on a Budget
10 0.36377344 129 nips-2006-Map-Reduce for Machine Learning on Multicore
11 0.36077243 33 nips-2006-Analysis of Representations for Domain Adaptation
12 0.34873992 101 nips-2006-Isotonic Conditional Random Fields and Local Sentiment Flow
13 0.34812424 169 nips-2006-Relational Learning with Gaussian Processes
14 0.339908 172 nips-2006-Scalable Discriminative Learning for Natural Language Parsing and Translation
15 0.3304837 163 nips-2006-Prediction on a Graph with a Perceptron
16 0.32190388 193 nips-2006-Tighter PAC-Bayes Bounds
17 0.30166793 156 nips-2006-Ordinal Regression by Extended Binary Classification
18 0.2978467 183 nips-2006-Stochastic Relational Models for Discriminative Link Prediction
19 0.2942985 140 nips-2006-Multiple Instance Learning for Computer Aided Diagnosis
20 0.28787142 24 nips-2006-Aggregating Classification Accuracy across Time: Application to Single Trial EEG
topicId topicWeight
[(1, 0.124), (3, 0.064), (7, 0.101), (9, 0.041), (22, 0.07), (44, 0.097), (57, 0.069), (64, 0.019), (65, 0.059), (69, 0.034), (72, 0.224), (90, 0.013)]
simIndex simValue paperId paperTitle
1 0.91312516 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering
Author: Ron Zass, Amnon Shashua
Abstract: In this paper we focus on the issue of normalization of the affinity matrix in spectral clustering. We show that the difference between N-cuts and Ratio-cuts is in the error measure being used (relative-entropy versus L1 norm) in finding the closest doubly-stochastic matrix to the input affinity matrix. We then develop a scheme for finding the optimal, under Frobenius norm, doubly-stochastic approximation using Von-Neumann’s successive projections lemma. The new normalization scheme is simple and efficient and provides superior clustering performance over many of the standardized tests. 1
same-paper 2 0.84718144 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
3 0.70398265 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
Author: Kilian Q. Weinberger, Fei Sha, Qihui Zhu, Lawrence K. Saul
Abstract: In many areas of science and engineering, the problem arises how to discover low dimensional representations of high dimensional data. Recently, a number of researchers have converged on common solutions to this problem using methods from convex optimization. In particular, many results have been obtained by constructing semidefinite programs (SDPs) with low rank solutions. While the rank of matrix variables in SDPs cannot be directly constrained, it has been observed that low rank solutions emerge naturally by computing high variance or maximal trace solutions that respect local distance constraints. In this paper, we show how to solve very large problems of this type by a matrix factorization that leads to much smaller SDPs than those previously studied. The matrix factorization is derived by expanding the solution of the original problem in terms of the bottom eigenvectors of a graph Laplacian. The smaller SDPs obtained from this matrix factorization yield very good approximations to solutions of the original problem. Moreover, these approximations can be further refined by conjugate gradient descent. We illustrate the approach on localization in large scale sensor networks, where optimizations involving tens of thousands of nodes can be solved in just a few minutes. 1
4 0.70148432 80 nips-2006-Fundamental Limitations of Spectral Clustering
Author: Boaz Nadler, Meirav Galun
Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1
5 0.69953316 20 nips-2006-Active learning for misspecified generalized linear models
Author: Francis R. Bach
Abstract: Active learning refers to algorithmic frameworks aimed at selecting training data points in order to reduce the number of required training data points and/or improve the generalization performance of a learning method. In this paper, we present an asymptotic analysis of active learning for generalized linear models. Our analysis holds under the common practical situation of model misspecification, and is based on realistic assumptions regarding the nature of the sampling distributions, which are usually neither independent nor identical. We derive unbiased estimators of generalization performance, as well as estimators of expected reduction in generalization error after adding a new training data point, that allow us to optimize its sampling distribution through a convex optimization problem. Our analysis naturally leads to an algorithm for sequential active learning which is applicable for all tasks supported by generalized linear models (e.g., binary classification, multi-class classification, regression) and can be applied in non-linear settings through the use of Mercer kernels. 1
6 0.69862485 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
7 0.693703 65 nips-2006-Denoising and Dimension Reduction in Feature Space
8 0.69236356 117 nips-2006-Learning on Graph with Laplacian Regularization
9 0.69223946 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
10 0.68938059 175 nips-2006-Simplifying Mixture Models through Function Approximation
11 0.68937552 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
12 0.68631297 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
13 0.68459082 35 nips-2006-Approximate inference using planar graph decomposition
14 0.68434763 128 nips-2006-Manifold Denoising
15 0.68317652 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
16 0.68211347 121 nips-2006-Learning to be Bayesian without Supervision
17 0.6820755 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
18 0.68195093 150 nips-2006-On Transductive Regression
19 0.68188751 119 nips-2006-Learning to Rank with Nonsmooth Cost Functions
20 0.68174011 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights