nips nips2006 nips2006-48 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. [sent-8, score-0.486]
2 To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. [sent-10, score-0.572]
3 Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. [sent-11, score-0.326]
4 1 Introduction. A major line of research on extending SVMs to handle partially labeled datasets is based on the following idea: solve the standard SVM problem while treating the unknown labels as additional optimization variables. [sent-13, score-0.243]
5 By maximizing the margin in the presence of unlabeled data, one learns a decision boundary that traverses through low data-density regions while respecting labels in the input space. [sent-14, score-0.365]
6 Several experimental studies have established that S3 VM implementations show varying degrees of empirical success. [sent-21, score-0.149]
7 The following questions motivate this paper: How well do current S3 VM implementations approximate the exact, globally optimal solution of the non-convex problem associated with S3 VMs? [sent-23, score-0.291]
8 This is partly due to the lack of simple implementations that practitioners can use to benchmark new algorithms against the global solution, even on small-sized problems. [sent-26, score-0.234]
9 Our contribution in this paper is to outline a class of Branch and Bound algorithms that are guaranteed to provide the globally optimal solution for S3 VMs. [sent-29, score-0.108]
10 Branch and bound techniques have previously been noted in the context of S3 VM in [16], but no details were presented there. [sent-30, score-0.157]
11 We implement and evaluate a branch and bound strategy that can serve as an upper baseline for S3 VM algorithms. [sent-31, score-0.607]
12 This strategy is not practical for typical semi-supervised settings where large amounts of unlabeled data are available. [sent-32, score-0.427]
13 Empirical results on some semi-supervised tasks presented in section 7 show that the exact solution found by branch and bound has excellent generalization performance, while other S3 VM implementations perform poorly. [sent-34, score-0.71]
14 The training set consists of l labeled examples $\{(x_i, y_i)\}_{i=1}^{l}$, $y_i = \pm 1$, and of u unlabeled examples $\{x_i\}_{i=l+1}^{n}$, with $n = l + u$. [sent-39, score-0.718]
15 In the linear case, the following objective function is minimized over both the hyperplane parameters (w, b) and the label vector $y_u := [y_{l+1} \ldots y_n]^\top$: [sent-40, score-0.517]
16 $\min_{w, b, y_u, \xi_i \ge 0} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i^{p} + C^{*} \sum_{i=l+1}^{n} \xi_i^{p}$ (1), under the constraints $y_i (w \cdot x_i + b) \ge 1 - \xi_i$, $1 \le i \le n$. [sent-43, score-0.155]
17 The last one takes into account the unlabeled points and can be seen as an implementation of the cluster assumption [11] or low density separation assumption [6]; indeed, it drives the outputs of the unlabeled points away from 0 (see figure 1). [sent-48, score-0.771]
18 Figure 1: With p = 2 in (1), the loss of a point with label y and signed output t is $\max(0, 1 - yt)^2$. [sent-57, score-0.149]
19 For an unlabeled point, this is $\min_y \max(0, 1 - yt)^2 = \max(0, 1 - |t|)^2$. [sent-58, score-0.333]
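To make the p = 2 case concrete, here is a minimal numpy sketch of the objective in (1) for a fixed (w, b): labeled points pay the squared hinge, unlabeled points pay the effective loss max(0, 1 - |t|)^2 shown above. The function and argument names are ours, not taken from the paper.

```python
import numpy as np

def s3vm_objective(w, b, X_lab, y_lab, X_unlab, C=1.0, C_star=1.0):
    """Squared-hinge S3VM objective (p = 2): labeled points use the usual
    hinge on y*(w.x + b); unlabeled points use max(0, 1 - |w.x + b|)^2,
    i.e. the hinge loss minimized over the unknown label."""
    t_lab = X_lab @ w + b
    t_unlab = X_unlab @ w + b
    loss_lab = np.maximum(0.0, 1.0 - y_lab * t_lab) ** 2
    loss_unlab = np.maximum(0.0, 1.0 - np.abs(t_unlab)) ** 2
    return 0.5 * w @ w + C * loss_lab.sum() + C_star * loss_unlab.sum()
```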
20 The labels of the unlabeled points are further required to satisfy a balance constraint, $\frac{1}{u} \sum_{i=l+1}^{n} \frac{1 + y_i}{2} = r$ (2). This constraint is necessary to avoid unbalanced solutions and has also been used in the original implementation [9]. [sent-62, score-0.123]
21 Ideally, the parameter r should be set to the ratio of positive points in the unlabeled set. [sent-63, score-0.36]
22 Since it is unknown, r is usually estimated through the class ratio on the labeled set. [sent-64, score-0.153]
23 For the sake of simplicity, in the rest of the paper, we set r to the true ratio of positive points in the unlabeled set. [sent-66, score-0.387]
24 Let us call I the objective function to be minimized: $I(w, b, y_u) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \max(0, 1 - y_i (w \cdot x_i + b))^2$ (3). [sent-67, score-0.583]
25 There are two main strategies to minimize I: (1) for a given fixed w and b, the optimal $y_u$ is simply given by the signs of $w \cdot x_i + b$. [sent-68, score-0.371]
26 (2) For a given yu , the optimization on w and b is a standard SVM training. [sent-71, score-0.309]
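These two steps are the basis of the usual alternating local scheme that most S3VM implementations approximate (and which only reaches a local minimum, unlike the branch and bound method of this paper). A small illustrative sketch in Python/scipy, assuming a linear kernel and, for brevity, ignoring the balance constraint (2); all names are ours:

```python
import numpy as np
from scipy.optimize import minimize

def svm_objective(theta, X, y, C):
    # Primal L2-SVM (squared hinge) objective in (w, b); theta = [w, b].
    w, b = theta[:-1], theta[-1]
    margins = y * (X @ w + b)
    return 0.5 * w @ w + C * np.sum(np.maximum(0.0, 1.0 - margins) ** 2)

def train_svm(X, y, C=1.0):
    # The squared hinge is smooth and convex, so a quasi-Newton solver suffices here.
    theta0 = np.zeros(X.shape[1] + 1)
    res = minimize(svm_objective, theta0, args=(X, y, C), method="L-BFGS-B")
    return res.x[:-1], res.x[-1]

def alternate_s3vm(X_lab, y_lab, X_unlab, C=1.0, n_iter=10):
    # Alternate step (2), SVM training for fixed labels, with step (1), label update.
    X = np.vstack([X_lab, X_unlab])
    y_u = np.ones(len(X_unlab))                 # arbitrary initial labels
    for _ in range(n_iter):
        w, b = train_svm(X, np.concatenate([y_lab, y_u]), C)   # step (2)
        y_u = np.sign(X_unlab @ w + b)                          # step (1)
        y_u[y_u == 0] = 1.0
    return w, b, y_u
```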
27 The constraint (2) is implemented by setting $J(y_u) = +\infty$ for all vectors $y_u$ not satisfying it, where $J(y_u) := \min_{w,b} I(w, b, y_u)$. [sent-75, score-0.422]
28 3.1 Branch and bound basics. Suppose we want to minimize a function f over a space X, where X is usually discrete. [sent-77, score-0.18]
29 A branch and bound algorithm has two main ingredients. Branching: the region X is recursively split into smaller subregions. [sent-78, score-0.509]
30 This yields a tree structure where each node corresponds to a subregion. [sent-79, score-0.162]
31 Suppose that an upper bound (say a) on the best value of f over A is known, that a lower bound (say b) on the best value of f over B is known, and that a < b. Then B cannot contain the global minimum, and the entire subregion B can be pruned from the search. [sent-83, score-0.44]
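As an illustration of this pruning rule, here is a schematic depth-first branch and bound skeleton in Python for minimization; the callbacks (splitting, bounding, leaf evaluation) are placeholders to be supplied by the caller and are not part of the paper.

```python
def branch_and_bound(root, lower_bound, is_leaf, value, children):
    """Generic depth-first branch and bound for minimization.
    `children(region)` splits a region into subregions; `lower_bound(region)`
    bounds f from below on a region; `value(leaf)` evaluates f at a leaf.
    Returns the best leaf value found."""
    best = float("inf")

    def explore(region):
        nonlocal best
        if is_leaf(region):
            best = min(best, value(region))    # leaves provide upper bounds
            return
        for child in children(region):
            if lower_bound(child) >= best:     # the a < b rule: prune this child
                continue
            explore(child)

    explore(root)
    return best
```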
32 3.2 Branch and bound for S3 VM. The aim is to minimize (3) over all $2^u$ possible choices for the vector $y_u$,¹ which constitute the set X introduced above. [sent-87, score-0.489]
33 Any node corresponds to a partial labeling of the data set and its two children correspond to the labeling of some unlabeled point. [sent-89, score-0.541]
34 One can thus associate with any node a labeled set L containing both the original labeled examples and a subset S of unlabeled examples $\{(x_j, y_j)\}_{j \in S \subseteq [l+1..n]}$. [sent-90, score-0.774]
35 One can also associate an unlabeled set $U = [l+1..n] \setminus S$ corresponding to the subset of unlabeled points which have not been assigned a label yet. [sent-94, score-0.362] [sent-97, score-0.449]
37 The size of the subtree rooted at this node is thus $2^{|U|}$. [sent-98, score-0.172]
38 The root of the tree has only the original set of labeled examples associated with it, i.e. $S = \emptyset$. [sent-99, score-0.243]
39 The leaves in the tree correspond to a complete labeling of the dataset, i.e. $U = \emptyset$. [sent-101, score-0.154]
40 As for any branch and bound algorithm, we have to decide about the following choices. Branching: for a given node in the tree (i.e. a partial labeling of the unlabeled set), what should be its two children (i.e. which unlabeled point should be labeled next)? [sent-105, score-0.635] [sent-107, score-0.43]
42 Bounding: Which upper and lower bounds should be used? [sent-110, score-0.126]
43 Exploration: In which order will the search tree be examined? [sent-111, score-0.12]
44 Note that the tree is not built explicitly but on the fly as we explore it. [sent-113, score-0.138]
45 ¹ There are actually only $\binom{u}{ur}$ effective choices because of the constraint (2). [sent-114, score-0.134]
46 Concerning the upper bound, we decided to have the following simple strategy: for a leaf node, the upper bound is simply the value of the function; for a non-leaf node, there is no upper bound. [sent-115, score-0.552]
47 In other words, the upper bound is the best objective function found so far. [sent-116, score-0.355]
48 In the terminology of section 3.1, the set A is the leaf corresponding to the best solution found so far and the set B is the subtree that we are considering to explore. [sent-118, score-0.216]
49 Because of this choice for the upper bound, a natural way to explore the tree is a depth first search. [sent-119, score-0.217]
50 Indeed it is important to go to the leaves as often as possible in order to have a tight upper bound and thus perform aggressive pruning. [sent-120, score-0.261]
51 The choices of the lower bound and the branching strategy are presented next. [sent-121, score-0.355]
52 4 Lower bound. We consider a simple lower bound based on the following observation. [sent-122, score-0.361]
53 The minimum of the objective function (1) is smaller when C ∗ = 0 than when C ∗ > 0. [sent-123, score-0.119]
54 But C ∗ = 0 corresponds to a standard SVM, ignoring the unlabeled data. [sent-124, score-0.333]
55 We can therefore compute a lower bound at a given node by optimizing a standard SVM on the labeled set associated with this node. [sent-125, score-0.429]
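A direct way to obtain this bound is to train a plain L2-SVM on the node's labeled set and report its optimal objective value. The sketch below assumes a linear kernel and simply minimizes the squared-hinge primal with a generic solver; it is illustrative only and assumes the solver converges to the optimum.

```python
import numpy as np
from scipy.optimize import minimize

def node_lower_bound(X_lab, y_lab, C=1.0):
    """Lower bound at a node: the optimal value of a standard L2-SVM trained
    on the labeled set L associated with that node (i.e. the objective with
    C* = 0, which can only be smaller than the full S3VM objective)."""
    def obj(theta):
        w, b = theta[:-1], theta[-1]
        m = y_lab * (X_lab @ w + b)
        return 0.5 * w @ w + C * np.sum(np.maximum(0.0, 1.0 - m) ** 2)
    res = minimize(obj, np.zeros(X_lab.shape[1] + 1), method="L-BFGS-B")
    return res.fun
```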
56 It is based on the dual objective function of SVMs. [sent-127, score-0.156]
57 Let $D(\alpha, y_U)$ be the dual objective function, where $y_U$ corresponds to the labels of the unlabeled points which have not been assigned a label yet: $D(\alpha, y_U) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \left( K(x_i, x_j) + \frac{\delta_{ij}}{2C} \right)$ (4). [sent-128, score-0.787]
58 The dual feasibility conditions are $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$. [sent-129, score-0.153]
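In code, the dual value for a given alpha and labeling follows directly from this reconstruction of (4): the squared-hinge penalty appears as a ridge term added to the kernel matrix. A short numpy sketch with our own argument names:

```python
import numpy as np

def dual_objective(alpha, y, K, C):
    """Dual objective (4) for the L2-SVM: the squared-hinge penalty shows up
    as a delta_ij / (2C) ridge added to the kernel matrix K."""
    K_tilde = K + np.eye(len(y)) / (2.0 * C)
    return alpha.sum() - 0.5 * (alpha * y) @ K_tilde @ (alpha * y)
```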
59 Let Q(yU ) := D(α(yU ), yU ) and lb a lower bound on (or the value of) min Q(yU ), where the minimum is taken over all yU satisfying the balancing constraint (2). [sent-132, score-0.394]
60 Then lb is also a lower bound for the value of the objective function corresponding to that node. [sent-133, score-0.369]
61 The goal is thus to find a choice for α(yU ) such that a lower bound on Q can be computed efficiently. [sent-134, score-0.204]
62 The choice corresponding to the lower bound presented above is the following. [sent-135, score-0.204]
63 Train an SVM on the labeled points, obtain the vector α and complete it with zeros for the unlabeled points. [sent-136, score-0.512]
64 Then Q(yU ) is the same for all the possible labelings of the unlabeled points and the lower bound is the SVM objective function on the labeled points. [sent-137, score-0.836]
65 ² To lower bound Q, one can use results from the quadratic zero-one programming literature [10] or solve a constrained eigenvalue problem [8]. [sent-140, score-0.278]
66 Finally, note that unless $\sum_{i \in U} y_i = 0$, the constraint $\sum_i \alpha_i y_i = 0$ will not be satisfied. [sent-141, score-0.304]
67 One remedy is to train the supervised SVM with the constraint $\sum_{i \in L} \alpha_i y_i = -\gamma \sum_{i \in U} y_i = \gamma(n - 2ru + \sum_{i \in L} y_i)$ (because of (2)). [sent-142, score-0.42]
68 5 Branching. At a given node, some unlabeled points have already been assigned a label. [sent-144, score-0.36]
69 Since our strategy is to reach a good solution as soon as possible (see the last paragraph of section 3.2), we would like to branch on points whose labels can be assigned with confidence. [sent-146, score-0.107]
70 A simple possibility would be to branch on the unlabeled point which is nearest to a labeled point, using a reliable distance metric. [sent-148, score-0.852]
71 But we now present a more principled approach based on the analysis of the objective value. [sent-149, score-0.119]
72 We say that we are "confident" about a particular label of an unlabeled point when assigning the opposite label results in a large increase of the objective value: such a partial solution would then be unlikely to lead to the optimal one. [sent-150, score-0.763]
73 Recall from section 3.2 that a node is associated with a set L of currently labeled examples and a set U of unlabeled examples. [sent-153, score-0.558]
74 Let s(L) be the SVM objective function trained on the labeled set, $s(L) = \min_{w,b} \; \frac{1}{2}\|w\|^2 + C \sum_{(x_i, y_i) \in L} \max(0, 1 - y_i (w \cdot x_i + b))^2$ (6). [sent-154, score-0.427]
75 As discussed in the previous section, the lower bound is s(L). [sent-155, score-0.204]
76 Now our branching strategy consists in selecting the following point in U: $\arg\max_{x \in U,\, y = \pm 1} s(L \cup \{(x, y)\})$ (7). In other words, we want to find the unlabeled point $x^*$ and its label $y^*$ which would make the objective function increase as much as possible. [sent-156, score-0.785]
77 Then we branch on $x^*$, but start exploring the branch with the most likely label $-y^*$. [sent-157, score-0.748]
78 This strategy has an intuitive link with the "label propagation" idea [17]: an unlabeled point which is near a labeled point is likely to be of the same label; otherwise, the objective function would be large. [sent-158, score-0.737]
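A brute-force version of the branching rule (7) can be written directly from the definition; it retrains an SVM for every candidate (x, y) pair and is therefore only illustrative (the paper avoids this cost with the incremental computation of Proposition 1 below). The callback and variable names are ours; `svm_objective_on` could be the `node_lower_bound` routine sketched earlier.

```python
import numpy as np

def select_branching_point(svm_objective_on, L_X, L_y, U_X):
    """Branching rule (7): pick the unlabeled point x* and label y* whose
    addition to L increases the SVM objective s(L) the most, then explore
    the opposite (more plausible) label -y* first."""
    base = svm_objective_on(L_X, L_y)
    best_increase, best_idx, best_y = -np.inf, None, None
    for idx, x in enumerate(U_X):
        for y in (+1.0, -1.0):
            s = svm_objective_on(np.vstack([L_X, x]), np.append(L_y, y))
            if s - base > best_increase:
                best_increase, best_idx, best_y = s - base, idx, y
    return best_idx, -best_y   # branch on x*, explore label -y* first
```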
79 Proposition 1. Consider training an SVM on a labeled set L with quadratic penalization of the errors (cf. (6) or (4)). [sent-166, score-0.206]
80 It is based on the fact that $s(L) = \frac{1}{2}\, y_{sv}^\top K_{sv}^{-1}\, y_{sv}$ and relies on the block matrix inverse formula. [sent-170, score-0.104]
81 At the beginning, the upper bound can either be set to +∞ or to the objective value of a solution found by another algorithm. [sent-172, score-0.288]
82 Note that the SVM trainings are incremental: whenever we go down the tree, one point is added to the labeled set. [sent-173, score-0.23]
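The incremental step rests on the block matrix inverse formula: when one point is appended to the labeled set, the inverse of the bordered kernel matrix can be updated in O(m²) rather than recomputed from scratch in O(m³). A small numpy sketch of that update in our own notation; the handling of the bias term and of the 1/(2C) ridge is omitted for brevity.

```python
import numpy as np

def bordered_inverse(K_inv, k_new, k_self):
    """Given K_inv, the inverse of an m x m matrix K, return the inverse of
    the bordered matrix [[K, k_new], [k_new^T, k_self]] via the block-matrix
    inverse formula.  Cost is O(m^2) per added point."""
    u = K_inv @ k_new
    s = k_self - k_new @ u            # Schur complement (assumed positive)
    m = K_inv.shape[0]
    out = np.empty((m + 1, m + 1))
    out[:m, :m] = K_inv + np.outer(u, u) / s
    out[:m, m] = -u / s
    out[m, :m] = -u / s
    out[m, m] = 1.0 / s
    return out
```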
83 7 Experiments. We consider here two datasets where other S3 VM implementations are unable to achieve satisfactory test error rates. [sent-175, score-0.222]
84 This naturally raises the following question: is this weak performance due to the S3 VM objective itself, or to the local minima that these implementations get stuck in? Algorithm 1: Branch and bound for S3 VM (BB). [sent-176, score-0.157]
85 Function: (Y*, v) ← S3VM(Y, ub)   % recursive implementation. Input: Y, a partly labeled vector (0 for unlabeled); ub, an upper bound on the optimal objective value. [sent-177, score-0.601]
86 Output: Y*, the optimal fully labeled vector; v, the corresponding objective function value. [sent-178, score-0.272]
87 if Σᵢ max(0, Yᵢ) > ur  OR  n − Σᵢ max(0, −Yᵢ) < ur then
       return   % constraint (2) cannot be satisfied, so do not explore this subtree
   end if
   v ← SVM(Y)   % compute the SVM objective function on the labeled points. [sent-179, score-0.577]
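Putting the pieces together, here is a recursive sketch in the spirit of Algorithm 1, with our own variable names: the SVM evaluation and the branching rule are passed in as callbacks, and the balance check follows our reading of constraint (2) (with `n_pos_required` the total number of positive labels the complete labeling must contain).

```python
import numpy as np

def s3vm_bb(Y, svm_value, pick_branching_point, n_pos_required, best=(None, np.inf)):
    """Recursive branch and bound sketch.  Y holds +1/-1 for assigned labels
    and 0 for still-unassigned points; `svm_value(Y)` returns the SVM objective
    on the currently labeled points (the node's lower bound);
    `pick_branching_point(Y)` returns an index and the label to explore first.
    Returns (best_Y, best_value)."""
    n_pos, n_neg = int(np.sum(Y > 0)), int(np.sum(Y < 0))
    if n_pos > n_pos_required or len(Y) - n_neg < n_pos_required:
        return best                          # constraint (2) can no longer be met
    v = svm_value(Y)                         # lower bound at this node
    if v >= best[1]:
        return best                          # prune against the incumbent
    if n_pos + n_neg == len(Y):
        return (Y.copy(), v)                 # leaf: complete labeling, new incumbent
    i, y_first = pick_branching_point(Y)
    for y in (y_first, -y_first):            # depth-first, most plausible label first
        Y[i] = y
        best = s3vm_bb(Y, svm_value, pick_branching_point, n_pos_required, best)
        Y[i] = 0
    return best
```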
88 7.1 Two moons. The "two moons" dataset is now a standard benchmark for semi-supervised learning algorithms. [sent-182, score-0.193]
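The exact construction used in the paper is not reproduced here; as an illustration, a semi-supervised split of scikit-learn's `make_moons` with one labeled point per class can serve as a toy stand-in for the benchmark.

```python
import numpy as np
from sklearn.datasets import make_moons

# Two interleaving half-circles; only one labeled point per class, the rest unlabeled.
X, y = make_moons(n_samples=200, noise=0.05, random_state=0)
y = 2 * y - 1                                   # map {0, 1} -> {-1, +1}
rng = np.random.default_rng(0)
labeled = np.concatenate([rng.choice(np.where(y == c)[0], size=1, replace=False)
                          for c in (-1, 1)])
unlabeled = np.setdiff1d(np.arange(len(y)), labeled)
r = np.mean(y[unlabeled] == 1)                  # balance parameter for constraint (2)
```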
89 However, these methods have a constraint that the mean output on the unlabeled points should be equal to some constant. [sent-189, score-0.43]
90 Note that the test errors of the other S3 VM implementations could likely be improved by hyperparameter tuning, but they would still remain very high. [sent-192, score-0.204]
91 Table 1: Results on the two moons dataset (averaged over 100 random realizations); the table reports the test error (%) of S3 VM, cS3 VM, CCCP, SVMlight, DA, BB, and LapSVM. [sent-199, score-0.15]
92 We randomly selected 2 images per class to be in the labeled set and left the rest unlabeled. [sent-212, score-0.153]
93 8 Discussion and Conclusion. We implemented and evaluated one strategy amongst many in the class of branch and bound methods to find the globally optimal solution of S3 VMs. [sent-228, score-0.636]
94 This basic implementation can perhaps be made more efficient by choosing better bounding and branching schemes. [sent-231, score-0.196]
95 Also, by fixing the upper bound as the currently best objective value, pruning becomes more aggressive as better solutions are found. ² The reported test errors are somewhat irrelevant and should not be used for ranking the different algorithms. [sent-232, score-0.383]
96 It is conceivable that breadth-first search is equally or more effective in conjunction with alternative upper bounding schemes. [sent-235, score-0.158]
97 Finally, we note that a large family of well-tested branch and bound procedures from the zero-one quadratic programming literature can be immediately applied to the S3 VM problem for the special case of squared loss. [sent-238, score-0.521]
98 For these reasons, we believe that this implementation does not scale to large datasets, but should instead be considered a proof of concept: the S3 VM objective function is very well suited for semi-supervised learning, and more effort should be devoted to efficiently finding good local minima. [sent-242, score-0.194]
99 Computational aspects of a branch and bound algorithm for quadratic zero-one programming. [sent-306, score-0.498]
100 Learning from labeled and unlabeled data with label propagation. [sent-347, score-0.575]
wordName wordTfidf (topN-words)
[('vm', 0.515), ('unlabeled', 0.333), ('branch', 0.316), ('yu', 0.309), ('bound', 0.157), ('labeled', 0.153), ('implementations', 0.149), ('objective', 0.119), ('svm', 0.118), ('yi', 0.116), ('moons', 0.105), ('vms', 0.105), ('ub', 0.104), ('subtree', 0.1), ('branching', 0.096), ('tree', 0.09), ('label', 0.089), ('upper', 0.079), ('cccp', 0.079), ('ksv', 0.079), ('chapelle', 0.077), ('node', 0.072), ('constraint', 0.072), ('sv', 0.066), ('transductive', 0.066), ('leaf', 0.064), ('coil', 0.062), ('lapsvm', 0.062), ('svmlight', 0.062), ('ur', 0.062), ('realizations', 0.058), ('globally', 0.056), ('strategy', 0.055), ('sindhwani', 0.055), ('trainings', 0.052), ('ysv', 0.052), ('solution', 0.052), ('implementation', 0.051), ('bb', 0.05), ('bounding', 0.049), ('explore', 0.048), ('lower', 0.047), ('lb', 0.046), ('dataset', 0.045), ('concerning', 0.045), ('find', 0.043), ('max', 0.043), ('benchmark', 0.043), ('minima', 0.043), ('partly', 0.042), ('da', 0.041), ('satisfying', 0.041), ('labeling', 0.039), ('semisupervised', 0.039), ('xi', 0.039), ('dual', 0.037), ('excellent', 0.036), ('recursively', 0.036), ('keerthi', 0.035), ('yahoo', 0.035), ('signed', 0.035), ('continuation', 0.035), ('yj', 0.034), ('questions', 0.034), ('vapnik', 0.033), ('zien', 0.033), ('return', 0.033), ('datasets', 0.032), ('labels', 0.032), ('ective', 0.032), ('outperform', 0.032), ('partial', 0.032), ('support', 0.031), ('balancing', 0.031), ('annealing', 0.031), ('search', 0.03), ('non', 0.03), ('dent', 0.03), ('associate', 0.029), ('errors', 0.028), ('likely', 0.027), ('sake', 0.027), ('points', 0.027), ('zeros', 0.026), ('children', 0.026), ('incremental', 0.026), ('solve', 0.026), ('point', 0.025), ('chicago', 0.025), ('leaves', 0.025), ('quadratic', 0.025), ('believe', 0.024), ('say', 0.024), ('minimize', 0.023), ('programming', 0.023), ('yt', 0.023), ('hij', 0.023), ('sinz', 0.023), ('sx', 0.023), ('subregions', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000008 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
2 0.21566054 150 nips-2006-On Transductive Regression
Author: Corinna Cortes, Mehryar Mohri
Abstract: In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closedform solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
3 0.20450212 104 nips-2006-Large-Scale Sparsified Manifold Regularization
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
4 0.16314575 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
Author: Zhi-hua Zhou, Min-ling Zhang
Abstract: In this paper, we formalize multi-instance multi-label learning, where each training example is associated with not only multiple instances but also multiple class labels. Such a problem can occur in many real-world tasks, e.g. an image usually contains multiple patches each of which can be described by a feature vector, and the image can belong to multiple categories since its semantics can be recognized in different ways. We analyze the relationship between multi-instance multi-label learning and the learning frameworks of traditional supervised learning, multiinstance learning and multi-label learning. Then, we propose the M IML B OOST and M IML S VM algorithms which achieve good performance in an application to scene classification. 1
5 0.13810469 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
Author: Chi-hoon Lee, Shaojun Wang, Feng Jiao, Dale Schuurmans, Russell Greiner
Abstract: We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional loglikelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case.
6 0.12830035 163 nips-2006-Prediction on a Graph with a Perceptron
7 0.11614759 186 nips-2006-Support Vector Machines on a Budget
8 0.096859001 183 nips-2006-Stochastic Relational Models for Discriminative Link Prediction
9 0.090858258 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
10 0.089172639 172 nips-2006-Scalable Discriminative Learning for Natural Language Parsing and Translation
11 0.082559109 171 nips-2006-Sample Complexity of Policy Search with Known Dynamics
12 0.081915714 33 nips-2006-Analysis of Representations for Domain Adaptation
13 0.081264883 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
14 0.080444664 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
15 0.076553062 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
16 0.073037095 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
17 0.070205458 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections
18 0.06965135 193 nips-2006-Tighter PAC-Bayes Bounds
19 0.067834377 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
20 0.067704625 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization
topicId topicWeight
[(0, -0.225), (1, 0.089), (2, -0.053), (3, 0.015), (4, -0.003), (5, 0.184), (6, -0.13), (7, 0.049), (8, 0.019), (9, -0.105), (10, -0.098), (11, -0.23), (12, -0.095), (13, 0.008), (14, 0.017), (15, 0.091), (16, 0.159), (17, -0.059), (18, -0.084), (19, -0.027), (20, -0.124), (21, -0.202), (22, 0.043), (23, 0.214), (24, 0.023), (25, 0.167), (26, -0.057), (27, 0.018), (28, -0.137), (29, 0.027), (30, -0.097), (31, 0.04), (32, 0.077), (33, 0.107), (34, -0.04), (35, 0.075), (36, -0.069), (37, 0.027), (38, -0.023), (39, 0.106), (40, -0.04), (41, 0.017), (42, 0.08), (43, -0.135), (44, -0.09), (45, -0.096), (46, 0.037), (47, -0.009), (48, -0.021), (49, -0.075)]
simIndex simValue paperId paperTitle
same-paper 1 0.95573217 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
2 0.77403837 150 nips-2006-On Transductive Regression
Author: Corinna Cortes, Mehryar Mohri
Abstract: In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closedform solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
3 0.7710914 104 nips-2006-Large-Scale Sparsified Manifold Regularization
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
4 0.66011697 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
Author: Zhi-hua Zhou, Min-ling Zhang
Abstract: In this paper, we formalize multi-instance multi-label learning, where each training example is associated with not only multiple instances but also multiple class labels. Such a problem can occur in many real-world tasks, e.g. an image usually contains multiple patches each of which can be described by a feature vector, and the image can belong to multiple categories since its semantics can be recognized in different ways. We analyze the relationship between multi-instance multi-label learning and the learning frameworks of traditional supervised learning, multiinstance learning and multi-label learning. Then, we propose the M IML B OOST and M IML S VM algorithms which achieve good performance in an application to scene classification. 1
5 0.63004738 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
Author: Chi-hoon Lee, Shaojun Wang, Feng Jiao, Dale Schuurmans, Russell Greiner
Abstract: We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional loglikelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case.
6 0.57591945 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
7 0.41801706 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
8 0.38685709 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
9 0.3783668 186 nips-2006-Support Vector Machines on a Budget
10 0.36377344 129 nips-2006-Map-Reduce for Machine Learning on Multicore
11 0.36077243 33 nips-2006-Analysis of Representations for Domain Adaptation
12 0.34873992 101 nips-2006-Isotonic Conditional Random Fields and Local Sentiment Flow
13 0.34812424 169 nips-2006-Relational Learning with Gaussian Processes
14 0.339908 172 nips-2006-Scalable Discriminative Learning for Natural Language Parsing and Translation
15 0.3304837 163 nips-2006-Prediction on a Graph with a Perceptron
16 0.32190388 193 nips-2006-Tighter PAC-Bayes Bounds
17 0.30166793 156 nips-2006-Ordinal Regression by Extended Binary Classification
18 0.2978467 183 nips-2006-Stochastic Relational Models for Discriminative Link Prediction
19 0.2942985 140 nips-2006-Multiple Instance Learning for Computer Aided Diagnosis
20 0.28787142 24 nips-2006-Aggregating Classification Accuracy across Time: Application to Single Trial EEG
topicId topicWeight
[(1, 0.124), (3, 0.064), (7, 0.101), (9, 0.041), (22, 0.07), (44, 0.097), (57, 0.069), (64, 0.019), (65, 0.059), (69, 0.034), (72, 0.224), (90, 0.013)]
simIndex simValue paperId paperTitle
1 0.91312516 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering
Author: Ron Zass, Amnon Shashua
Abstract: In this paper we focus on the issue of normalization of the affinity matrix in spectral clustering. We show that the difference between N-cuts and Ratio-cuts is in the error measure being used (relative-entropy versus L1 norm) in finding the closest doubly-stochastic matrix to the input affinity matrix. We then develop a scheme for finding the optimal, under Frobenius norm, doubly-stochastic approximation using Von-Neumann’s successive projections lemma. The new normalization scheme is simple and efficient and provides superior clustering performance over many of the standardized tests. 1
same-paper 2 0.84718144 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
3 0.70398265 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
Author: Kilian Q. Weinberger, Fei Sha, Qihui Zhu, Lawrence K. Saul
Abstract: In many areas of science and engineering, the problem arises how to discover low dimensional representations of high dimensional data. Recently, a number of researchers have converged on common solutions to this problem using methods from convex optimization. In particular, many results have been obtained by constructing semidefinite programs (SDPs) with low rank solutions. While the rank of matrix variables in SDPs cannot be directly constrained, it has been observed that low rank solutions emerge naturally by computing high variance or maximal trace solutions that respect local distance constraints. In this paper, we show how to solve very large problems of this type by a matrix factorization that leads to much smaller SDPs than those previously studied. The matrix factorization is derived by expanding the solution of the original problem in terms of the bottom eigenvectors of a graph Laplacian. The smaller SDPs obtained from this matrix factorization yield very good approximations to solutions of the original problem. Moreover, these approximations can be further refined by conjugate gradient descent. We illustrate the approach on localization in large scale sensor networks, where optimizations involving tens of thousands of nodes can be solved in just a few minutes. 1
4 0.70148432 80 nips-2006-Fundamental Limitations of Spectral Clustering
Author: Boaz Nadler, Meirav Galun
Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1
5 0.69953316 20 nips-2006-Active learning for misspecified generalized linear models
Author: Francis R. Bach
Abstract: Active learning refers to algorithmic frameworks aimed at selecting training data points in order to reduce the number of required training data points and/or improve the generalization performance of a learning method. In this paper, we present an asymptotic analysis of active learning for generalized linear models. Our analysis holds under the common practical situation of model misspecification, and is based on realistic assumptions regarding the nature of the sampling distributions, which are usually neither independent nor identical. We derive unbiased estimators of generalization performance, as well as estimators of expected reduction in generalization error after adding a new training data point, that allow us to optimize its sampling distribution through a convex optimization problem. Our analysis naturally leads to an algorithm for sequential active learning which is applicable for all tasks supported by generalized linear models (e.g., binary classification, multi-class classification, regression) and can be applied in non-linear settings through the use of Mercer kernels. 1
6 0.69862485 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
7 0.693703 65 nips-2006-Denoising and Dimension Reduction in Feature Space
8 0.69236356 117 nips-2006-Learning on Graph with Laplacian Regularization
9 0.69223946 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
10 0.68938059 175 nips-2006-Simplifying Mixture Models through Function Approximation
11 0.68937552 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
12 0.68631297 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
13 0.68459082 35 nips-2006-Approximate inference using planar graph decomposition
14 0.68434763 128 nips-2006-Manifold Denoising
15 0.68317652 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
16 0.68211347 121 nips-2006-Learning to be Bayesian without Supervision
17 0.6820755 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
18 0.68195093 150 nips-2006-On Transductive Regression
19 0.68188751 119 nips-2006-Learning to Rank with Nonsmooth Cost Functions
20 0.68174011 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights