nips nips2007 nips2007-76 knowledge-graph by maker-knowledge-mining

76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine


Source: pdf

Author: Zenglin Xu, Rong Jin, Jianke Zhu, Irwin King, Michael Lyu

Abstract: We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. Although several studies are devoted to Transductive SVM, they suffer either from high computational complexity or from locally optimal solutions. To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem into a semi-definite program. Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient, with the number of free parameters reduced from O(n^2) to O(n), where n is the number of examples. Empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. [sent-17, score-0.206]

2 Although several studies are devoted to Transductive SVM, they suffer either from high computational complexity or from locally optimal solutions. [sent-18, score-0.122]

3 To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem into a semi-definite program. [sent-19, score-0.121]

4 Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient, with the number of free parameters reduced from O(n^2) to O(n), where n is the number of examples. [sent-20, score-0.326]

5 Empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM. [sent-21, score-0.163]

6 An important semi-supervised learning paradigm is the Transductive Support Vector Machine (TSVM), which maximizes the margin in the presence of unlabeled data and keeps the boundary traversing through low density regions, while respecting labels in the input space. [sent-23, score-0.364]
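
For reference, the combinatorial problem behind this paradigm is usually written in the following generic textbook form (the notation here is standard and is not taken verbatim from this paper):

    \min_{w,\,b,\,y_{l+1},\dots,y_n \in \{-1,+1\}} \;
      \frac{1}{2}\|w\|^2
      + C \sum_{i=1}^{l} \max\!\bigl(0,\, 1 - y_i (w^{\top}x_i + b)\bigr)
      + C_u \sum_{j=l+1}^{n} \max\!\bigl(0,\, 1 - y_j (w^{\top}x_j + b)\bigr),

where the labels y_{l+1}, . . . , y_n of the unlabeled points are themselves optimization variables; this is what makes the problem combinatorial and motivates the relaxations discussed below.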

7 Since TSVM requires solving a combinatorial optimization problem, extensive research efforts have been devoted to efficiently finding the approximate solution to TSVM. [sent-24, score-0.099]

8 It begins with minimizing an easy convex objective function, and then gradually approximates the objective of TSVM with more complicated functions. [sent-29, score-0.15]

9 The solution of the simple function is used as the initialization for the solution to the complicated function. [sent-30, score-0.07]

10 Other iterative methods, such as deterministic annealing [11] and the concave-convex procedure (CCCP) method [6], are also employed to solve the optimization problem related to TSVM. [sent-31, score-0.075]
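
As a rough, self-contained illustration of this continuation/annealing idea (the loss shape, schedule, data, and step sizes below are made up and do not correspond to any specific method cited above), a minimal Python sketch might look like:

    import numpy as np

    def objective(w, Xl, yl, Xu, C, Cu, s):
        # supervised hinge loss plus a smoothed, non-convex surrogate on the unlabeled margins;
        # small s makes the unlabeled term nearly flat (easy), large s makes it sharp (hard)
        hinge = np.maximum(0.0, 1.0 - yl * (Xl @ w))
        unlabeled = np.exp(-s * (Xu @ w) ** 2)
        return 0.5 * w @ w + C * hinge.sum() + Cu * unlabeled.sum()

    def numeric_grad(f, w, eps=1e-6):
        # simple central-difference gradient, good enough for a toy illustration
        g = np.zeros_like(w)
        for i in range(w.size):
            e = np.zeros_like(w)
            e[i] = eps
            g[i] = (f(w + e) - f(w - e)) / (2.0 * eps)
        return g

    rng = np.random.default_rng(1)
    Xl = np.vstack([rng.standard_normal((3, 2)) + 1.5, rng.standard_normal((3, 2)) - 1.5])
    yl = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
    Xu = rng.standard_normal((20, 2)) * 2.0
    w = np.zeros(2)

    for s in (0.1, 0.3, 1.0, 3.0):   # annealing schedule: easy objective -> hard objective
        f = lambda v, s=s: objective(v, Xl, yl, Xu, C=1.0, Cu=0.1, s=s)
        for _ in range(200):         # plain gradient descent, warm-started from the previous stage
            w = w - 0.01 * numeric_grad(f, w)

    print("final w:", w)

Each stage is initialized with the solution of the previous, easier stage, which is exactly the behavior described in the sentences above.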

11 To address this problem, a branch-and-bound method is proposed in [4]. Figure 1: Computation time of the proposed convex relaxation approach for TSVM (i. [sent-33, score-0.415]

12 , CTSVM) and the semi-definite relaxation approach for TSVM (i. [sent-35, score-0.252]

13 The Course data set is used, and the number of labeled examples is 20. [sent-38, score-0.179]

14 In [14], the authors approximate TSVM by a semi-definite programming problem, which leads to a relaxed solution to TSVM (denoted as RTSVM), in order to avoid locally optimal solutions. [sent-40, score-0.36]

15 However, both approaches suffer from high computational cost and can only be applied to small-sized data sets. [sent-41, score-0.047]

16 To this end, we present the convex relaxation for Transductive SVM (CTSVM). [sent-42, score-0.373]

17 The key idea of our method is to approximate the non-convex optimization problem of TSVM by its dual problem. [sent-43, score-0.088]

18 The advantage of doing so is twofold: • Unlike the semi-definite relaxation [14] that approximates TSVM by dropping the rank constraint, the proposed approach approximates TSVM by its dual problem. [sent-44, score-0.43]

19 As a basic result of convex analysis, the conjugate of the conjugate of any function f(x) is the convex envelope of f(x), and therefore provides a tighter convex relaxation for f(x) [7]. [sent-45, score-0.763]
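
The claim rests on the biconjugate construction of convex analysis; as a reminder (standard material, not specific to this paper):

    f^{*}(y) \;=\; \sup_{x}\,\bigl\{\langle y, x\rangle - f(x)\bigr\}, \qquad
    f^{**}(x) \;=\; \sup_{y}\,\bigl\{\langle y, x\rangle - f^{*}(y)\bigr\},

and f** is the largest convex function lower-bounding f, i.e. its convex envelope, so the double dual gives the tightest convex lower bound on the original function.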

20 Hence, the proposed approach provides a better convex relaxation than that in [14] for the optimization problem in TSVM. [sent-46, score-0.459]

21 • Compared to the semi-definite relaxation of TSVM, the proposed algorithm involves fewer free parameters and therefore significantly improves the efficiency by reducing the worst-case computational complexity from O(n^6.5) to O(n^4.5). [sent-47, score-0.404]

22 Figure 1 shows the running time of both the semi-definite relaxation of TSVM in [14] and the proposed convex relaxation for TSVM versus increasing number of unlabeled examples. [sent-50, score-0.817]

23 The data set used in this example is the Course data set (see the experiment section), and the number of labeled examples is 20. [sent-51, score-0.202]

24 We clearly see that the proposed convex relaxation approach is considerably more efficient than the semi-definite programming approach. [sent-52, score-0.415]

25 Section 2 reviews the related work on the semidefinite relaxation for TSVM. [sent-54, score-0.252]

26 Section 3 presents the convex relaxation approach for Transductive SVM. [sent-55, score-0.4]

27 Section 4 presents the empirical studies that verify the effectiveness of the proposed relaxation for TSVM. [sent-56, score-0.357]

28 2 Related Work In this section, we review the key formulae for Transductive SVM, followed by the semi-definite programming relaxation for TSVM. [sent-58, score-0.312]

29 Let X = (x1, x2, . . . , xn) denote the entire data set, including both the labeled examples and the unlabeled ones. [sent-62, score-0.329]

30 We assume that the first l examples within X are labeled by y = (y1 , y2 , . [sent-63, score-0.156]

31 , yl ) where yi ∈ {−1, +1} represents the binary class label assigned to xi . [sent-66, score-0.214]

32 We denote by (y1, y2, . . . , yn) ∈ {−1, +1}^n the binary class labels predicted for all the data points in X. [sent-70, score-0.059]

33 The goal of TSVM is to estimate y by using both the labeled examples and the unlabeled ones. [sent-71, score-0.306]

34 for i = 1, . . . , l, where C ≥ 0 is the trade-off parameter between the complexity of the function w and the margin errors. [sent-80, score-0.164]

35 Evidently, the above problem is a non-convex optimization problem due to the product between y_i and w in the constraints. [sent-82, score-0.044]

36 In order to approximate the above problem by a convex programming problem, we first rewrite it into the following form using the Lagrange Theorem: min_{ν, y ∈ {−1,+1}^n, δ, λ} (1/2) (e + ν − δ + λy)^T D(y) K^{-1} D(y) (e + ν − δ + λy) + C δ^T e (1) s. [sent-83, score-0.218]

37 e is the n-dimensional column vector of all ones and K is the kernel matrix. [sent-89, score-0.049]

38 D(y) represents a diagonal matrix whose diagonal elements form the vector y. [sent-90, score-0.065]

39 Using the Schur complement, the above formulation can be further formulated as follows: min_{y ∈ {−1,+1}^n, t, ν, δ, λ} t (2) s.t. [ yy^T ◦ K , e + ν − δ + λy ; (e + ν − δ + λy)^T , t − 2C δ^T e ] ⪰ 0, [sent-92, score-0.085]

40 ν ≥ 0, δ ≥ 0, y_i fixed to its given label for i = 1, 2, . [sent-94, score-0.252]
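
The step from the quadratic objective in (1) to the linear matrix inequality in (2) uses the Schur complement lemma, which in its standard form states (textbook material rather than a result of this paper):

    \begin{pmatrix} A & B \\ B^{\top} & C \end{pmatrix} \succeq 0
    \;\Longleftrightarrow\;
    C - B^{\top} A^{-1} B \succeq 0 \quad \text{when } A \succ 0,

so "t upper-bounds a quadratic form in K^{-1}" can be expressed as a positive semi-definite constraint that is linear in t and in the remaining variables.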

41 To convert the above problem into a convex optimization problem, the key idea is to replace the quadratic term yy^T by a linear variable. [sent-98, score-0.213]

42 Based on the result that the set Sa = {M = yy^T | y ∈ {−1, +1}^n} is equivalent to the set Sb = {M | M_{i,i} = 1, rank(M) = 1}, we can approximate the problem in (2) as follows: min_{M, t, ν, δ, λ} t (3) s. [sent-99, score-0.085]
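
A quick numerical sanity check of the equivalence of Sa and Sb (purely illustrative; the sign vector below is arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.choice([-1.0, 1.0], size=5)
    M = np.outer(y, y)                      # M = y y^T with y in {-1,+1}^n
    print(np.allclose(np.diag(M), 1.0))     # True: M_ii = y_i^2 = 1
    print(np.linalg.matrix_rank(M) == 1)    # True: M has rank one
    # Conversely, a rank-one matrix with unit diagonal factors as v v^T with
    # v_i^2 = 1, i.e. v in {-1,+1}^n, which gives the reverse inclusion.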

43 Note that the key differences between (2) and (3) are (a) the rank constraint rank(M) = 1 is removed, and (b) the variable λ is set to be zero, which is equivalent to setting b = 0. [sent-105, score-0.064]

44 As revealed by the previous studies [14, 1], the SDP problem resulting from the approximation is computationally expensive. [sent-107, score-0.096]

45 More specifically, there are O(n^2) parameters in the SDP cone and O(n) linear inequality constraints, which implies a worst-case computational complexity of O(n^6.5). [sent-108, score-0.09]

46 To avoid the high computational complexity, we present a different approach for relaxing TSVM into a convex problem. [sent-110, score-0.152]

47 Compared to the SDP relaxation approach, it is advantageous in that (1) it produces the best convex approximation for TSVM, and (2) it is computationally more efficient than the previous SDP relaxation. [sent-111, score-0.373]

48 3 Relaxed Transductive Support Vector Machine In this section, we follow the work of generalized maximum margin clustering [13] by first studying the case of hard margin, and then extending it to the case of soft margin. [sent-112, score-0.167]

49 1 Hard Margin TSVM In the hard margin case, SVM does not penalize the classification error, which corresponds to δ = 0 in (1). [sent-114, score-0.133]

50 Instead of employing the SDP relaxation as in [14], we follow the work in [13] and introduce a variable z = D(y)(e + ν) = y ◦ (e + ν). [sent-124, score-0.252]

51 Given that ν ≥ 0, the constraints in (4) can be written as y_i z_i ≥ 1 for the labeled examples, and z_i^2 ≥ 1 for all the unlabeled examples. [sent-125, score-0.501]

52 Using this new notation, the optimization problem in (4) can be rewritten as follows: min_{z, λ} (1/2) (z + λe)^T K^{-1} (z + λe) (5) s. [sent-129, score-0.081]

53 One problem with Transductive SVMs is that it is possible to classify all the unlabeled data into one of the classes with a very large margin, due to the high dimensionality and the small number of labeled examples. [sent-138, score-0.441]

54 To solve this problem, we introduce the following balance constraint to ensure that no class takes all the unlabeled examples: −ε ≤ (1/l) Σ_{i=1}^{l} z_i − (1/(n−l)) Σ_{i=l+1}^{n} z_i ≤ ε, (6) where ε ≥ 0 is a constant. [sent-140, score-0.316]

55 Through the above constraint, we aim to ensure that the difference between the labeled data and the unlabeled data in their class assignment is small. [sent-141, score-0.309]

56 We define the Lagrangian of the minimization problem (7) as follows: min_w max_γ F(w, γ) = w^T P K^{-1} P w + Σ_{i=1}^{l} γ_i (1 − y_i w_i) + Σ_{i=l+1}^{n} γ_i (1 − w_i^2) + α(c^T w − ε) + β(−c^T w − ε), where γ_i ≥ 0 for i = 1, . [sent-174, score-0.301]

57 It can be derived from duality that min_w max_γ F(w, γ) = max_γ min_w F(w, γ). [sent-178, score-0.064]

58 Thus, the dual form of the problem becomes: max_γ L(γ) = −(1/4) (γ ◦ a − (α − β)c)^T [A − D(b ◦ γ)]^{-1} (γ ◦ a − (α − β)c) + Σ_{i=1}^{n} γ_i − (α + β)ε. We introduce a variable t, so that −(1/4) (γ ◦ a − (α − β)c)^T [A − D(b ◦ γ)]^{-1} (γ ◦ a − (α − β)c) ≥ −t. [sent-182, score-0.044]

59 According to the Schur Complement, we obtain a semi-definite programming cone, from which the optimization problem (9) can be formulated. [sent-183, score-0.096]

60 The problem in (9) is a convex optimization problem, more specifically, a semi-definite programming problem, and can be efficiently solved by the interior-point method [10] implemented in some optimization packages, such as SeDuMi [12]. [sent-185, score-0.269]
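
The paper reports solving this SDP with an interior-point method through SeDuMi. As a rough illustration of the same Schur-complement device on a heavily simplified version of (5)-(6) (synthetic data, no dual variables, CVXPY with the SCS solver instead of SeDuMi; this is a sketch, not the paper's actual solver), one could write:

    import cvxpy as cp
    import numpy as np

    # Simplified toy: minimize z' K^{-1} z subject to label and balance constraints,
    # recast as "min t  s.t.  [[K, z], [z', t]] is PSD" so an SDP solver applies.
    # Sizes, kernel, labels and the tolerance eps below are all made up.
    n, l, eps = 8, 3, 0.5
    rng = np.random.default_rng(0)
    X = rng.standard_normal((n, 2))
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * sq_dist) + 1e-6 * np.eye(n)        # RBF kernel, kept positive definite
    y = np.array([1.0, 1.0, -1.0])                       # labels of the first l examples
    c = np.concatenate([np.ones(l) / l, -np.ones(n - l) / (n - l)])  # balance vector of (6)

    z = cp.Variable(n)
    t = cp.Variable()
    M = cp.bmat([[cp.Constant(K), cp.reshape(z, (n, 1))],
                 [cp.reshape(z, (1, n)), cp.reshape(t, (1, 1))]])
    constraints = [0.5 * (M + M.T) >> 0,                 # LMI equivalent to t >= z' K^{-1} z
                   cp.multiply(y, z[:l]) >= 1,           # y_i z_i >= 1 on labeled examples
                   cp.abs(z @ c) <= eps]                 # balance constraint as in (6)
    prob = cp.Problem(cp.Minimize(t), constraints)
    prob.solve(solver=cp.SCS)
    print("objective:", prob.value)
    print("signs on unlabeled examples:", np.sign(z.value[l:]))

The λe offset of (5) and the dual machinery of Section 3 are dropped here for brevity; the point is only to show how the quadratic objective turns into a constraint that is linear in the auxiliary variable t.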

61 Besides, our relaxation algorithm has O(n) parameters in the SDP cone and O(n) linear equality constraints, which involves a worst-case computational complexity of O(n^4.5). [sent-186, score-0.367]

62 However, in the previous relaxation algorithms [1, 14], there are approximately O(n^2) parameters in the SDP cone, which involve a worst-case computational complexity on the scale of O(n^6.5). [sent-188, score-0.283]
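
To get a rough sense of scale (constants and problem structure aside), the gap between the two worst-case bounds at n = 300, the largest size in Figure 1, is

    \frac{n^{6.5}}{n^{4.5}} = n^{2} = 300^{2} = 9 \times 10^{4},

which is consistent with the large timing gap between RTSVM and CTSVM shown in Figure 1.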

63 Therefore, our proposed convex relaxation algorithm is more efficient. [sent-190, score-0.415]

64 In addition, as analyzed in Section 2, the approximation in [1, 14] drops the rank constraint of the matrix yy^T, which does not lead to a tight approximation. [sent-191, score-0.064]

65 On the other hand, our prediction function f* implements the conjugate of the conjugate of the prediction function f(x), which is the convex envelope of f(x) [7]. [sent-192, score-0.286]

66 Thus, our proposed convex approximation method provides a tighter approximation than the previous method. [sent-193, score-0.2]

67 It is interesting to discuss the connection between the solution of the proposed algorithm and that of harmonic functions. [sent-195, score-0.116]

68 It can be further derived as follows: z = (I_n − Σ_{i=l+1}^{n} γ_i K I_i)^{-1} Σ_{i=1}^{l} γ_i y_i K(x_i, ·), (11) where I_i is defined as an n × n matrix with all elements being zero except the i-th diagonal element, which is 1, and K(x_i, ·) is the i-th column of K. [sent-198, score-0.148]

69 Similar to the solution of the harmonic function, we first propagate the class labels from the labeled examples to the unlabeled ones by the term Σ_{i=1}^{l} γ_i y_i K(x_i, ·). [sent-199, score-0.416]

70 We then adjust the predicted labels by the factor (I_n − Σ_{i=l+1}^{n} γ_i K I_i)^{-1}. The key difference in our solution is that (1) different weights (i. [sent-200, score-0.213]

71 , γ_i) are assigned to the labeled examples, and (2) the adjustment factor is different from that in the harmonic function [16]. [sent-202, score-0.163]
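
A small numpy sketch of the two-step structure of (11), with arbitrary placeholder weights γ (the actual γ come from solving the dual problem, so the numbers below only illustrate the propagate-then-adjust pattern, not the method's output):

    import numpy as np

    rng = np.random.default_rng(0)
    n, l = 8, 3
    X = rng.standard_normal((n, 2))
    K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    y_lab = np.array([1.0, -1.0, 1.0])           # labels of the first l points
    gamma = np.full(n, 0.05)                     # placeholder dual weights, NOT the optimum

    # Step 1: propagate labels from the labeled points,  sum_{i<=l} gamma_i y_i K(x_i, .)
    propagate = K[:, :l] @ (gamma[:l] * y_lab)
    # Step 2: adjust by (I_n - sum_{i>l} gamma_i K I_i)^{-1} = (I_n - K D_u)^{-1}
    D_u = np.diag(np.concatenate([np.zeros(l), gamma[l:]]))
    z = np.linalg.solve(np.eye(n) - K @ D_u, propagate)

    print("soft predictions:", z)
    print("hard labels for unlabeled points:", np.sign(z[l:]))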

72 2 Soft Margin TSVM We extend TSVM to the case of soft margin by considering the following problem: min ν,y,δ,λ s. [sent-204, score-0.204]

73 (1/2) (e + ν − δ + λy)^T D(y) K^{-1} D(y) (e + ν − δ + λy) + C Σ_{i=1}^{l} δ_i^2 + C_u Σ_{i=l+1}^{n} δ_i^2, s.t. ν ≥ 0, δ ≥ 0, y_i fixed to its given label for 1 ≤ i ≤ l, y_i^2 = 1 for l + 1 ≤ i ≤ n, where δ_i is related to the margin error. [sent-206, score-0.511]

74 Note that we distinguish the labeled examples from the unlabeled examples by introducing different penalty constants for margin errors, C for labeled examples and Cu for unlabeled examples. [sent-207, score-0.788]

75 , n, α ≥ 0, β ≥ 0, which is also a semi-definite programming problem and can be solved similarly. [sent-214, score-0.06]

76 4 Experiments In this section, we report an empirical study of the proposed method on several benchmark data sets. [sent-217, score-0.102]

77 1 Data Sets Description To make evaluations comprehensive, we have collected four UCI data sets and three text data sets as our experimental testbeds. [sent-219, score-0.133]

78 The UCI data sets include Iono, Sonar, Banana, and Breast, which are widely used in data classification. [sent-220, score-0.077]

79 The WinMac data set consists of the classes, mswindows and mac, of the Newsgroup20 data set. [sent-221, score-0.046]

80 The IBM data set contains the classes, IBM and non-IBM, of the Newsgroup20 data set. [sent-222, score-0.046]

81 The Course data set consists of the course pages and non-course pages of the WebKB corpus. [sent-223, score-0.077]

82 For each text data set, we randomly sample the data with sample sizes of 60, 300, and 1000, respectively. [sent-224, score-0.046]

83 Table 1 describes the information of these data sets, where d represents the data dimensionality, l means the number of labeled data points, and n denotes the total number of examples. [sent-226, score-0.203]

84 Table 1: Data sets used in the experiments, where d represents the data dimensionality, l means the number of labeled data points, and n denotes the total number of examples. [sent-227, score-0.211]

85 O(n^6.5), which makes it difficult to process data sets with hundreds of examples. [sent-232, score-0.054]

86 In each trial, the training set contains examples from each class, and the remaining data are then used as the unlabeled (test) data. [sent-238, score-0.173]

87 Moreover, the RBF kernel is used for “Iono”, “Sonar” and “Banana”, and the linear kernel is used for the other data sets. [sent-239, score-0.121]

88 This is because the linear kernel performs better than the RBF kernel on these data sets. [sent-240, score-0.121]

89 The kernel width of the RBF kernel is chosen by 5-fold cross-validation on the labeled data. [sent-241, score-0.211]

90 The margin parameter C is tuned by using the labeled data in all algorithms. [sent-242, score-0.269]

91 Due to the small number of labeled examples, for CTSVM and CCCP, the margin parameter for unlabeled data, Cu , is set equal to C . [sent-243, score-0.396]
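
As an illustration of this supervised model-selection step only (choosing the RBF width and C by 5-fold cross-validation on the labeled data), a Python sketch using scikit-learn might look as follows; the data and grid are placeholders, and the paper's own experiments presumably used a different (MATLAB/SeDuMi-based) implementation:

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Placeholder labeled data: two well-separated classes, 10 examples each.
    rng = np.random.default_rng(0)
    X_labeled = np.vstack([rng.standard_normal((10, 5)) + 1.0,
                           rng.standard_normal((10, 5)) - 1.0])
    y_labeled = np.repeat([1.0, -1.0], 10)

    param_grid = {"C": [0.1, 1.0, 10.0, 100.0],
                  "gamma": [0.01, 0.1, 1.0]}     # gamma plays the role of the RBF kernel width
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X_labeled, y_labeled)
    print("selected parameters:", search.best_params_)
    # The selected C and width would then be fixed before running the transductive solver,
    # with C_u set equal to C as described above.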

92 3 Experimental Results Table 2: The classification performance of Transductive SVMs on benchmark data sets. [sent-246, score-0.06]

93 It can be observed that our proposed algorithm performs significantly better than the standard SVM across all the data sets. [sent-366, score-0.065]

94 On the remaining data sets, comparable results have been obtained for our proposed algorithm. [sent-376, score-0.065]

95 From above, the empirical evaluations indicate that our proposed CTSVM method achieves promising classification results compared with the state-of-the-art methods. [sent-377, score-0.097]

96 5 Conclusion and Future Work This paper presents a novel method for Transductive SVM by relaxing the unknown labels to continuous variables. [sent-378, score-0.094]

97 In contrast to the previous relaxation method, which involves O(n^2) free parameters in the semi-definite matrix, our method has the advantage of reducing the number of free parameters to O(n), and can solve the optimization problem more efficiently. [sent-379, score-0.385]

98 In addition, the proposed approach provides a tighter convex relaxation for the optimization problem in TSVM. [sent-380, score-0.496]

99 Empirical studies on benchmark data sets demonstrate that the proposed method is more efficient than the previous semi-definite relaxation method and achieves promising classification results compared to the state-of-the-art methods. [sent-381, score-0.451]

100 Interior point polynomial methods in convex programming: Theory and applications. [sent-459, score-0.121]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tsvm', 0.702), ('relaxation', 0.252), ('transductive', 0.237), ('ctsvm', 0.226), ('unlabeled', 0.15), ('sdp', 0.139), ('margin', 0.133), ('yi', 0.126), ('convex', 0.121), ('labeled', 0.113), ('kong', 0.109), ('svm', 0.102), ('cccp', 0.1), ('iono', 0.1), ('sonar', 0.1), ('hong', 0.1), ('banana', 0.084), ('rtsvm', 0.075), ('wi', 0.069), ('yl', 0.067), ('chapelle', 0.064), ('programming', 0.06), ('cone', 0.059), ('zi', 0.056), ('cu', 0.053), ('shatin', 0.05), ('breast', 0.05), ('harmonic', 0.05), ('kernel', 0.049), ('yy', 0.048), ('nite', 0.045), ('optimization', 0.044), ('dual', 0.044), ('pw', 0.044), ('examples', 0.043), ('rn', 0.042), ('proposed', 0.042), ('sedumi', 0.04), ('schur', 0.04), ('rbf', 0.038), ('benchmark', 0.037), ('conjugate', 0.037), ('tighter', 0.037), ('envelope', 0.037), ('min', 0.037), ('labels', 0.036), ('studies', 0.036), ('classi', 0.035), ('lkopf', 0.034), ('rank', 0.034), ('soft', 0.034), ('sindhwani', 0.033), ('sch', 0.033), ('free', 0.032), ('minw', 0.032), ('sets', 0.031), ('complexity', 0.031), ('chinese', 0.031), ('devoted', 0.031), ('relaxing', 0.031), ('annealing', 0.031), ('support', 0.03), ('constraint', 0.03), ('hoffman', 0.03), ('promising', 0.03), ('approximates', 0.029), ('complement', 0.028), ('presents', 0.027), ('course', 0.027), ('prediction', 0.027), ('usa', 0.026), ('ibm', 0.025), ('remark', 0.025), ('xu', 0.025), ('evaluations', 0.025), ('zhu', 0.025), ('involves', 0.025), ('solution', 0.024), ('platt', 0.024), ('balance', 0.024), ('suffer', 0.024), ('ma', 0.023), ('international', 0.023), ('data', 0.023), ('diagonal', 0.022), ('initialization', 0.022), ('classify', 0.022), ('uci', 0.022), ('worstcase', 0.022), ('mation', 0.022), ('sinz', 0.022), ('china', 0.022), ('keerthi', 0.022), ('lyu', 0.022), ('respecting', 0.022), ('rong', 0.022), ('relaxed', 0.022), ('svms', 0.022), ('proceedings', 0.021), ('represents', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

Author: Zenglin Xu, Rong Jin, Jianke Zhu, Irwin King, Michael Lyu

Abstract: We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. Although several studies are devoted to Transductive SVM, they suffer either from high computational complexity or from locally optimal solutions. To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem into a semi-definite program. Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient, with the number of free parameters reduced from O(n^2) to O(n), where n is the number of examples. Empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM.

2 0.13018067 112 nips-2007-Learning Monotonic Transformations for Classification

Author: Andrew Howard, Tony Jebara

Abstract: A discriminative method is proposed for learning monotonic transformations of the training data while jointly estimating a large-margin classifier. In many domains such as document classification, image histogram classification and gene microarray experiments, fixed monotonic transformations can be useful as a preprocessing step. However, most classifiers only explore these transformations through manual trial and error or via prior domain knowledge. The proposed method learns monotonic transformations automatically while training a large-margin classifier without any prior knowledge of the domain. A monotonic piecewise linear function is learned which transforms data for subsequent processing by a linear hyperplane classifier. Two algorithmic implementations of the method are formalized. The first solves a convergent alternating sequence of quadratic and linear programs until it obtains a locally optimal solution. An improved algorithm is then derived using a convex semidefinite relaxation that overcomes initialization issues in the greedy optimization problem. The effectiveness of these learned transformations on synthetic problems, text data and image data is demonstrated. 1

3 0.11402169 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering

Author: Francis R. Bach, Zaïd Harchaoui

Abstract: We present a novel linear clustering framework (DIFFRAC) which relies on a linear discriminative cost function and a convex relaxation of a combinatorial optimization problem. The large convex optimization problem is solved through a sequence of lower dimensional singular value decompositions. This framework has several attractive properties: (1) although apparently similar to K-means, it exhibits superior clustering performance to K-means, in particular in terms of robustness to noise. (2) It can be readily extended to nonlinear clustering if the discriminative cost function is based on positive definite kernels, and can then be seen as an alternative to spectral clustering. (3) Prior information on the partition is easily incorporated, leading to state-of-the-art performance for semi-supervised learning, for clustering or classification. We present empirical evaluations of our algorithms on synthetic and real medium-scale datasets.

4 0.1135979 69 nips-2007-Discriminative Batch Mode Active Learning

Author: Yuhong Guo, Dale Schuurmans

Abstract: Active learning sequentially selects unlabeled instances to label with the goal of reducing the effort needed to learn a good classifier. Most previous studies in active learning have focused on selecting one unlabeled instance to label at a time while retraining in each iteration. Recently a few batch mode active learning approaches have been proposed that select a set of most informative unlabeled instances in each iteration under the guidance of heuristic scores. In this paper, we propose a discriminative batch mode active learning approach that formulates the instance selection task as a continuous optimization problem over auxiliary instance selection variables. The optimization is formulated to maximize the discriminative classification performance of the target classifier, while also taking the unlabeled data into account. Although the objective is not convex, we can manipulate a quasi-Newton method to obtain a good local solution. Our empirical studies on UCI datasets show that the proposed active learning is more effective than current state-of-the art batch mode active learning algorithms. 1

5 0.11069868 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

Author: Larry Wasserman, John D. Lafferty

Abstract: Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors. While existing semi-supervised methods have shown some promising empirical performance, their development has been largely based on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regularization using graph Laplacians do not lead to faster minimax rates of convergence. Thus, the estimators that use the unlabeled data do not have smaller risk than the estimators that use only labeled data. We then develop several new approaches that provably lead to improved performance. The statistical tools of minimax analysis are thus used to offer some new perspective on the problem of semi-supervised learning.

6 0.11063784 63 nips-2007-Convex Relaxations of Latent Variable Training

7 0.10574103 166 nips-2007-Regularized Boost for Semi-Supervised Learning

8 0.10195161 175 nips-2007-Semi-Supervised Multitask Learning

9 0.10080915 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning

10 0.099459827 23 nips-2007-An Analysis of Convex Relaxations for MAP Estimation

11 0.098343112 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators

12 0.09578716 201 nips-2007-The Value of Labeled and Unlabeled Examples when the Model is Imperfect

13 0.088617802 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels

14 0.087285317 62 nips-2007-Convex Learning with Invariances

15 0.071402512 32 nips-2007-Bayesian Co-Training

16 0.069366306 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition

17 0.069141164 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes

18 0.068090484 80 nips-2007-Ensemble Clustering using Semidefinite Programming

19 0.066869907 134 nips-2007-Multi-Task Learning via Conic Programming

20 0.066422939 38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.184), (1, 0.031), (2, -0.156), (3, 0.146), (4, -0.014), (5, -0.027), (6, 0.104), (7, -0.001), (8, 0.033), (9, 0.127), (10, 0.104), (11, -0.126), (12, -0.095), (13, -0.026), (14, 0.036), (15, 0.093), (16, -0.069), (17, 0.045), (18, 0.182), (19, 0.041), (20, -0.064), (21, -0.085), (22, 0.001), (23, -0.016), (24, 0.025), (25, 0.044), (26, 0.002), (27, 0.08), (28, 0.054), (29, -0.007), (30, 0.065), (31, -0.003), (32, 0.073), (33, 0.014), (34, -0.054), (35, -0.06), (36, 0.019), (37, 0.001), (38, -0.035), (39, -0.017), (40, -0.093), (41, -0.016), (42, -0.096), (43, -0.021), (44, 0.054), (45, -0.026), (46, -0.003), (47, 0.066), (48, 0.115), (49, 0.008)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94657636 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

Author: Zenglin Xu, Rong Jin, Jianke Zhu, Irwin King, Michael Lyu

Abstract: We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. Although several studies are devoted to Transductive SVM, they suffer either from high computational complexity or from locally optimal solutions. To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem into a semi-definite program. Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient, with the number of free parameters reduced from O(n^2) to O(n), where n is the number of examples. Empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM.

2 0.7393828 112 nips-2007-Learning Monotonic Transformations for Classification

Author: Andrew Howard, Tony Jebara

Abstract: A discriminative method is proposed for learning monotonic transformations of the training data while jointly estimating a large-margin classifier. In many domains such as document classification, image histogram classification and gene microarray experiments, fixed monotonic transformations can be useful as a preprocessing step. However, most classifiers only explore these transformations through manual trial and error or via prior domain knowledge. The proposed method learns monotonic transformations automatically while training a large-margin classifier without any prior knowledge of the domain. A monotonic piecewise linear function is learned which transforms data for subsequent processing by a linear hyperplane classifier. Two algorithmic implementations of the method are formalized. The first solves a convergent alternating sequence of quadratic and linear programs until it obtains a locally optimal solution. An improved algorithm is then derived using a convex semidefinite relaxation that overcomes initialization issues in the greedy optimization problem. The effectiveness of these learned transformations on synthetic problems, text data and image data is demonstrated. 1

3 0.59959716 69 nips-2007-Discriminative Batch Mode Active Learning

Author: Yuhong Guo, Dale Schuurmans

Abstract: Active learning sequentially selects unlabeled instances to label with the goal of reducing the effort needed to learn a good classifier. Most previous studies in active learning have focused on selecting one unlabeled instance to label at a time while retraining in each iteration. Recently a few batch mode active learning approaches have been proposed that select a set of most informative unlabeled instances in each iteration under the guidance of heuristic scores. In this paper, we propose a discriminative batch mode active learning approach that formulates the instance selection task as a continuous optimization problem over auxiliary instance selection variables. The optimization is formulated to maximize the discriminative classification performance of the target classifier, while also taking the unlabeled data into account. Although the objective is not convex, we can manipulate a quasi-Newton method to obtain a good local solution. Our empirical studies on UCI datasets show that the proposed active learning is more effective than current state-of-the art batch mode active learning algorithms. 1

4 0.56872672 23 nips-2007-An Analysis of Convex Relaxations for MAP Estimation

Author: Pawan Mudigonda, Vladimir Kolmogorov, Philip Torr

Abstract: The problem of obtaining the maximum a posteriori estimate of a general discrete random field (i.e. a random field defined using a finite and discrete set of labels) is known to be NP-hard. However, due to its central importance in many applications, several approximate algorithms have been proposed in the literature. In this paper, we present an analysis of three such algorithms based on convex relaxations: (i) LP - S: the linear programming (LP) relaxation proposed by Schlesinger [20] for a special case and independently in [4, 12, 23] for the general case; (ii) QP - RL: the quadratic programming (QP) relaxation by Ravikumar and Lafferty [18]; and (iii) SOCP - MS: the second order cone programming (SOCP) relaxation first proposed by Muramatsu and Suzuki [16] for two label problems and later extended in [14] for a general label set. We show that the SOCP - MS and the QP - RL relaxations are equivalent. Furthermore, we prove that despite the flexibility in the form of the constraints/objective function offered by QP and SOCP, the LP - S relaxation strictly dominates (i.e. provides a better approximation than) QP - RL and SOCP - MS. We generalize these results by defining a large class of SOCP (and equivalent QP) relaxations which is dominated by the LP - S relaxation. Based on these results we propose some novel SOCP relaxations which strictly dominate the previous approaches.

5 0.56017596 201 nips-2007-The Value of Labeled and Unlabeled Examples when the Model is Imperfect

Author: Kaushik Sinha, Mikhail Belkin

Abstract: Semi-supervised learning, i.e. learning from both labeled and unlabeled data, has received significant attention in the machine learning literature in recent years. Still, our understanding of the theoretical foundations of the usefulness of unlabeled data remains somewhat limited. The simplest and the best understood situation is when the data is described by an identifiable mixture model, and where each class comes from a pure component. This natural setup and its implications were analyzed in [11, 5]. One important result was that in certain regimes, labeled data becomes exponentially more valuable than unlabeled data. However, in most realistic situations, one would not expect that the data comes from a parametric mixture distribution with identifiable components. There have been recent efforts to analyze the non-parametric situation, for example, “cluster” and “manifold” assumptions have been suggested as a basis for analysis. Still, a satisfactory and fairly complete theoretical understanding of the nonparametric problem, similar to that in [11, 5], has not yet been developed. In this paper we investigate an intermediate situation, when the data comes from a probability distribution which can be modeled, but not perfectly, by an identifiable mixture distribution. This seems applicable to many situations, when, for example, a mixture of Gaussians is used to model the data. The contribution of this paper is an analysis of the role of labeled and unlabeled data depending on the amount of imperfection in the model.

6 0.55827326 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels

7 0.55606383 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

8 0.5469178 62 nips-2007-Convex Learning with Invariances

9 0.541188 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering

10 0.54067564 63 nips-2007-Convex Relaxations of Latent Variable Training

11 0.53096473 24 nips-2007-An Analysis of Inference with the Universum

12 0.52645391 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition

13 0.49955925 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning

14 0.49955252 166 nips-2007-Regularized Boost for Semi-Supervised Learning

15 0.45221603 152 nips-2007-Parallelizing Support Vector Machines on Distributed Computers

16 0.45129699 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators

17 0.44787076 175 nips-2007-Semi-Supervised Multitask Learning

18 0.43146262 134 nips-2007-Multi-Task Learning via Conic Programming

19 0.42319372 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

20 0.41807139 32 nips-2007-Bayesian Co-Training


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.033), (11, 0.011), (13, 0.075), (16, 0.027), (19, 0.025), (21, 0.076), (31, 0.052), (34, 0.021), (35, 0.028), (47, 0.077), (49, 0.016), (59, 0.011), (83, 0.124), (85, 0.049), (87, 0.017), (90, 0.06), (98, 0.218)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.80523181 105 nips-2007-Infinite State Bayes-Nets for Structured Domains

Author: Max Welling, Ian Porteous, Evgeniy Bart

Abstract: A general modeling framework is proposed that unifies nonparametric-Bayesian models, topic-models and Bayesian networks. This class of infinite state Bayes nets (ISBN) can be viewed as directed networks of ‘hierarchical Dirichlet processes’ (HDPs) where the domain of the variables can be structured (e.g. words in documents or features in images). We show that collapsed Gibbs sampling can be done efficiently in these models by leveraging the structure of the Bayes net and using the forward-filtering-backward-sampling algorithm for junction trees. Existing models, such as nested-DP, Pachinko allocation, mixed membership stochastic block models as well as a number of new models are described as ISBNs. Two experiments have been performed to illustrate these ideas. 1

2 0.77476048 82 nips-2007-Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization

Author: Xuanlong Nguyen, Martin J. Wainwright, Michael I. Jordan

Abstract: We develop and analyze an algorithm for nonparametric estimation of divergence functionals and the density ratio of two probability distributions. Our method is based on a variational characterization of f -divergences, which turns the estimation into a penalized convex risk minimization problem. We present a derivation of our kernel-based estimation algorithm and an analysis of convergence rates for the estimator. Our simulation results demonstrate the convergence behavior of the method, which compares favorably with existing methods in the literature. 1

same-paper 3 0.76400572 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

Author: Zenglin Xu, Rong Jin, Jianke Zhu, Irwin King, Michael Lyu

Abstract: We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. Although several studies are devoted to Transductive SVM, they suffer either from high computational complexity or from locally optimal solutions. To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem into a semi-definite program. Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient, with the number of free parameters reduced from O(n^2) to O(n), where n is the number of examples. Empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM.

4 0.64666837 63 nips-2007-Convex Relaxations of Latent Variable Training

Author: Yuhong Guo, Dale Schuurmans

Abstract: We investigate a new, convex relaxation of an expectation-maximization (EM) variant that approximates a standard objective while eliminating local minima. First, a cautionary result is presented, showing that any convex relaxation of EM over hidden variables must give trivial results if any dependence on the missing values is retained. Although this appears to be a strong negative outcome, we then demonstrate how the problem can be bypassed by using equivalence relations instead of value assignments over hidden variables. In particular, we develop new algorithms for estimating exponential conditional models that only require equivalence relation information over the variable values. This reformulation leads to an exact expression for EM variants in a wide range of problems. We then develop a semidefinite relaxation that yields global training by eliminating local minima. 1

5 0.64315516 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

Author: Kai Yu, Wei Chu

Abstract: This paper aims to model relational data on edges of networks. We describe appropriate Gaussian Processes (GPs) for directed, undirected, and bipartite networks. The inter-dependencies of edges can be effectively modeled by adapting the GP hyper-parameters. The framework suggests an intimate connection between link prediction and transfer learning, which were traditionally two separate research topics. We develop an efficient learning algorithm that can handle a large number of observations. The experimental results on several real-world data sets verify superior learning capacity. 1

6 0.64064288 69 nips-2007-Discriminative Batch Mode Active Learning

7 0.63867354 84 nips-2007-Expectation Maximization and Posterior Constraints

8 0.63844514 24 nips-2007-An Analysis of Inference with the Universum

9 0.63798875 86 nips-2007-Exponential Family Predictive Representations of State

10 0.63581002 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons

11 0.63438511 18 nips-2007-A probabilistic model for generating realistic lip movements from speech

12 0.63438255 97 nips-2007-Hidden Common Cause Relations in Relational Learning

13 0.63374746 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models

14 0.63351625 134 nips-2007-Multi-Task Learning via Conic Programming

15 0.63036466 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC

16 0.63016266 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images

17 0.6291123 156 nips-2007-Predictive Matrix-Variate t Models

18 0.62763649 102 nips-2007-Incremental Natural Actor-Critic Algorithms

19 0.62717134 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition

20 0.62670696 185 nips-2007-Stable Dual Dynamic Programming