nips nips2007 nips2007-76 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zenglin Xu, Rong Jin, Jianke Zhu, Irwin King, Michael Lyu
Abstract: We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. Although several studies have been devoted to Transductive SVM, they suffer either from high computational complexity or from locally optimal solutions. To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem into a semi-definite program. Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient, with the number of free parameters reduced from O(n^2) to O(n), where n is the number of examples. An empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. [sent-17, score-0.206]
2 Although several studies have been devoted to Transductive SVM, they suffer either from high computational complexity or from locally optimal solutions. [sent-18, score-0.122]
3 To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem into a semi-definite program. [sent-19, score-0.121]
4 Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient, with the number of free parameters reduced from O(n^2) to O(n), where n is the number of examples. [sent-20, score-0.326]
5 Empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM. [sent-21, score-0.163]
6 An important semi-supervised learning paradigm is the Transductive Support Vector Machine (TSVM), which maximizes the margin in the presence of unlabeled data, keeps the decision boundary passing through low-density regions, and respects the given labels in the input space. [sent-23, score-0.364]
7 Since TSVM requires solving a combinatorial optimization problem, extensive research efforts have been devoted to efficiently finding the approximate solution to TSVM. [sent-24, score-0.099]
8 It begins by minimizing an easy convex objective function, and then gradually approximates the objective of TSVM with more complicated functions. [sent-29, score-0.15]
9 The solution of the simpler problem is used to initialize the solution of the more complicated one. [sent-30, score-0.07]
10 Other iterative methods, such as deterministic annealing [11] and the concave-convex procedure (CCCP) method [6], are also employed to solve the optimization problem related to TSVM. [sent-31, score-0.075]
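As a point of reference for these iterative schemes, the following Python sketch illustrates a generic continuation-style heuristic: the supervised SVM solution initializes the process, and the influence of the unlabeled points is increased gradually. The schedule, the weights, and the use of scikit-learn's SVC are illustrative assumptions; this is not the algorithm of any of the cited papers.

```python
# Generic continuation-style TSVM heuristic (illustrative sketch only).
import numpy as np
from sklearn.svm import SVC

def continuation_tsvm(X_lab, y_lab, X_unl, C=1.0, cu_schedule=(1e-3, 1e-2, 1e-1, 1.0)):
    clf = SVC(kernel="linear", C=C)
    clf.fit(X_lab, y_lab)                          # start from the easy, purely supervised problem
    for cu in cu_schedule:                         # gradually approximate the full TSVM objective
        y_unl = clf.predict(X_unl)                 # current guesses for the unlabeled labels
        X_all = np.vstack([X_lab, X_unl])
        y_all = np.concatenate([y_lab, y_unl])
        weights = np.concatenate([np.ones(len(y_lab)),
                                  np.full(len(y_unl), cu)])   # down-weight unlabeled points early on
        clf = SVC(kernel="linear", C=C)
        clf.fit(X_all, y_all, sample_weight=weights)
    return clf
```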
11 To address this problem, in [4], a branch-and-bound approach is proposed. [sent-33, score-0.415]
12 Figure 1: Computation time of the proposed convex relaxation approach for TSVM (i.e., CTSVM) and the semi-definite relaxation approach for TSVM (i.e., RTSVM), plotted against the number of samples. [sent-35, score-0.252]
13 The Course data set is used, and the number of labeled examples is 20. [sent-38, score-0.179]
14 In [14], the authors approximate TSVM by a semi-definite programming problem, which leads to a relaxed solution of TSVM (denoted RTSVM) that avoids locally optimal solutions. [sent-40, score-0.36]
15 However, both approaches suffer from high computational cost and can only be applied to small-sized data sets. [sent-41, score-0.047]
16 To address these issues, we present a convex relaxation for Transductive SVM (CTSVM). [sent-42, score-0.373]
17 The key idea of our method is to approximate the non-convex optimization problem of TSVM by its dual problem. [sent-43, score-0.088]
18 The advantage of doing so is twofold: • Unlike the semi-definite relaxation [14] that approximates TSVM by dropping the rank constraint, the proposed approach approximates TSVM by its dual problem. [sent-44, score-0.43]
19 As a basic result of convex analysis, the conjugate of the conjugate of any function f(x) is the convex envelope of f(x), and therefore provides a tighter convex relaxation of f(x) [7] (see the worked definitions after these bullets). [sent-45, score-0.763]
20 Hence, the proposed approach provides a better convex relaxation than that in [14] for the optimization problem in TSVM. [sent-46, score-0.459]
21 • Compared to the semi-definite relaxation of TSVM, the proposed algorithm involves fewer free parameters and therefore significantly improves the efficiency by reducing the worst-case computational complexity from O(n^{6.5}) to O(n^{4.5}). [sent-47, score-0.404]
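For reference, the biconjugate fact invoked in the first bullet can be written out explicitly (standard convex analysis, following [7]; the notation below is generic rather than specific to TSVM):

```latex
% Fenchel conjugate and biconjugate (standard definitions).
f^{*}(u)  \;=\; \sup_{x}\bigl\{\, u^{\top}x - f(x) \,\bigr\}, \qquad
f^{**}(x) \;=\; \sup_{u}\bigl\{\, u^{\top}x - f^{*}(u) \,\bigr\}.
% f^{**} is the convex envelope of f: the largest convex function lying below f
% everywhere, and hence the tightest convex relaxation of f.
```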
22 Figure 1 shows the running time of both the semi-definite relaxation of TSVM in [14] and the proposed convex relaxation for TSVM versus increasing number of unlabeled examples. [sent-50, score-0.817]
23 The data set used in this example is the Course data set (see the experiment section), and the number of labeled examples is 20. [sent-51, score-0.202]
24 We clearly see that the proposed convex relaxation approach is considerably more efficient than the semi-definite relaxation approach. [sent-52, score-0.415]
25 Section 2 reviews the related work on the semidefinite relaxation for TSVM. [sent-54, score-0.252]
26 Section 3 presents the convex relaxation approach for Transductive SVM. [sent-55, score-0.4]
27 Section 4 presents the empirical studies that verify the effectiveness of the proposed relaxation for TSVM. [sent-56, score-0.357]
28 2 Related Work In this section, we review the key formulae for Transductive SVM, followed by the semi-definite programming relaxation for TSVM. [sent-58, score-0.312]
29 Let $X = (x_1, x_2, \ldots, x_n)$ denote the entire data set, including both the labeled examples and the unlabeled ones. [sent-62, score-0.329]
30 We assume that the first l examples within X are labeled by $\hat{y} = (\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_l)$, [sent-63, score-0.156]
31 where $\hat{y}_i \in \{-1, +1\}$ represents the binary class label assigned to $x_i$. [sent-66, score-0.214]
32 We further denote by $y = (y_1, \ldots, y_n) \in \{-1, +1\}^n$ the binary class labels predicted for all the data points in X. [sent-70, score-0.059]
33 The goal of TSVM is to estimate y by using both the labeled examples and the unlabeled ones. [sent-71, score-0.306]
34 for $i = 1, \ldots, l$, where $C \ge 0$ is the trade-off parameter between the complexity of the function w and the margin errors. [sent-80, score-0.164]
35 Evidently, the above problem is a non-convex optimization problem due to the product term $y_i w_j$ in the constraint. [sent-82, score-0.044]
36 In order to approximate the above problem by a convex program, we first rewrite it in the following form using the Lagrange theorem: $\min_{\nu,\, y \in \{-1,+1\}^n,\, \delta,\, \lambda} \; \frac{1}{2}(e + \nu - \delta + \lambda y)^\top D(y) K^{-1} D(y)(e + \nu - \delta + \lambda y) + C\,\delta^\top e$ (1), s.t. $\nu \ge 0$, $\delta \ge 0$, $y_i = \hat{y}_i$ for $i = 1, \ldots, l$. [sent-83, score-0.218]
37 e is the n-dimensional column vector of all ones and K is the kernel matrix. [sent-89, score-0.049]
38 D(y) represents a diagonal matrix whose diagonal elements form the vector y. [sent-90, score-0.065]
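As a quick sanity check of this notation (an illustrative numpy sketch, not taken from the paper): for $y \in \{-1,+1\}^n$, conjugating a matrix by D(y) is the same as taking the elementwise product with $yy^\top$, and D(y) is its own inverse. This is what turns the quadratic form with $D(y)K^{-1}D(y)$ in (1) into the $yy^\top \circ K$ block after the Schur complement in (2) below.

```python
# Check that D(y) A D(y) == (y y^T) o A and that D(y) is its own inverse for +-1 labels.
import numpy as np

rng = np.random.default_rng(0)
n = 5
y = rng.choice([-1.0, 1.0], size=n)
A = rng.normal(size=(n, n))
A = A @ A.T                                   # any symmetric matrix works here

Dy = np.diag(y)
assert np.allclose(Dy @ A @ Dy, np.outer(y, y) * A)
assert np.allclose(Dy @ Dy, np.eye(n))        # D(y)^{-1} = D(y)
```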
39 Using the Schur complement, the above formulation can be further written as: $\min_{y \in \{-1,+1\}^n,\, t,\, \nu,\, \delta,\, \lambda} \; t$ (2), s.t. [sent-92, score-0.085]
40 $\begin{pmatrix} yy^\top \circ K & e + \nu - \delta + \lambda y \\ (e + \nu - \delta + \lambda y)^\top & t - 2C\,\delta^\top e \end{pmatrix} \succeq 0$, $\nu \ge 0$, $\delta \ge 0$, $y_i = \hat{y}_i$, $i = 1, 2, \ldots, l$. [sent-94, score-0.252]
41 To convert the above problem into a convex optimization problem, the key idea is to replace the quadratic term $yy^\top$ by a linear matrix variable. [sent-98, score-0.213]
42 Based on the result that the set $S_a = \{M = yy^\top \mid y \in \{-1,+1\}^n\}$ is equivalent to the set $S_b = \{M \mid M_{i,i} = 1, \mathrm{rank}(M) = 1\}$, we can approximate the problem in (2) as follows: $\min_{M,\, t,\, \nu,\, \delta,\, \lambda} \; t$ (3), subject to the linear matrix inequality of (2) with $yy^\top$ replaced by $M$. [sent-99, score-0.085]
43 Note that the key differences between (2) and (3) are (a) the rank constraint rank(M) = 1 is removed, and (b) the variable λ is set to be zero, which is equivalent to setting b = 0. [sent-105, score-0.064]
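A minimal cvxpy sketch of an SDP relaxation in the style of (3) is given below. The toy data, the way the given labels are encoded (fixing the labeled block of M), and the solver choice are assumptions made for illustration; this is not the paper's exact program.

```python
# SDP relaxation in the style of (3): replace y y^T by a PSD matrix M with unit
# diagonal and fix lambda = 0 (i.e. b = 0).  Illustrative sketch only.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, l, C = 20, 5, 1.0
X = rng.normal(size=(n, 2))
K = X @ X.T + 1e-3 * np.eye(n)               # linear kernel, jittered so K is invertible
y_lab = np.where(X[:l, 0] > 0, 1.0, -1.0)    # hypothetical given labels for the first l points

M = cp.Variable((n, n), symmetric=True)      # relaxation of y y^T
t = cp.Variable()
nu = cp.Variable(n, nonneg=True)
delta = cp.Variable(n, nonneg=True)

v = 1.0 + nu - delta                         # e + nu - delta, with lambda fixed to 0
block = cp.bmat([[cp.multiply(M, K),      cp.reshape(v, (n, 1))],
                 [cp.reshape(v, (1, n)),  cp.reshape(t - 2 * C * cp.sum(delta), (1, 1))]])
constraints = [
    block >> 0,                              # Schur-complement LMI from (2); block is symmetric by construction
    M >> 0,
    cp.diag(M) == 1,                         # what remains of the rank-one constraint
    M[:l, :l] == np.outer(y_lab, y_lab),     # labeled points keep their given labels
]
prob = cp.Problem(cp.Minimize(t), constraints)
prob.solve(solver=cp.SCS)
print(prob.status, prob.value)
```

After solving, a common rounding heuristic for relaxations of this kind is to take the sign of the leading eigenvector of M as the predicted label vector.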
44 As revealed by the previous studies [14, 1], the SDP problem resulting from this approximation is computationally expensive. [sent-107, score-0.096]
45 More specifically, there are O(n^2) parameters in the SDP cone and O(n) linear inequality constraints, which implies a worst-case computational complexity of O(n^{6.5}). [sent-108, score-0.09]
46 To avoid the high computational complexity, we present a different approach for relaxing TSVM into a convex problem. [sent-110, score-0.152]
47 Compared to the SDP relaxation approach, it is advantageous in that (1) it produces the best convex approximation for TSVM, and (2) it is computationally more efficient than the previous SDP relaxation. [sent-111, score-0.373]
48 3 Relaxed Transductive Support Vector Machine In this section, we follow the work of generalized maximum margin clustering [13] by first studying the case of hard margin, and then extending it to the case of soft margin. [sent-112, score-0.167]
49 1 Hard Margin TSVM In the hard margin case, SVM does not penalize the classification error, which corresponds to δ = 0 in (1). [sent-114, score-0.133]
50 Instead of employing the SDP relaxation as in [14], we follow the work in [13] and introduce a variable z = D(y)(e + ν) = y ◦ (e + ν). [sent-124, score-0.252]
51 Given that ν ≥ 0, the constraints in (4) can be written as $y_i z_i \ge 1$ for the labeled examples, and $z_i^2 \ge 1$ for all the unlabeled examples. [sent-125, score-0.501]
52 Using this new notation, the optimization problem in (4) can be rewritten as follows: $\min_{z,\, \lambda} \; \frac{1}{2}(z + \lambda e)^\top K^{-1} (z + \lambda e)$ (5), s.t. $y_i z_i \ge 1$ for $1 \le i \le l$ and $z_i^2 \ge 1$ for $l+1 \le i \le n$. [sent-129, score-0.081]
53 One problem with Transductive SVMs is that it is possible to assign all the unlabeled data to a single class with a very large margin, due to the high dimensionality and the small number of labeled examples. [sent-138, score-0.441]
54 To solve this problem, we introduce the following balance constraint to ensure that no class takes all the unlabeled examples: $-\epsilon \le \frac{1}{l}\sum_{i=1}^{l} z_i - \frac{1}{n-l}\sum_{i=l+1}^{n} z_i \le \epsilon$ (6), where $\epsilon \ge 0$ is a constant. [sent-140, score-0.316]
55 Through the above constraint, we ensure that the labeled data and the unlabeled data differ only slightly in their average class assignment. [sent-141, score-0.309]
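A small numpy illustration of the balance constraint (6); the prediction vector and the value of $\epsilon$ below are hypothetical:

```python
# Left-hand side of the balance constraint (6): the mean prediction over the labeled
# points may differ from the mean over the unlabeled points by at most epsilon.
import numpy as np

def balance_gap(z, l):
    return z[:l].mean() - z[l:].mean()

z = np.array([1.0, 1.0, -1.0, 0.8, -0.9, 1.1])   # toy predictions, first l = 3 are labeled
print(abs(balance_gap(z, l=3)) <= 0.5)            # feasible for epsilon = 0.5
```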
56 We define the Lagrangian of the minimization problem (7) as follows: $\min_{w} \max_{\gamma} \; F(w, \gamma) = w^\top P K^{-1} P w + \sum_{i=1}^{l}\gamma_i(1 - y_i w_i) + \sum_{i=l+1}^{n}\gamma_i(1 - w_i^2) + \alpha(c^\top w - \epsilon) + \beta(-c^\top w - \epsilon)$, where $\gamma_i \ge 0$ for $i = 1, \ldots, n$. [sent-174, score-0.301]
57 It can be derived from duality that $\min_{w}\max_{\gamma} F(w, \gamma) = \max_{\gamma}\min_{w} F(w, \gamma)$. [sent-178, score-0.064]
58 Thus, the dual form of the problem becomes: $\max_{\gamma} \; L(\gamma) = -\frac{1}{4}\,(\gamma \circ a - (\alpha - \beta)c)^\top [A - D(b \circ \gamma)]^{-1} (\gamma \circ a - (\alpha - \beta)c) + \sum_{i=1}^{n}\gamma_i - (\alpha + \beta)$. We introduce a variable $t$ so that $-\frac{1}{4}\,(\gamma \circ a - (\alpha - \beta)c)^\top [A - D(b \circ \gamma)]^{-1}(\gamma \circ a - (\alpha - \beta)c) \ge -t$. [sent-182, score-0.044]
59 According to the Schur complement, we obtain a semi-definite programming cone, from which the optimization problem (9) can be formulated. [sent-183, score-0.104]
60 The problem in (9) is a convex optimization problem, more specifically, a semi-definite programming problem, and can be efficiently solved by the interior-point method [10] implemented in some optimization packages, such as SeDuMi [12]. [sent-185, score-0.269]
61 Besides, our relaxation algorithm has O(n) parameters in the SDP cone and O(n) linear equality constraints, which implies a worst-case computational complexity of O(n^{4.5}). [sent-186, score-0.367]
62 However, in the previous relaxation algorithms [1, 14], there are approximately O(n^2) parameters in the SDP cone, which implies a worst-case computational complexity on the scale of O(n^{6.5}). [sent-188, score-0.283]
63 Therefore, our proposed convex relaxation algorithm is more efficient. [sent-190, score-0.415]
64 In addition, as analyzed in Section 2, the approximation in [1, 14] drops the rank constraint of the matrix $yy^\top$, which does not lead to a tight approximation. [sent-191, score-0.064]
65 On the other hand, our prediction function $f^*$ implements the conjugate of the conjugate of the prediction function f(x), which is the convex envelope of f(x) [7]. [sent-192, score-0.286]
66 Thus, our proposed convex approximation method provides a tighter approximation than the previous method. [sent-193, score-0.2]
67 It is interesting to discuss the connection between the solution of the proposed algorithm and that of harmonic functions. [sent-195, score-0.116]
68 From (10), it can be further derived that $z = \big(I_n - \sum_{i=l+1}^{n}\gamma_i K I_i\big)^{-1} \sum_{i=1}^{l}\gamma_i y_i K(x_i, \cdot)$ (11), where $I_i$ is defined as an $n \times n$ matrix with all elements being zero except the $i$-th diagonal element, which is 1, and $K(x_i, \cdot)$ is the $i$-th column of K. [sent-198, score-0.148]
69 Similar to the solution of the harmonic function, we first propagate the class labels from the labeled examples to the unlabeled ones by the term [sent-199, score-0.416]
70 $\sum_{i=1}^{l}\gamma_i y_i K(x_i, \cdot)$, and then adjust the predicted labels by the factor $\big(I_n - \sum_{i=l+1}^{n}\gamma_i K I_i\big)^{-1}$. The key difference in our solution is that (1) different weights (i.e., [sent-200, score-0.213]
71 $\gamma_i$) are assigned to the labeled examples, and (2) the adjustment factor is different from that in the harmonic function [16]. [sent-202, score-0.163]
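The prediction rule (11) can be stated as a short numpy sketch; the weight vector $\gamma$ would come from solving the SDP (9) and is treated here as a given input (a hypothetical argument for illustration):

```python
# z = (I_n - sum_{i>l} gamma_i K I_i)^{-1}  sum_{i<=l} gamma_i y_i K(x_i, .)   -- Eq. (11)
import numpy as np

def predict_from_gamma(K, y_lab, gamma, l):
    n = K.shape[0]
    propagate = K[:, :l] @ (gamma[:l] * y_lab)   # propagate labels from the l labeled points
    gamma_unl = np.zeros(n)
    gamma_unl[l:] = gamma[l:]
    adjust = np.eye(n) - K @ np.diag(gamma_unl)  # I_n - sum_{i=l+1}^n gamma_i K I_i
    return np.linalg.solve(adjust, propagate)    # apply the adjustment factor
```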
72 2 Soft Margin TSVM We extend TSVM to the case of soft margin by considering the following problem: [sent-204, score-0.204]
73 $\min_{\nu,\, y,\, \delta,\, \lambda} \; \frac{1}{2}(e + \nu - \delta + \lambda y)^\top D(y) K^{-1} D(y)(e + \nu - \delta + \lambda y) + C\sum_{i=1}^{l}\delta_i^2 + C_u\sum_{i=l+1}^{n}\delta_i^2$, s.t. $\nu \ge 0$, $\delta \ge 0$, $y_i = \hat{y}_i$ for $1 \le i \le l$, $y_i^2 = 1$ for $l+1 \le i \le n$, where $\delta_i$ is related to the margin error. [sent-206, score-0.511]
74 Note that we distinguish the labeled examples from the unlabeled examples by introducing different penalty constants for margin errors, C for labeled examples and Cu for unlabeled examples. [sent-207, score-0.788]
75 for $i = 1, \ldots, n$, $\alpha \ge 0$, $\beta \ge 0$; the result is also a semi-definite programming problem and can be solved similarly. [sent-214, score-0.06]
76 4 Experiments In this section, we report an empirical study of the proposed method on several benchmark data sets. [sent-217, score-0.102]
77 1 Data Sets Description To make evaluations comprehensive, we have collected four UCI data sets and three text data sets as our experimental testbeds. [sent-219, score-0.133]
78 The UCI data sets include Iono, Sonar, Banana, and Breast, which are widely used in data classification. [sent-220, score-0.077]
79 The WinMac data set consists of the mswindows and mac classes of the Newsgroup20 data set. [sent-221, score-0.046]
80 The IBM data set contains the IBM and non-IBM classes of the Newsgroup20 data set. [sent-222, score-0.046]
81 The Course data set is made up of the course pages and non-course pages of the WebKB corpus. [sent-223, score-0.077]
82 For each text data set, we randomly sample the data with sample sizes of 60, 300, and 1000, respectively. [sent-224, score-0.046]
83 Table 1 describes the information of these data sets, where d represents the data dimensionality, l means the number of labeled data points, and n denotes the total number of examples. [sent-226, score-0.203]
84 Table 1: Data sets used in the experiments, where d represents the data dimensionality, l means the number of labeled data points, and n denotes the total number of examples. [sent-227, score-0.211]
85 ... O(n^{6.5}), which makes it difficult to process data sets with hundreds of examples. [sent-232, score-0.054]
86 In each trial, the training set contains labeled examples from each class, and the remaining data are then used as the unlabeled (test) data. [sent-238, score-0.173]
87 Moreover, the RBF kernel is used for “Iono”, “Sonar” and “Banana”, and the linear kernel is used for the other data sets. [sent-239, score-0.121]
88 This is because the linear kernel performs better than the RBF kernel on these data sets. [sent-240, score-0.121]
89 The kernel width of the RBF kernel is chosen by 5-fold cross-validation on the labeled data. [sent-241, score-0.211]
90 The margin parameter C is tuned by using the labeled data in all algorithms. [sent-242, score-0.269]
91 Due to the small number of labeled examples, for CTSVM and CCCP, the margin parameter for unlabeled data, Cu , is set equal to C . [sent-243, score-0.396]
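For concreteness, the tuning protocol described above can be sketched with scikit-learn; the parameter grids and the scoring are illustrative assumptions rather than the exact values used in the paper:

```python
# Select the RBF width and C by 5-fold cross-validation on the labeled data only.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_on_labeled(X_lab, y_lab):
    grid = {"C": [0.1, 1.0, 10.0, 100.0],
            "gamma": [0.01, 0.1, 1.0, 10.0]}   # gamma parameterizes the RBF kernel width
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)
    search.fit(X_lab, y_lab)                   # unlabeled data are not used for tuning
    return search.best_params_
```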
92 3 Experimental Results Table 2: The classification performance of Transductive SVMs on benchmark data sets. [sent-246, score-0.06]
93 It can be observed that our proposed algorithm performs significantly better than the standard SVM across all the data sets. [sent-366, score-0.065]
94 On the remaining data sets, our proposed algorithm obtains comparable results. [sent-376, score-0.065]
95 Overall, the empirical evaluations indicate that our proposed CTSVM method achieves promising classification results compared with the state-of-the-art methods. [sent-377, score-0.097]
96 5 Conclusion and Future Work This paper presents a novel method for Transductive SVM that relaxes the unknown labels to continuous variables. [sent-378, score-0.094]
97 In contrast to the previous relaxation method, which involves O(n^2) free parameters in the semi-definite matrix, our method reduces the number of free parameters to O(n) and can solve the optimization problem more efficiently. [sent-379, score-0.385]
98 In addition, the proposed approach provides a tighter convex relaxation for the optimization problem in TSVM. [sent-380, score-0.496]
99 Empirical studies on benchmark data sets demonstrate that the proposed method is more efficient than the previous semi-definite relaxation method and achieves promising classification results compared with the state-of-the-art methods. [sent-381, score-0.451]
100 Interior point polynomial methods in convex programming: Theory and applications. [sent-459, score-0.121]
wordName wordTfidf (topN-words)
[('tsvm', 0.702), ('relaxation', 0.252), ('transductive', 0.237), ('ctsvm', 0.226), ('unlabeled', 0.15), ('sdp', 0.139), ('margin', 0.133), ('yi', 0.126), ('convex', 0.121), ('labeled', 0.113), ('kong', 0.109), ('svm', 0.102), ('cccp', 0.1), ('iono', 0.1), ('sonar', 0.1), ('hong', 0.1), ('banana', 0.084), ('rtsvm', 0.075), ('wi', 0.069), ('yl', 0.067), ('chapelle', 0.064), ('programming', 0.06), ('cone', 0.059), ('zi', 0.056), ('cu', 0.053), ('shatin', 0.05), ('breast', 0.05), ('harmonic', 0.05), ('kernel', 0.049), ('yy', 0.048), ('nite', 0.045), ('optimization', 0.044), ('dual', 0.044), ('pw', 0.044), ('examples', 0.043), ('rn', 0.042), ('proposed', 0.042), ('sedumi', 0.04), ('schur', 0.04), ('rbf', 0.038), ('benchmark', 0.037), ('conjugate', 0.037), ('tighter', 0.037), ('envelope', 0.037), ('min', 0.037), ('labels', 0.036), ('studies', 0.036), ('classi', 0.035), ('lkopf', 0.034), ('rank', 0.034), ('soft', 0.034), ('sindhwani', 0.033), ('sch', 0.033), ('free', 0.032), ('minw', 0.032), ('sets', 0.031), ('complexity', 0.031), ('chinese', 0.031), ('devoted', 0.031), ('relaxing', 0.031), ('annealing', 0.031), ('support', 0.03), ('constraint', 0.03), ('hoffman', 0.03), ('promising', 0.03), ('approximates', 0.029), ('complement', 0.028), ('presents', 0.027), ('course', 0.027), ('prediction', 0.027), ('usa', 0.026), ('ibm', 0.025), ('remark', 0.025), ('xu', 0.025), ('evaluations', 0.025), ('zhu', 0.025), ('involves', 0.025), ('solution', 0.024), ('platt', 0.024), ('balance', 0.024), ('suffer', 0.024), ('ma', 0.023), ('international', 0.023), ('data', 0.023), ('diagonal', 0.022), ('initialization', 0.022), ('classify', 0.022), ('uci', 0.022), ('worstcase', 0.022), ('mation', 0.022), ('sinz', 0.022), ('china', 0.022), ('keerthi', 0.022), ('lyu', 0.022), ('respecting', 0.022), ('rong', 0.022), ('relaxed', 0.022), ('svms', 0.022), ('proceedings', 0.021), ('represents', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
Author: Zenglin Xu, Rong Jin, Jianke Zhu, Irwin King, Michael Lyu
Abstract: We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. Although several studies are devoted to Transductive SVM, they suffer either from the high computation complexity or from the solutions of local optimum. To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem to a semi-definite programming. Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient with the number of free parameters reduced from O(n2 ) to O(n) where n is the number of examples. Empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM. 1
2 0.13018067 112 nips-2007-Learning Monotonic Transformations for Classification
Author: Andrew Howard, Tony Jebara
Abstract: A discriminative method is proposed for learning monotonic transformations of the training data while jointly estimating a large-margin classifier. In many domains such as document classification, image histogram classification and gene microarray experiments, fixed monotonic transformations can be useful as a preprocessing step. However, most classifiers only explore these transformations through manual trial and error or via prior domain knowledge. The proposed method learns monotonic transformations automatically while training a large-margin classifier without any prior knowledge of the domain. A monotonic piecewise linear function is learned which transforms data for subsequent processing by a linear hyperplane classifier. Two algorithmic implementations of the method are formalized. The first solves a convergent alternating sequence of quadratic and linear programs until it obtains a locally optimal solution. An improved algorithm is then derived using a convex semidefinite relaxation that overcomes initialization issues in the greedy optimization problem. The effectiveness of these learned transformations on synthetic problems, text data and image data is demonstrated. 1
3 0.11402169 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
Author: Francis R. Bach, Zaïd Harchaoui
Abstract: We present a novel linear clustering framework (D IFFRAC) which relies on a linear discriminative cost function and a convex relaxation of a combinatorial optimization problem. The large convex optimization problem is solved through a sequence of lower dimensional singular value decompositions. This framework has several attractive properties: (1) although apparently similar to K-means, it exhibits superior clustering performance than K-means, in particular in terms of robustness to noise. (2) It can be readily extended to non linear clustering if the discriminative cost function is based on positive definite kernels, and can then be seen as an alternative to spectral clustering. (3) Prior information on the partition is easily incorporated, leading to state-of-the-art performance for semi-supervised learning, for clustering or classification. We present empirical evaluations of our algorithms on synthetic and real medium-scale datasets.
4 0.1135979 69 nips-2007-Discriminative Batch Mode Active Learning
Author: Yuhong Guo, Dale Schuurmans
Abstract: Active learning sequentially selects unlabeled instances to label with the goal of reducing the effort needed to learn a good classifier. Most previous studies in active learning have focused on selecting one unlabeled instance to label at a time while retraining in each iteration. Recently a few batch mode active learning approaches have been proposed that select a set of most informative unlabeled instances in each iteration under the guidance of heuristic scores. In this paper, we propose a discriminative batch mode active learning approach that formulates the instance selection task as a continuous optimization problem over auxiliary instance selection variables. The optimization is formulated to maximize the discriminative classification performance of the target classifier, while also taking the unlabeled data into account. Although the objective is not convex, we can manipulate a quasi-Newton method to obtain a good local solution. Our empirical studies on UCI datasets show that the proposed active learning is more effective than current state-of-the art batch mode active learning algorithms. 1
5 0.11069868 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
Author: Larry Wasserman, John D. Lafferty
Abstract: Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors. While existing semi-supervised methods have shown some promising empirical performance, their development has been based largely based on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regularization using graph Laplacians do not lead to faster minimax rates of convergence. Thus, the estimators that use the unlabeled data do not have smaller risk than the estimators that use only labeled data. We then develop several new approaches that provably lead to improved performance. The statistical tools of minimax analysis are thus used to offer some new perspective on the problem of semi-supervised learning. 1
6 0.11063784 63 nips-2007-Convex Relaxations of Latent Variable Training
7 0.10574103 166 nips-2007-Regularized Boost for Semi-Supervised Learning
8 0.10195161 175 nips-2007-Semi-Supervised Multitask Learning
9 0.10080915 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning
10 0.099459827 23 nips-2007-An Analysis of Convex Relaxations for MAP Estimation
11 0.098343112 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators
12 0.09578716 201 nips-2007-The Value of Labeled and Unlabeled Examples when the Model is Imperfect
13 0.088617802 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels
14 0.087285317 62 nips-2007-Convex Learning with Invariances
15 0.071402512 32 nips-2007-Bayesian Co-Training
16 0.069366306 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition
17 0.069141164 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
18 0.068090484 80 nips-2007-Ensemble Clustering using Semidefinite Programming
19 0.066869907 134 nips-2007-Multi-Task Learning via Conic Programming
20 0.066422939 38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin
topicId topicWeight
[(0, -0.184), (1, 0.031), (2, -0.156), (3, 0.146), (4, -0.014), (5, -0.027), (6, 0.104), (7, -0.001), (8, 0.033), (9, 0.127), (10, 0.104), (11, -0.126), (12, -0.095), (13, -0.026), (14, 0.036), (15, 0.093), (16, -0.069), (17, 0.045), (18, 0.182), (19, 0.041), (20, -0.064), (21, -0.085), (22, 0.001), (23, -0.016), (24, 0.025), (25, 0.044), (26, 0.002), (27, 0.08), (28, 0.054), (29, -0.007), (30, 0.065), (31, -0.003), (32, 0.073), (33, 0.014), (34, -0.054), (35, -0.06), (36, 0.019), (37, 0.001), (38, -0.035), (39, -0.017), (40, -0.093), (41, -0.016), (42, -0.096), (43, -0.021), (44, 0.054), (45, -0.026), (46, -0.003), (47, 0.066), (48, 0.115), (49, 0.008)]
simIndex simValue paperId paperTitle
same-paper 1 0.94657636 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
Author: Zenglin Xu, Rong Jin, Jianke Zhu, Irwin King, Michael Lyu
Abstract: We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. Although several studies are devoted to Transductive SVM, they suffer either from the high computation complexity or from the solutions of local optimum. To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem to a semi-definite programming. Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient with the number of free parameters reduced from O(n2 ) to O(n) where n is the number of examples. Empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM. 1
2 0.7393828 112 nips-2007-Learning Monotonic Transformations for Classification
Author: Andrew Howard, Tony Jebara
Abstract: A discriminative method is proposed for learning monotonic transformations of the training data while jointly estimating a large-margin classifier. In many domains such as document classification, image histogram classification and gene microarray experiments, fixed monotonic transformations can be useful as a preprocessing step. However, most classifiers only explore these transformations through manual trial and error or via prior domain knowledge. The proposed method learns monotonic transformations automatically while training a large-margin classifier without any prior knowledge of the domain. A monotonic piecewise linear function is learned which transforms data for subsequent processing by a linear hyperplane classifier. Two algorithmic implementations of the method are formalized. The first solves a convergent alternating sequence of quadratic and linear programs until it obtains a locally optimal solution. An improved algorithm is then derived using a convex semidefinite relaxation that overcomes initialization issues in the greedy optimization problem. The effectiveness of these learned transformations on synthetic problems, text data and image data is demonstrated. 1
3 0.59959716 69 nips-2007-Discriminative Batch Mode Active Learning
Author: Yuhong Guo, Dale Schuurmans
Abstract: Active learning sequentially selects unlabeled instances to label with the goal of reducing the effort needed to learn a good classifier. Most previous studies in active learning have focused on selecting one unlabeled instance to label at a time while retraining in each iteration. Recently a few batch mode active learning approaches have been proposed that select a set of most informative unlabeled instances in each iteration under the guidance of heuristic scores. In this paper, we propose a discriminative batch mode active learning approach that formulates the instance selection task as a continuous optimization problem over auxiliary instance selection variables. The optimization is formulated to maximize the discriminative classification performance of the target classifier, while also taking the unlabeled data into account. Although the objective is not convex, we can manipulate a quasi-Newton method to obtain a good local solution. Our empirical studies on UCI datasets show that the proposed active learning is more effective than current state-of-the art batch mode active learning algorithms. 1
4 0.56872672 23 nips-2007-An Analysis of Convex Relaxations for MAP Estimation
Author: Pawan Mudigonda, Vladimir Kolmogorov, Philip Torr
Abstract: The problem of obtaining the maximum a posteriori estimate of a general discrete random field (i.e. a random field defined using a finite and discrete set of labels) is known to be NP-hard. However, due to its central importance in many applications, several approximate algorithms have been proposed in the literature. In this paper, we present an analysis of three such algorithms based on convex relaxations: (i) LP - S: the linear programming (LP) relaxation proposed by Schlesinger [20] for a special case and independently in [4, 12, 23] for the general case; (ii) QP - RL: the quadratic programming (QP) relaxation by Ravikumar and Lafferty [18]; and (iii) SOCP - MS: the second order cone programming (SOCP) relaxation first proposed by Muramatsu and Suzuki [16] for two label problems and later extended in [14] for a general label set. We show that the SOCP - MS and the QP - RL relaxations are equivalent. Furthermore, we prove that despite the flexibility in the form of the constraints/objective function offered by QP and SOCP, the LP - S relaxation strictly dominates (i.e. provides a better approximation than) QP - RL and SOCP - MS. We generalize these results by defining a large class of SOCP (and equivalent QP) relaxations which is dominated by the LP - S relaxation. Based on these results we propose some novel SOCP relaxations which strictly dominate the previous approaches.
5 0.56017596 201 nips-2007-The Value of Labeled and Unlabeled Examples when the Model is Imperfect
Author: Kaushik Sinha, Mikhail Belkin
Abstract: Semi-supervised learning, i.e. learning from both labeled and unlabeled data has received significant attention in the machine learning literature in recent years. Still our understanding of the theoretical foundations of the usefulness of unlabeled data remains somewhat limited. The simplest and the best understood situation is when the data is described by an identifiable mixture model, and where each class comes from a pure component. This natural setup and its implications ware analyzed in [11, 5]. One important result was that in certain regimes, labeled data becomes exponentially more valuable than unlabeled data. However, in most realistic situations, one would not expect that the data comes from a parametric mixture distribution with identifiable components. There have been recent efforts to analyze the non-parametric situation, for example, “cluster” and “manifold” assumptions have been suggested as a basis for analysis. Still, a satisfactory and fairly complete theoretical understanding of the nonparametric problem, similar to that in [11, 5] has not yet been developed. In this paper we investigate an intermediate situation, when the data comes from a probability distribution, which can be modeled, but not perfectly, by an identifiable mixture distribution. This seems applicable to many situation, when, for example, a mixture of Gaussians is used to model the data. the contribution of this paper is an analysis of the role of labeled and unlabeled data depending on the amount of imperfection in the model.
6 0.55827326 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels
7 0.55606383 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
8 0.5469178 62 nips-2007-Convex Learning with Invariances
9 0.541188 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
10 0.54067564 63 nips-2007-Convex Relaxations of Latent Variable Training
11 0.53096473 24 nips-2007-An Analysis of Inference with the Universum
12 0.52645391 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition
13 0.49955925 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning
14 0.49955252 166 nips-2007-Regularized Boost for Semi-Supervised Learning
15 0.45221603 152 nips-2007-Parallelizing Support Vector Machines on Distributed Computers
16 0.45129699 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators
17 0.44787076 175 nips-2007-Semi-Supervised Multitask Learning
18 0.43146262 134 nips-2007-Multi-Task Learning via Conic Programming
19 0.42319372 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
20 0.41807139 32 nips-2007-Bayesian Co-Training
topicId topicWeight
[(5, 0.033), (11, 0.011), (13, 0.075), (16, 0.027), (19, 0.025), (21, 0.076), (31, 0.052), (34, 0.021), (35, 0.028), (47, 0.077), (49, 0.016), (59, 0.011), (83, 0.124), (85, 0.049), (87, 0.017), (90, 0.06), (98, 0.218)]
simIndex simValue paperId paperTitle
1 0.80523181 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
Author: Max Welling, Ian Porteous, Evgeniy Bart
Abstract: A general modeling framework is proposed that unifies nonparametric-Bayesian models, topic-models and Bayesian networks. This class of infinite state Bayes nets (ISBN) can be viewed as directed networks of ‘hierarchical Dirichlet processes’ (HDPs) where the domain of the variables can be structured (e.g. words in documents or features in images). We show that collapsed Gibbs sampling can be done efficiently in these models by leveraging the structure of the Bayes net and using the forward-filtering-backward-sampling algorithm for junction trees. Existing models, such as nested-DP, Pachinko allocation, mixed membership stochastic block models as well as a number of new models are described as ISBNs. Two experiments have been performed to illustrate these ideas. 1
2 0.77476048 82 nips-2007-Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization
Author: Xuanlong Nguyen, Martin J. Wainwright, Michael I. Jordan
Abstract: We develop and analyze an algorithm for nonparametric estimation of divergence functionals and the density ratio of two probability distributions. Our method is based on a variational characterization of f -divergences, which turns the estimation into a penalized convex risk minimization problem. We present a derivation of our kernel-based estimation algorithm and an analysis of convergence rates for the estimator. Our simulation results demonstrate the convergence behavior of the method, which compares favorably with existing methods in the literature. 1
same-paper 3 0.76400572 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
Author: Zenglin Xu, Rong Jin, Jianke Zhu, Irwin King, Michael Lyu
Abstract: We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. Although several studies are devoted to Transductive SVM, they suffer either from the high computation complexity or from the solutions of local optimum. To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem to a semi-definite programming. Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient with the number of free parameters reduced from O(n2 ) to O(n) where n is the number of examples. Empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM. 1
4 0.64666837 63 nips-2007-Convex Relaxations of Latent Variable Training
Author: Yuhong Guo, Dale Schuurmans
Abstract: We investigate a new, convex relaxation of an expectation-maximization (EM) variant that approximates a standard objective while eliminating local minima. First, a cautionary result is presented, showing that any convex relaxation of EM over hidden variables must give trivial results if any dependence on the missing values is retained. Although this appears to be a strong negative outcome, we then demonstrate how the problem can be bypassed by using equivalence relations instead of value assignments over hidden variables. In particular, we develop new algorithms for estimating exponential conditional models that only require equivalence relation information over the variable values. This reformulation leads to an exact expression for EM variants in a wide range of problems. We then develop a semidefinite relaxation that yields global training by eliminating local minima. 1
5 0.64315516 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
Author: Kai Yu, Wei Chu
Abstract: This paper aims to model relational data on edges of networks. We describe appropriate Gaussian Processes (GPs) for directed, undirected, and bipartite networks. The inter-dependencies of edges can be effectively modeled by adapting the GP hyper-parameters. The framework suggests an intimate connection between link prediction and transfer learning, which were traditionally two separate research topics. We develop an efficient learning algorithm that can handle a large number of observations. The experimental results on several real-world data sets verify superior learning capacity. 1
6 0.64064288 69 nips-2007-Discriminative Batch Mode Active Learning
7 0.63867354 84 nips-2007-Expectation Maximization and Posterior Constraints
8 0.63844514 24 nips-2007-An Analysis of Inference with the Universum
9 0.63798875 86 nips-2007-Exponential Family Predictive Representations of State
10 0.63581002 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons
11 0.63438511 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
12 0.63438255 97 nips-2007-Hidden Common Cause Relations in Relational Learning
13 0.63374746 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models
14 0.63351625 134 nips-2007-Multi-Task Learning via Conic Programming
15 0.63036466 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC
16 0.63016266 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
17 0.6291123 156 nips-2007-Predictive Matrix-Variate t Models
18 0.62763649 102 nips-2007-Incremental Natural Actor-Critic Algorithms
19 0.62717134 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition
20 0.62670696 185 nips-2007-Stable Dual Dynamic Programming