nips nips2006 nips2006-104 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. [sent-5, score-0.524]
2 In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. [sent-6, score-0.313]
3 However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. [sent-7, score-0.545]
4 Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. [sent-8, score-0.308]
5 In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. [sent-9, score-0.342]
6 By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. [sent-10, score-0.34]
7 Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. [sent-11, score-0.462]
8 1 Introduction In many real-world applications, collection of labeled data is both time-consuming and expensive. [sent-12, score-0.134]
9 On the other hand, a large amount of unlabeled data is often readily available. [sent-13, score-0.288]
10 While traditional supervised learning methods can only learn from the limited amount of labeled data, semi-supervised learning [2] aims at improving the generalization performance by utilizing both the labeled and unlabeled data. [sent-14, score-0.593]
11 When the data lie on a manifold, it is common to approximate this manifold by a weighted graph, leading to graph-based semi-supervised learning methods. [sent-17, score-0.192]
12 In this paper, we focus on the manifold regularization framework proposed in [1]. [sent-20, score-0.249]
13 By defining a data-dependent reproducing kernel Hilbert space (RKHS), manifold regularization incorporates an additional regularizer to ensure that the learned function is smooth on the manifold. [sent-21, score-0.391]
14 Moreover, while the original motivation of semi-supervised learning is to utilize the large amount of unlabeled data available, existing algorithms are only capable of handling a small to moderate amount of unlabeled data. [sent-25, score-0.576]
15 [9] sped up manifold regularization by restricting to linear models, which, however, may not be flexible enough for complicated target functions. [sent-28, score-0.249]
16 As reported in a recent survey [14], most semi-supervised learning methods can only handle 100 – 10,000 unlabeled examples. [sent-31, score-0.332]
17 The largest graph they worked with involves 75,888 labeled and unlabeled examples. [sent-34, score-0.425]
18 Thus, no experiments have ever been performed on massive data sets with, say, one million unlabeled examples. [sent-35, score-0.359]
19 On the other hand, the Core Vector Machine (CVM) has recently been proposed for scaling up kernel methods in both supervised (including classification [12] and regression [13]) and unsupervised learning (e. [sent-36, score-0.101]
20 Experimental results on real world data sets with millions of patterns demonstrated that the CVM is much faster than existing SVM implementations and can handle much larger data sets. [sent-41, score-0.119]
21 To restore sparsity of the LapSVM solution, we first introduce a sparsified manifold regularizer based on the ε-insensitive loss. [sent-43, score-0.313]
22 In Section 2, we first give a brief review on manifold regularization. [sent-47, score-0.192]
23 2 Manifold Regularization Given a training set {(x_i, y_i)}_{i=1}^m with input x_i ∈ X and output y_i ∈ R. [sent-50, score-0.238]
24 The regularized risk functional is the sum of the empirical risk (corresponding to a loss function ℓ) and a regularizer Ω. [sent-51, score-0.163]
25 Given a kernel k and its RKHS H_k, we minimize the regularized risk over functions f in H_k: min_{f∈H_k} (1/m) Σ_{i=1}^m ℓ(x_i, y_i, f(x_i)) + λ Ω(‖f‖_{H_k}).   (1) [sent-52, score-0.156]
26 In semi-supervised learning, we have both labeled examples {(x_i, y_i)}_{i=1}^m and unlabeled examples {x_i}_{i=m+1}^{m+n}. [sent-56, score-0.539]
27 Manifold regularization uses an additional regularizer ‖f‖_I^2 to ensure that the function f is smooth on the intrinsic structure of the input. [sent-57, score-0.135]
28 It is common to approximate this manifold by a weighted graph defined on all the labeled and unlabeled data, G = (V, E), with V and E being the sets of vertices and edges respectively. [sent-62, score-0.617]
29 Then, ‖f‖_I^2 is approximated as¹ ‖f‖_I^2 = Σ_{e∈E} w(u_e, v_e) ( f(x_{u_e})/s(u_e) − f(x_{v_e})/s(v_e) )^2,   (3) (footnote 1: when the set of labeled and unlabeled data is small, a function that is smooth on this small set may not be interesting). [sent-65, score-0.451]
30 where u_e and v_e are the vertices of edge e, and s(u) = √d(u) (with d(u) the degree of vertex u) when the normalized graph Laplacian is used, and s(u) = 1 with the unnormalized one. [sent-67, score-0.272]
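To make the approximation in (3) concrete, here is a minimal sketch (Python/NumPy, with hypothetical function and variable names, not code from the paper) that evaluates the right-hand side of (3) from an explicit edge list; it assumes s(u) = √d(u) in the normalized case and s(u) = 1 otherwise, as described above.

```python
import numpy as np

def intrinsic_regularizer(f_vals, edges, weights, normalized=True):
    """Approximate ||f||_I^2 as in Eq. (3): a weighted sum of squared
    differences of (scaled) function values over the graph edges.

    f_vals  : (m+n,) function values f(x_i) on labeled + unlabeled points
    edges   : list of (u, v) vertex index pairs
    weights : (|E|,) edge weights w(u_e, v_e)
    """
    # weighted vertex degrees d(u)
    d = np.zeros(len(f_vals))
    for (u, v), w in zip(edges, weights):
        d[u] += w
        d[v] += w
    # s(u) = sqrt(d(u)) for the normalized Laplacian, 1 otherwise
    s = np.sqrt(d) if normalized else np.ones_like(d)
    reg = 0.0
    for (u, v), w in zip(edges, weights):
        reg += w * (f_vals[u] / s[u] - f_vals[v] / s[v]) ** 2
    return reg
```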
31 As shown in [1], the minimizer of (2) becomes f(x) = Σ_{i=1}^{m+n} α_i k(x_i, x), which depends on both the labeled and unlabeled examples. [sent-68, score-0.43]
32 2.1 Laplacian SVM Using the hinge loss ℓ(x_i, y_i, f(x_i)) = max(0, 1 − y_i f(x_i)) in (2), we obtain the Laplacian SVM (LapSVM) [1]. [sent-70, score-0.189]
33 Its dual involves 1 = [1, . . . , 1]^T and Q_{m×m} = Y J K (2λI + 2λ_I LK)^{-1} J^T Y, where Y_{m×m} is the diagonal matrix with Y_ii = y_i, K_{(m+n)×(m+n)} is the kernel matrix over both the labeled and unlabeled data, L_{(m+n)×(m+n)} is the graph Laplacian, and J_{m×(m+n)} has J_ij = 1 if i = j and x_i is a labeled example, and J_ij = 0 otherwise. [sent-79, score-0.767]
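As a rough sketch of how these quantities fit together (illustrative code, not the paper's; names are made up, and the labeled points are assumed to be the first m of the m + n examples), the following assembles Q and recovers the expansion coefficients via the standard LapSVM relation α = (2λI + 2λ_I LK)^{-1} J^T Y β from [1]. Note that it materializes the full (m+n)×(m+n) matrices and solves a dense linear system, which is exactly the cost the sparsified formulation below avoids.

```python
import numpy as np

def lapsvm_dual_matrix(K, L, y, lam, lam_I):
    """Assemble the LapSVM dual matrix Q = Y J K (2*lam*I + 2*lam_I*L K)^{-1} J' Y.

    K : (m+n, m+n) kernel matrix over labeled + unlabeled points
    L : (m+n, m+n) graph Laplacian
    y : (m,) labels of the first m (labeled) points
    Returns Q and a helper mapping dual beta to expansion coefficients alpha.
    """
    N, m = K.shape[0], len(y)
    J = np.hstack([np.eye(m), np.zeros((m, N - m))])   # picks out labeled points
    Y = np.diag(y.astype(float))
    M = 2.0 * lam * np.eye(N) + 2.0 * lam_I * (L @ K)
    Minv_JtY = np.linalg.solve(M, J.T @ Y)              # (m+n, m)
    Q = Y @ J @ K @ Minv_JtY
    alpha_from_beta = lambda beta: Minv_JtY @ beta      # standard LapSVM recovery [1]
    return Q, alpha_from_beta
```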
34 Substituting (4) into (2), we have: min_{f∈H_k} { (1/m) Σ_{i=1}^m ℓ(x_i, y_i, f(x_i)) + λ_I Σ_{e∈E} w(u_e, v_e) | f(x_{u_e})/s(u_e) − f(x_{v_e})/s(v_e) |_ε̄^2 } + λ Ω(‖f‖_{H_k}).   (5) [sent-91, score-0.122]
35 By treating the terms inside the braces as the “loss function”, this can be regarded as regularized risk minimization and, using the standard representer theorem, the minimizer f then admits the form f(x) = Σ_{i=1}^{m+n} α_i k(x_i, x), the same as that of the original manifold regularization. [sent-92, score-0.262]
36 The primal of (5) uses ‖f‖_I^2 = Σ_{e∈E} (ζ_e^2 + ζ_e^{*2}) with edge slack variables ζ_e, ζ_e^*, together with constraints of the form y_i(w^T ϕ(x_i) + b) ≥ 1 − ε − ξ_i, i = 1, . . . , m. [sent-96, score-0.107]
37 Here, K is the kernel matrix defined using kernel k on the m labeled examples, U_{|E|×|E|} = [ψ_e^T ψ_f + τ_e τ_f], and V_{m×|E|} = [y_i ϕ(x_i)^T ψ_e + τ_e]. [sent-116, score-0.262]
38 While the original LapSVM formulation requires O((m + n)^2) kernel evaluations k(x_i, x_j), each entry in K here takes only O(1) kernel evaluations. [sent-118, score-0.128]
39 Moreover, the primal variables can be easily recovered from the dual variables by the KKT conditions. [sent-122, score-0.11]
40 In particular, w = C Σ_{i=1}^m β_i y_i ϕ(x_i) + Σ_{e∈E} (γ_e − γ_e^*) ψ_e and b = C Σ_{i=1}^m β_i y_i + Σ_{e∈E} (γ_e − γ_e^*) τ_e. [sent-123, score-0.144]
41 Subsequently, the decision function f(x) = w^T ϕ(x) + b is a linear combination of the k(x_i, x)’s defined on both the labeled and unlabeled examples, as in standard manifold regularization. [sent-124, score-0.593]
42 The above formulation can be easily extended to the regression case, with the pattern output changed from ±1 to y_i ∈ R, and the hinge loss replaced by the ε-insensitive loss. [sent-132, score-0.146]
43 It can be shown that the ε-insensitive loss achieves a similar effect as the ℓ1 penalty in LASSO [11], which is known to produce sparse approximations. [sent-140, score-0.115]
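As a toy illustration of this sparsifying effect (the values and the helper below are purely illustrative, not from the paper), residuals falling inside the ε-tube incur zero loss, so the corresponding terms simply drop out of the expansion, much as small coefficients are driven to zero by an ℓ1 penalty:

```python
import numpy as np

def eps_insensitive(r, eps):
    """epsilon-insensitive loss: zero inside the tube |r| <= eps."""
    return np.maximum(0.0, np.abs(r) - eps)

# Toy residuals: only the two large ones incur a nonzero loss, so only the
# corresponding terms would survive in the (sparse) expansion.
residuals = np.array([0.01, -0.03, 0.8, 0.02, -1.2])
print(eps_insensitive(residuals, eps=0.1))   # [0.  0.  0.7 0.  1.1]
```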
44 Similarly, manifold regularization finds an f that is locally smooth. [sent-148, score-0.249]
45 While sparse approximation techniques such as basis pursuit typically use the L2 norm for the error, Girosi argued that the norm of the RKHS Hk is a better measure of smoothness. [sent-163, score-0.09]
46 First, consider the simpler case where the manifold regularizer is replaced by a simple regularizer ‖α‖^2. [sent-172, score-0.348]
47 On the other hand, consider the following variant of SVR, whose objective includes Σ_{i=1}^m (ξ_i^2 + ξ_i^{*2}) + 2Cε and whose constraints are y_i − w^T ϕ(x_i) ≤ ε + ξ_i and w^T ϕ(x_i) − y_i ≤ ε + ξ_i^*.   (12) [sent-177, score-0.144]
48 It can be shown that its dual is identical to (11), with β, β^* as the dual variables. [sent-178, score-0.15]
49 As above, the key steps are replacing the ℓ2 norm by the kernel PCA map, and adding an ℓ1 penalty on the variables. [sent-185, score-0.113]
50 It can then be shown that the sparsified manifold regularizer (based on the ε-insensitive loss) can again be recovered by using the LASSO penalty. [sent-186, score-0.27]
51 (Here, we ignore O(m+|E|) space required for storing the m training patterns and 2|E| edge constraints, as these may be stored outside the core memory. [sent-191, score-0.109]
52 ) They are thus independent of the numbers of labeled and unlabeled examples for a fixed ε. [sent-192, score-0.434]
53 This “reduced LapSVM” solves a smaller optimization problem that involves a random r×(m+n) rectangular subset of the kernel matrix, where the r patterns are chosen from both the labeled and unlabeled data. [sent-197, score-0.496]
54 Moreover, both can be regarded as primal methods that maintain primal feasibility and work towards dual feasibility. [sent-202, score-0.11]
55 Also, as is typical in column generation, the dual variable whose KKT condition is most violated is added at each iteration. [sent-203, score-0.111]
56 Note that each dual variable then corresponds to a training pattern. [sent-208, score-0.097]
57 Instead of requiring the dual solution to be strictly feasible, CVM only requires it to be feasible within a factor of (1 + ε). [sent-213, score-0.107]
58 This, together with the fact that its dual is a MEB problem, allows its number of iterations for convergence to be bounded and thus the total time and space complexities guaranteed. [sent-214, score-0.105]
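A minimal sketch of this idea (Bădoiu–Clarkson-style center updates rather than the exact CVM, which solves a small QP over the current core set at each step; the names below are hypothetical) shows why the number of iterations, and hence the core-set size, depends only on ε and not on the number of points:

```python
import numpy as np

def approx_meb(X, eps):
    """(1+eps)-approximate minimum enclosing ball via Badoiu-Clarkson updates.
    The furthest point found at each step plays the role of the core-set
    vector added per iteration; the iteration count depends only on eps."""
    c = X[0].copy()
    iters = int(np.ceil(1.0 / eps ** 2))
    core_set = [0]
    for i in range(1, iters + 1):
        dists = np.linalg.norm(X - c, axis=1)
        far = int(np.argmax(dists))          # most violated point
        core_set.append(far)
        c += (X[far] - c) / (i + 1)          # shift the center towards it
    radius = np.linalg.norm(X - c, axis=1).max()
    return c, radius, core_set
```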
59 By regarding the CVM as the approximation algorithm counterpart of column generation, this suggests that the CVM can also be used in the same way as column generation in speeding up other optimization problems. [sent-216, score-0.11]
60 The graph (for the manifold) is constructed by using the 6 nearest neighbors of each pattern, and the weight w(u_e, v_e) in (3) is defined as exp(−‖x_{u_e} − x_{v_e}‖^2 / β_g), where β_g = (1/|E|) Σ_{e∈E} ‖x_{u_e} − x_{v_e}‖^2. [sent-223, score-0.226]
61 Unless otherwise specified, we use the Gaussian kernel exp(−‖x − z‖^2 / β), with β = (1/m) Σ_{i=1}^m ‖x_i − x̄‖^2, where x̄ is the mean of the inputs. [sent-226, score-0.136]
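A sketch of this experimental setup (illustrative code with made-up function names; SciPy's KD-tree is just one way to find the 6 nearest neighbours):

```python
import numpy as np
from scipy.spatial import cKDTree

def build_graph(X, k=6):
    """Build the k-NN graph with edge weights w(u, v) = exp(-||x_u - x_v||^2 / beta_g),
    where beta_g is the mean squared edge length, as in the experiments."""
    tree = cKDTree(X)
    _, nn = tree.query(X, k=k + 1)           # first neighbour is the point itself
    edges = {(min(i, j), max(i, j)) for i in range(len(X)) for j in nn[i, 1:]}
    edges = np.array(sorted(edges))
    sq_len = ((X[edges[:, 0]] - X[edges[:, 1]]) ** 2).sum(axis=1)
    beta_g = sq_len.mean()
    weights = np.exp(-sq_len / beta_g)
    return edges, weights

def gaussian_kernel(X, Z=None):
    """Gaussian kernel with the bandwidth heuristic beta = mean ||x_i - x_bar||^2."""
    beta = ((X - X.mean(axis=0)) ** 2).sum(axis=1).mean()
    Z = X if Z is None else Z
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / beta)
```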
62 To better illustrate the scaling behavior, we vary the number of unlabeled patterns used for training (from 1,000 up to a maximum of 1 million). [sent-234, score-0.32]
[Figure 1 panels: (a) data distribution; (c) CPU time and (d) number of kernel expansions vs. number of unlabeled points, for SLapCVM, LapSVM, and reduced LapSVM.]
66 Figure 1: Results on the two-moons data set (some abscissas and ordinates are in log scale). [sent-242, score-0.098]
67 The two labeled examples are shown in red in Figure 1(a). [sent-243, score-0.301]
68 Both the LapSVM and SLapCVM always attain 100% accuracy on the test set, even with only two labeled examples (Figure 1(b)). [sent-245, score-0.189]
69 For the LapSVM, however, all the labeled and unlabeled examples are involved in the solution (Figure 1(d)). [sent-257, score-0.466]
70 As can be seen from Figures 1(c) and 1(d), both the time and space required by the SLapCVM are almost constant, even when the unlabeled data set gets very large. [sent-259, score-0.267]
71 The reduced LapSVM, though also fast, is slightly inferior to both the SLapCVM and LapSVM. [sent-260, score-0.109]
72 One labeled example is randomly sampled from each class for training. [sent-264, score-0.134]
73 To achieve comparable accuracy, we use r = 2,000 for the reduced LapSVM. [sent-265, score-0.108]
74 For comparison, we also train a standard SVM with the two labeled examples. [sent-266, score-0.134]
75 For the SLapCVM, both the time required and number of kernel expansions involved grow only sublinearly with the number of unlabeled examples. [sent-269, score-0.379]
76 Figure 2(c) demonstrates that semi-supervised learning (using either the LapSVMs or SLapCVM) can have much better generalization performance than supervised learning using the labeled examples only. [sent-270, score-0.226]
77 On the other hand, the reduced LapSVM has comparable speed to the SLapCVM, but its performance is inferior and it cannot handle large data sets. [sent-272, score-0.172]
[Figure 2 panels: (b) number of kernel expansions and (c) test error vs. number of unlabeled points.]
80 Figure 2: Results on the extended USPS data set (some abscissas and ordinates are in log scale). [sent-276, score-0.127]
81 3 Extended MIT Face Data Set In this section, we perform face detection using the extended MIT face database in [12]. [sent-278, score-0.091]
82 Five labeled examples are randomly sampled from each class and used in training. [sent-279, score-0.134]
83 For comparison, we also train two SVMs: one uses the 10 labeled examples only while the other uses all the labeled examples (a total of 889,986) in the original training set of [12]. [sent-284, score-0.356]
84 Note that the SLapCVM, using only 10 labeled examples, can attain comparable AUC and even better balanced loss than the SVM trained on the original, massive training set (Figures 3(c) and 3(d)). [sent-287, score-0.332]
85 This clearly demonstrates the usefulness of semi-supervised learning when a large amount of unlabeled data can be utilized. [sent-288, score-0.31]
86 On the other hand, note that the LapSVM again cannot be run with more than 3,000 unlabeled examples on our PC because of its high space requirement. [sent-289, score-0.3]
87 2) How to handle data sets with millions of unlabeled examples? [sent-292, score-0.329]
[Figure 3 panels: (a) CPU time, (b) number of kernel expansions, (c) balanced loss, and (d) AUC vs. number of unlabeled points, for SLapCVM, LapSVM, and reduced LapSVM.]
93 Figure 3: Results on the extended MIT face data (some abscissas and ordinates are in log scale). [sent-305, score-0.158]
94 For the first issue, we introduce a sparsified manifold regularizer based on the ε-insensitive loss. [sent-306, score-0.27]
95 For the second issue, we integrate manifold regularization with the CVM. [sent-307, score-0.249]
96 Moreover, by avoiding the underlying matrix inversion in the original LapSVM, a sparse solution can also be recovered. [sent-309, score-0.101]
97 Moreover, while the LapSVM can only handle several thousand unlabeled examples, the SLapCVM can handle one million unlabeled examples on the same machine. [sent-311, score-0.703]
98 On one data set, this produces comparable or even better performance than the (supervised) CVM trained on 900K labeled examples. [sent-312, score-0.156]
99 This clearly demonstrates the usefulness of semi-supervised learning when a large amount of unlabeled data can be utilized. [sent-313, score-0.31]
100 Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. [sent-318, score-0.401]
wordName wordTfidf (topN-words)
[('lapsvm', 0.606), ('slapcvm', 0.407), ('cvm', 0.309), ('unlabeled', 0.267), ('manifold', 0.192), ('ue', 0.179), ('labeled', 0.134), ('meb', 0.114), ('reduced', 0.086), ('hk', 0.086), ('qp', 0.083), ('xve', 0.081), ('regularizer', 0.078), ('svm', 0.077), ('dual', 0.075), ('kkt', 0.072), ('yi', 0.072), ('xi', 0.072), ('xue', 0.071), ('sparsi', 0.068), ('kernel', 0.064), ('lasso', 0.062), ('regularization', 0.057), ('cpu', 0.057), ('massive', 0.057), ('core', 0.056), ('laplacian', 0.054), ('ve', 0.05), ('usps', 0.05), ('abscissas', 0.049), ('ordinates', 0.049), ('tsang', 0.049), ('expansions', 0.048), ('bonn', 0.045), ('loss', 0.045), ('sparse', 0.044), ('resultant', 0.043), ('kwok', 0.042), ('svr', 0.042), ('handle', 0.041), ('moreover', 0.039), ('generation', 0.038), ('august', 0.038), ('rkhs', 0.038), ('supervised', 0.037), ('column', 0.036), ('million', 0.035), ('primal', 0.035), ('sindhwani', 0.034), ('examples', 0.033), ('garcke', 0.033), ('jij', 0.033), ('lapsvms', 0.033), ('patns', 0.033), ('auc', 0.032), ('solution', 0.032), ('face', 0.031), ('pc', 0.031), ('transductive', 0.031), ('lk', 0.031), ('patterns', 0.031), ('complexities', 0.03), ('balanced', 0.03), ('minimizer', 0.029), ('px', 0.029), ('extended', 0.029), ('rtner', 0.028), ('faster', 0.026), ('sparser', 0.026), ('hong', 0.026), ('enclosing', 0.026), ('penalty', 0.026), ('inversion', 0.025), ('tp', 0.024), ('semisupervised', 0.024), ('kong', 0.024), ('graph', 0.024), ('norm', 0.023), ('restore', 0.023), ('inferior', 0.023), ('girosi', 0.023), ('training', 0.022), ('germany', 0.022), ('attain', 0.022), ('comparable', 0.022), ('demonstrates', 0.022), ('amount', 0.021), ('representer', 0.021), ('millions', 0.021), ('niyogi', 0.021), ('sparsity', 0.02), ('risk', 0.02), ('ij', 0.02), ('speedup', 0.02), ('figures', 0.02), ('thousand', 0.019), ('unnormalized', 0.019), ('belkin', 0.018), ('diag', 0.018), ('hand', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 104 nips-2006-Large-Scale Sparsified Manifold Regularization
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
2 0.20450212 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
3 0.15397747 150 nips-2006-On Transductive Regression
Author: Corinna Cortes, Mehryar Mohri
Abstract: In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closedform solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
4 0.11096213 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
Author: Wenye Li, Kin-hong Lee, Kwong-sak Leung
Abstract: Kernel-based regularized learning seeks a model in a hypothesis space by minimizing the empirical error and the model’s complexity. Based on the representer theorem, the solution consists of a linear combination of translates of a kernel. This paper investigates a generalized form of representer theorem for kernel-based learning. After mapping predefined features and translates of a kernel simultaneously onto a hypothesis space by a specific way of constructing kernels, we proposed a new algorithm by utilizing a generalized regularizer which leaves part of the space unregularized. Using a squared-loss function in calculating the empirical error, a simple convex solution is obtained which combines predefined features with translates of the kernel. Empirical evaluations have confirmed the effectiveness of the algorithm for supervised learning tasks.
5 0.10383414 169 nips-2006-Relational Learning with Gaussian Processes
Author: Wei Chu, Vikas Sindhwani, Zoubin Ghahramani, S. S. Keerthi
Abstract: Correlation between instances is often modelled via a kernel function using input attributes of the instances. Relational knowledge can further reveal additional pairwise correlations between variables of interest. In this paper, we develop a class of models which incorporates both reciprocal relational information and input attributes using Gaussian process techniques. This approach provides a novel non-parametric Bayesian framework with a data-dependent covariance function for supervised learning tasks. We also apply this framework to semi-supervised learning. Experimental results on several real world data sets verify the usefulness of this algorithm. 1
6 0.10168236 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
7 0.095114157 186 nips-2006-Support Vector Machines on a Budget
8 0.092381567 120 nips-2006-Learning to Traverse Image Manifolds
9 0.082267962 128 nips-2006-Manifold Denoising
10 0.069281347 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
11 0.068127126 65 nips-2006-Denoising and Dimension Reduction in Feature Space
12 0.066722445 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
13 0.064798415 200 nips-2006-Unsupervised Regression with Applications to Nonlinear System Identification
14 0.064121999 60 nips-2006-Convergence of Laplacian Eigenmaps
15 0.062914491 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
16 0.059477933 75 nips-2006-Efficient sparse coding algorithms
17 0.058280911 151 nips-2006-On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts
18 0.055785555 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
19 0.055384517 181 nips-2006-Stability of $K$-Means Clustering
20 0.052361537 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
topicId topicWeight
[(0, -0.168), (1, 0.088), (2, -0.015), (3, 0.097), (4, -0.023), (5, 0.123), (6, -0.086), (7, 0.028), (8, 0.0), (9, 0.007), (10, -0.09), (11, -0.235), (12, -0.042), (13, -0.049), (14, 0.047), (15, 0.106), (16, 0.103), (17, -0.097), (18, -0.139), (19, 0.035), (20, -0.059), (21, -0.228), (22, 0.065), (23, 0.108), (24, -0.038), (25, 0.109), (26, -0.002), (27, -0.044), (28, -0.172), (29, -0.081), (30, -0.062), (31, 0.021), (32, 0.057), (33, -0.043), (34, -0.007), (35, 0.05), (36, 0.0), (37, -0.061), (38, -0.056), (39, 0.086), (40, -0.034), (41, -0.014), (42, 0.024), (43, -0.022), (44, -0.014), (45, -0.03), (46, 0.157), (47, 0.013), (48, -0.031), (49, 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.93825167 104 nips-2006-Large-Scale Sparsified Manifold Regularization
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
2 0.79338658 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
3 0.72893906 150 nips-2006-On Transductive Regression
Author: Corinna Cortes, Mehryar Mohri
Abstract: In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closedform solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
4 0.57499576 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
Author: Chi-hoon Lee, Shaojun Wang, Feng Jiao, Dale Schuurmans, Russell Greiner
Abstract: We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional loglikelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case.
5 0.5488168 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
Author: Xinhua Zhang, Wee S. Lee
Abstract: Semi-supervised learning algorithms have been successfully applied in many applications with scarce labeled data, by utilizing the unlabeled data. One important category is graph based semi-supervised learning algorithms, for which the performance depends considerably on the quality of the graph, or its hyperparameters. In this paper, we deal with the less explored problem of learning the graphs. We propose a graph learning method for the harmonic energy minimization method; this is done by minimizing the leave-one-out prediction error on labeled data points. We use a gradient based method and designed an efficient algorithm which significantly accelerates the calculation of the gradient by applying the matrix inversion lemma and using careful pre-computation. Experimental results show that the graph learning method is effective in improving the performance of the classification algorithm. 1
6 0.44536859 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
7 0.44417667 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
8 0.43934229 120 nips-2006-Learning to Traverse Image Manifolds
9 0.4321107 129 nips-2006-Map-Reduce for Machine Learning on Multicore
10 0.42157772 186 nips-2006-Support Vector Machines on a Budget
11 0.40251249 101 nips-2006-Isotonic Conditional Random Fields and Local Sentiment Flow
12 0.39063567 82 nips-2006-Gaussian and Wishart Hyperkernels
13 0.3837668 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
14 0.37837824 128 nips-2006-Manifold Denoising
15 0.37554279 169 nips-2006-Relational Learning with Gaussian Processes
16 0.34465599 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
17 0.32699016 200 nips-2006-Unsupervised Regression with Applications to Nonlinear System Identification
18 0.29973602 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
19 0.2726225 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods
20 0.26992604 105 nips-2006-Large Margin Component Analysis
topicId topicWeight
[(1, 0.126), (3, 0.417), (7, 0.09), (9, 0.033), (22, 0.056), (44, 0.038), (57, 0.05), (64, 0.013), (65, 0.034), (69, 0.027), (72, 0.01)]
simIndex simValue paperId paperTitle
1 0.94983029 200 nips-2006-Unsupervised Regression with Applications to Nonlinear System Identification
Author: Ali Rahimi, Ben Recht
Abstract: We derive a cost functional for estimating the relationship between highdimensional observations and the low-dimensional process that generated them with no input-output examples. Limiting our search to invertible observation functions confers numerous benefits, including a compact representation and no suboptimal local minima. Our approximation algorithms for optimizing this cost functional are fast and give diagnostic bounds on the quality of their solution. Our method can be viewed as a manifold learning algorithm that utilizes a prior on the low-dimensional manifold coordinates. The benefits of taking advantage of such priors in manifold learning and searching for the inverse observation functions in system identification are demonstrated empirically by learning to track moving targets from raw measurements in a sensor network setting and in an RFID tracking experiment. 1
2 0.9353804 120 nips-2006-Learning to Traverse Image Manifolds
Author: Piotr DollĂĄr, Vincent Rabaud, Serge J. Belongie
Abstract: We present a new algorithm, Locally Smooth Manifold Learning (LSML), that learns a warping function from a point on an manifold to its neighbors. Important characteristics of LSML include the ability to recover the structure of the manifold in sparsely populated regions and beyond the support of the provided data. Applications of our proposed technique include embedding with a natural out-of-sample extension and tasks such as tangent distance estimation, frame rate up-conversion, video compression and motion transfer. 1
3 0.90123582 17 nips-2006-A recipe for optimizing a time-histogram
Author: Hideaki Shimazaki, Shigeru Shinomoto
Abstract: The time-histogram method is a handy tool for capturing the instantaneous rate of spike occurrence. In most of the neurophysiological literature, the bin size that critically determines the goodness of the fit of the time-histogram to the underlying rate has been selected by individual researchers in an unsystematic manner. We propose an objective method for selecting the bin size of a time-histogram from the spike data, so that the time-histogram best approximates the unknown underlying rate. The resolution of the histogram increases, or the optimal bin size decreases, with the number of spike sequences sampled. It is notable that the optimal bin size diverges if only a small number of experimental trials are available from a moderately fluctuating rate process. In this case, any attempt to characterize the underlying spike rate will lead to spurious results. Given a paucity of data, our method can also suggest how many more trials are needed until the set of data can be analyzed with the required resolution. 1
same-paper 4 0.85899556 104 nips-2006-Large-Scale Sparsified Manifold Regularization
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
5 0.59051287 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
Author: Zhenyue Zhang, Jing Wang
Abstract: The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. The modified locally linear embedding (MLLE) proposed in this paper is much stable. It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. MLLE is also compared with the local tangent space alignment (LTSA). Numerical examples are given that show the improvement and efficiency of MLLE. 1
6 0.58255571 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
7 0.5702619 128 nips-2006-Manifold Denoising
8 0.56371218 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
9 0.56254989 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
10 0.559228 121 nips-2006-Learning to be Bayesian without Supervision
11 0.54479259 153 nips-2006-Online Clustering of Moving Hyperplanes
12 0.54173809 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
13 0.53346407 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo
14 0.52915943 95 nips-2006-Implicit Surfaces with Globally Regularised and Compactly Supported Basis Functions
15 0.52619416 107 nips-2006-Large Margin Multi-channel Analog-to-Digital Conversion with Applications to Neural Prosthesis
16 0.52182633 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians
17 0.5143159 108 nips-2006-Large Scale Hidden Semi-Markov SVMs
18 0.51125389 55 nips-2006-Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
19 0.50906402 162 nips-2006-Predicting spike times from subthreshold dynamics of a neuron
20 0.50707847 105 nips-2006-Large Margin Component Analysis