nips nips2006 nips2006-104 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. [sent-5, score-0.524]
2 In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. [sent-6, score-0.313]
3 However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. [sent-7, score-0.545]
4 Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. [sent-8, score-0.308]
5 In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. [sent-9, score-0.342]
6 By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. [sent-10, score-0.34]
7 Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. [sent-11, score-0.462]
8 1 Introduction In many real-world applications, collection of labeled data is both time-consuming and expensive. [sent-12, score-0.134]
9 On the other hand, a large amount of unlabeled data is often readily available. [sent-13, score-0.288]
10 While traditional supervised learning methods can only learn from the limited amount of labeled data, semi-supervised learning [2] aims at improving the generalization performance by utilizing both the labeled and unlabeled data. [sent-14, score-0.593]
11 When the data lie on a manifold, it is common to approximate this manifold by a weighted graph, leading to graph-based semi-supervised learning methods. [sent-17, score-0.192]
12 In this paper, we focus on the manifold regularization framework proposed in [1]. [sent-20, score-0.249]
13 By defining a data-dependent reproducing kernel Hilbert space (RKHS), manifold regularization incorporates an additional regularizer to ensure that the learned function is smooth on the manifold. [sent-21, score-0.391]
14 Moreover, while the original motivation of semi-supervised learning is to utilize the large amount of unlabeled data available, existing algorithms are only capable of handling a small to moderate amount of unlabeled data. [sent-25, score-0.576]
15 [9] sped up manifold regularization by restricting to linear models, which, however, may not be flexible enough for complicated target functions. [sent-28, score-0.249]
16 As reported in a recent survey [14], most semi-supervised learning methods can only handle 100 – 10,000 unlabeled examples. [sent-31, score-0.332]
17 The largest graph they worked with involves 75,888 labeled and unlabeled examples. [sent-34, score-0.425]
18 Thus, no experiments have ever been performed on massive data sets with, say, one million unlabeled examples. [sent-35, score-0.359]
19 On the other hand, the Core Vector Machine (CVM) has recently been proposed for scaling up kernel methods in both supervised (including classification [12] and regression [13]) and unsupervised learning (e. [sent-36, score-0.101]
20 Experimental results on real world data sets with millions of patterns demonstrated that the CVM is much faster than existing SVM implementations and can handle much larger data sets. [sent-41, score-0.119]
21 To restore sparsity of the LapSVM solution, we first introduce a sparsified manifold regularizer based on the ε-insensitive loss. [sent-43, score-0.313]
22 In Section 2, we first give a brief review on manifold regularization. [sent-47, score-0.192]
23 2 Manifold Regularization Given a training set {(x_i, y_i)}_{i=1}^m with input x_i ∈ X and output y_i ∈ R. [sent-50, score-0.238]
24 The regularized risk functional is the sum of the empirical risk (corresponding to a loss function ℓ) and a regularizer Ω. [sent-51, score-0.163]
25 Given a kernel k and its RKHS H_k, we minimize the regularized risk over functions f in H_k: min_{f∈H_k} (1/m) Σ_{i=1}^m ℓ(x_i, y_i, f(x_i)) + λ Ω(‖f‖_{H_k}).   (1) [sent-52, score-0.156]
26 In semi-supervised learning, we have both labeled examples {(x_i, y_i)}_{i=1}^m and unlabeled examples {x_i}_{i=m+1}^{m+n}. [sent-56, score-0.539]
27 Manifold regularization uses an additional regularizer ‖f‖_I^2 to ensure that the function f is smooth on the intrinsic structure of the input. [sent-57, score-0.135]
28 It is common to approximate this manifold by a weighted graph defined on all the labeled and unlabeled data, G = (V, E), with V and E being the sets of vertices and edges respectively. [sent-62, score-0.617]
29 Then, ‖f‖_I^2 is approximated as¹ ‖f‖_I^2 = Σ_{e∈E} w(u_e, v_e) ( f(x_{u_e})/s(u_e) − f(x_{v_e})/s(v_e) )^2,   (3) (footnote 1: when the set of labeled and unlabeled data is small, a function that is smooth on this small set may not be interesting). [sent-65, score-0.451]
30 where u_e and v_e are the vertices of edge e, and s(u) = √d(u) (with d(u) the degree of vertex u) when the normalized graph Laplacian is used, and s(u) = 1 with the unnormalized one. [sent-67, score-0.272]
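To make the approximation in (3) concrete, here is a minimal sketch (Python/NumPy, with hypothetical function and variable names, not code from the paper) that evaluates the right-hand side of (3) from an explicit edge list; it assumes s(u) = √d(u) in the normalized case and s(u) = 1 otherwise, as described above.

```python
import numpy as np

def intrinsic_regularizer(f_vals, edges, weights, normalized=True):
    """Approximate ||f||_I^2 as in Eq. (3): a weighted sum of squared
    differences of (scaled) function values over the graph edges.

    f_vals  : (m+n,) function values f(x_i) on labeled + unlabeled points
    edges   : list of (u, v) vertex index pairs
    weights : (|E|,) edge weights w(u_e, v_e)
    """
    # weighted vertex degrees d(u)
    d = np.zeros(len(f_vals))
    for (u, v), w in zip(edges, weights):
        d[u] += w
        d[v] += w
    # s(u) = sqrt(d(u)) for the normalized Laplacian, 1 otherwise
    s = np.sqrt(d) if normalized else np.ones_like(d)
    reg = 0.0
    for (u, v), w in zip(edges, weights):
        reg += w * (f_vals[u] / s[u] - f_vals[v] / s[v]) ** 2
    return reg
```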
31 As shown in [1], the minimizer of (2) becomes f(x) = Σ_{i=1}^{m+n} α_i k(x_i, x), which depends on both the labeled and unlabeled examples. [sent-68, score-0.43]
32 2.1 Laplacian SVM Using the hinge loss ℓ(x_i, y_i, f(x_i)) = max(0, 1 − y_i f(x_i)) in (2), we obtain the Laplacian SVM (LapSVM) [1]. [sent-70, score-0.189]
33 Its dual involves 1 = [1, . . . , 1]^T and Q_{m×m} = Y J K (2λI + 2λ_I LK)^{-1} J^T Y, where Y_{m×m} is the diagonal matrix with Y_ii = y_i, K_{(m+n)×(m+n)} is the kernel matrix over both the labeled and unlabeled data, L_{(m+n)×(m+n)} is the graph Laplacian, and J_{m×(m+n)} has J_ij = 1 if i = j and x_i is a labeled example, and J_ij = 0 otherwise. [sent-79, score-0.767]
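As a rough sketch of how these quantities fit together (illustrative code, not the paper's; names are made up, and the labeled points are assumed to be the first m of the m + n examples), the following assembles Q and recovers the expansion coefficients via the standard LapSVM relation α = (2λI + 2λ_I LK)^{-1} J^T Y β from [1]. Note that it materializes the full (m+n)×(m+n) matrices and solves a dense linear system, which is exactly the cost the sparsified formulation below avoids.

```python
import numpy as np

def lapsvm_dual_matrix(K, L, y, lam, lam_I):
    """Assemble the LapSVM dual matrix Q = Y J K (2*lam*I + 2*lam_I*L K)^{-1} J' Y.

    K : (m+n, m+n) kernel matrix over labeled + unlabeled points
    L : (m+n, m+n) graph Laplacian
    y : (m,) labels of the first m (labeled) points
    Returns Q and a helper mapping dual beta to expansion coefficients alpha.
    """
    N, m = K.shape[0], len(y)
    J = np.hstack([np.eye(m), np.zeros((m, N - m))])   # picks out labeled points
    Y = np.diag(y.astype(float))
    M = 2.0 * lam * np.eye(N) + 2.0 * lam_I * (L @ K)
    Minv_JtY = np.linalg.solve(M, J.T @ Y)              # (m+n, m)
    Q = Y @ J @ K @ Minv_JtY
    alpha_from_beta = lambda beta: Minv_JtY @ beta      # standard LapSVM recovery [1]
    return Q, alpha_from_beta
```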
34 Substituting (4) into (2), we have: min_{f∈H_k} { (1/m) Σ_{i=1}^m ℓ(x_i, y_i, f(x_i)) + λ_I Σ_{e∈E} w(u_e, v_e) | f(x_{u_e})/s(u_e) − f(x_{v_e})/s(v_e) |_ε̄^2 } + λ Ω(‖f‖_{H_k}).   (5) [sent-91, score-0.122]
35 By treating the terms inside the braces as the “loss function”, this can be regarded as regularized risk minimization and, using the standard representer theorem, the minimizer f then admits the form f(x) = Σ_{i=1}^{m+n} α_i k(x_i, x), the same as that of the original manifold regularization. [sent-92, score-0.262]
36 The primal of (5) uses ‖f‖_I^2 = Σ_{e∈E} (ζ_e^2 + ζ_e^{*2}) with edge slack variables ζ_e, ζ_e^*, together with constraints of the form y_i(w^T ϕ(x_i) + b) ≥ 1 − ε − ξ_i, i = 1, . . . , m. [sent-96, score-0.107]
37 Here, K is the kernel matrix defined using kernel k on the m labeled examples, U_{|E|×|E|} = [ψ_e^T ψ_f + τ_e τ_f], and V_{m×|E|} = [y_i ϕ(x_i)^T ψ_e + τ_e]. [sent-116, score-0.262]
38 While the original LapSVM formulation requires O((m + n)^2) kernel evaluations k(x_i, x_j), each entry in K here takes only O(1) kernel evaluations. [sent-118, score-0.128]
39 Moreover, the primal variables can be easily recovered from the dual variables by the KKT conditions. [sent-122, score-0.11]
40 In particular, w = C Σ_{i=1}^m β_i y_i ϕ(x_i) + Σ_{e∈E} (γ_e − γ_e^*) ψ_e and b = C Σ_{i=1}^m β_i y_i + Σ_{e∈E} (γ_e − γ_e^*) τ_e. [sent-123, score-0.144]
41 Subsequently, the decision function f(x) = w^T ϕ(x) + b is a linear combination of the k(x_i, x)’s defined on both the labeled and unlabeled examples, as in standard manifold regularization. [sent-124, score-0.593]
42 The above formulation can be easily extended to the regression case, with the pattern output changed from ±1 to y_i ∈ R, and the hinge loss replaced by the ε-insensitive loss. [sent-132, score-0.146]
43 It can be shown that the ε-insensitive loss achieves a similar effect as the ℓ1 penalty in LASSO [11], which is known to produce sparse approximations. [sent-140, score-0.115]
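As a toy illustration of this sparsifying effect (the values and the helper below are purely illustrative, not from the paper), residuals falling inside the ε-tube incur zero loss, so the corresponding terms simply drop out of the expansion, much as small coefficients are driven to zero by an ℓ1 penalty:

```python
import numpy as np

def eps_insensitive(r, eps):
    """epsilon-insensitive loss: zero inside the tube |r| <= eps."""
    return np.maximum(0.0, np.abs(r) - eps)

# Toy residuals: only the two large ones incur a nonzero loss, so only the
# corresponding terms would survive in the (sparse) expansion.
residuals = np.array([0.01, -0.03, 0.8, 0.02, -1.2])
print(eps_insensitive(residuals, eps=0.1))   # [0.  0.  0.7 0.  1.1]
```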
44 Similarly, manifold regularization finds an f that is locally smooth. [sent-148, score-0.249]
45 While sparse approximation techniques such as basis pursuit typically use the L2 norm for the error, Girosi argued that the norm of the RKHS Hk is a better measure of smoothness. [sent-163, score-0.09]
46 First, consider the simpler case where the manifold regularizer is replaced by a simple regularizer ‖α‖^2. [sent-172, score-0.348]
47 On the other hand, consider the following variant of SVR, whose objective includes Σ_{i=1}^m (ξ_i^2 + ξ_i^{*2}) + 2Cε and whose constraints are y_i − w^T ϕ(x_i) ≤ ε + ξ_i and w^T ϕ(x_i) − y_i ≤ ε + ξ_i^*.   (12) [sent-177, score-0.144]
48 It can be shown that its dual is identical to (11), with β, β^* as the dual variables. [sent-178, score-0.15]
49 As above, the key steps are replacing the ℓ2 norm by the kernel PCA map, and adding an ℓ1 penalty on the variables. [sent-185, score-0.113]
50 It can then be shown that the sparsified manifold regularizer (based on the ε-insensitive loss) can again be recovered by using the LASSO penalty. [sent-186, score-0.27]
51 (Here, we ignore O(m+|E|) space required for storing the m training patterns and 2|E| edge constraints, as these may be stored outside the core memory. [sent-191, score-0.109]
52 ) They are thus independent of the numbers of labeled and unlabeled examples for a fixed ε. [sent-192, score-0.434]
53 This “reduced LapSVM” solves a smaller optimization problem that involves a random r×(m+n) rectangular subset of the kernel matrix, where the r patterns are chosen from both the labeled and unlabeled data. [sent-197, score-0.496]
54 Moreover, both can be regarded as primal methods that maintain primal feasibility and work towards dual feasibility. [sent-202, score-0.11]
55 Also, as is typical in column generation, the dual variable whose KKT condition is most violated is added at each iteration. [sent-203, score-0.111]
56 Note that each dual variable then corresponds to a training pattern. [sent-208, score-0.097]
57 Instead of requiring the dual solution to be strictly feasible, CVM only requires it to be feasible within a factor of (1 + ε). [sent-213, score-0.107]
58 This, together with the fact that its dual is a MEB problem, allows its number of iterations for convergence to be bounded and thus the total time and space complexities guaranteed. [sent-214, score-0.105]
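A minimal sketch of this idea (Bădoiu–Clarkson-style center updates rather than the exact CVM, which solves a small QP over the current core set at each step; the names below are hypothetical) shows why the number of iterations, and hence the core-set size, depends only on ε and not on the number of points:

```python
import numpy as np

def approx_meb(X, eps):
    """(1+eps)-approximate minimum enclosing ball via Badoiu-Clarkson updates.
    The furthest point found at each step plays the role of the core-set
    vector added per iteration; the iteration count depends only on eps."""
    c = X[0].copy()
    iters = int(np.ceil(1.0 / eps ** 2))
    core_set = [0]
    for i in range(1, iters + 1):
        dists = np.linalg.norm(X - c, axis=1)
        far = int(np.argmax(dists))          # most violated point
        core_set.append(far)
        c += (X[far] - c) / (i + 1)          # shift the center towards it
    radius = np.linalg.norm(X - c, axis=1).max()
    return c, radius, core_set
```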
59 By regarding the CVM as the approximation algorithm counterpart of column generation, this suggests that the CVM can also be used in the same way as column generation in speeding up other optimization problems. [sent-216, score-0.11]
60 The graph (for the manifold) is constructed by using the 6 nearest neighbors of each pattern, and the weight w(u_e, v_e) in (3) is defined as exp(−‖x_{u_e} − x_{v_e}‖^2 / β_g), where β_g = (1/|E|) Σ_{e∈E} ‖x_{u_e} − x_{v_e}‖^2. [sent-223, score-0.226]
61 Unless otherwise specified, we use the Gaussian kernel exp(−‖x − z‖^2 / β), with β = (1/m) Σ_{i=1}^m ‖x_i − x̄‖^2, where x̄ is the mean of the inputs. [sent-226, score-0.136]
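A sketch of this experimental setup (illustrative code with made-up function names; SciPy's KD-tree is just one way to find the 6 nearest neighbours):

```python
import numpy as np
from scipy.spatial import cKDTree

def build_graph(X, k=6):
    """Build the k-NN graph with edge weights w(u, v) = exp(-||x_u - x_v||^2 / beta_g),
    where beta_g is the mean squared edge length, as in the experiments."""
    tree = cKDTree(X)
    _, nn = tree.query(X, k=k + 1)           # first neighbour is the point itself
    edges = {(min(i, j), max(i, j)) for i in range(len(X)) for j in nn[i, 1:]}
    edges = np.array(sorted(edges))
    sq_len = ((X[edges[:, 0]] - X[edges[:, 1]]) ** 2).sum(axis=1)
    beta_g = sq_len.mean()
    weights = np.exp(-sq_len / beta_g)
    return edges, weights

def gaussian_kernel(X, Z=None):
    """Gaussian kernel with the bandwidth heuristic beta = mean ||x_i - x_bar||^2."""
    beta = ((X - X.mean(axis=0)) ** 2).sum(axis=1).mean()
    Z = X if Z is None else Z
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / beta)
```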
62 To better illustrate the scaling behavior, we vary the number of unlabeled patterns used for training (from 1,000 up to a maximum of 1 million). [sent-234, score-0.32]
[Figure 1 panels: (a) data distribution; (c) CPU time and (d) number of kernel expansions vs. number of unlabeled points, for SLapCVM, LapSVM, and reduced LapSVM.]
66 Figure 1: Results on the two-moons data set (some abscissas and ordinates are in log scale). [sent-242, score-0.098]
67 The two labeled examples are shown in red in Figure 1(a). [sent-243, score-0.301]
68 Both the LapSVM and SLapCVM always attain 100% accuracy on the test set, even with only two labeled examples (Figure 1(b)). [sent-245, score-0.189]
69 For the LapSVM, however, all the labeled and unlabeled examples are involved in the solution (Figure 1(d)). [sent-257, score-0.466]
70 As can be seen from Figures 1(c) and 1(d), both the time and space required by the SLapCVM are almost constant, even when the unlabeled data set gets very large. [sent-259, score-0.267]
71 The reduced LapSVM, though also fast, is slightly inferior to both the SLapCVM and LapSVM. [sent-260, score-0.109]
72 One labeled example is randomly sampled from each class for training. [sent-264, score-0.134]
73 To achieve comparable accuracy, we use r = 2,000 for the reduced LapSVM. [sent-265, score-0.108]
74 For comparison, we also train a standard SVM with the two labeled examples. [sent-266, score-0.134]
75 For the SLapCVM, both the time required and number of kernel expansions involved grow only sublinearly with the number of unlabeled examples. [sent-269, score-0.379]
76 Figure 2(c) demonstrates that semi-supervised learning (using either the LapSVMs or SLapCVM) can have much better generalization performance than supervised learning using the labeled examples only. [sent-270, score-0.226]
77 On the other hand, the reduced LapSVM has comparable speed to the SLapCVM, but its performance is inferior and it cannot handle large data sets. [sent-272, score-0.172]
[Figure 2 panels: (b) number of kernel expansions and (c) test error vs. number of unlabeled points.]
80 Figure 2: Results on the extended USPS data set (some abscissas and ordinates are in log scale). [sent-276, score-0.127]
81 3 Extended MIT Face Data Set In this section, we perform face detection using the extended MIT face database in [12]. [sent-278, score-0.091]
82 Five labeled examples are randomly sampled from each class and used in training. [sent-279, score-0.134]
83 For comparison, we also train two SVMs: one uses the 10 labeled examples only while the other uses all the labeled examples (a total of 889,986) in the original training set of [12]. [sent-284, score-0.356]
84 Note that the SLapCVM, using only 10 labeled examples, can attain comparable AUC and even better balanced loss than the SVM trained on the original, massive training set (Figures 3(c) and 3(d)). [sent-287, score-0.332]
85 This clearly demonstrates the usefulness of semi-supervised learning when a large amount of unlabeled data can be utilized. [sent-288, score-0.31]
86 On the other hand, note that the LapSVM again cannot be run with more than 3,000 unlabeled examples on our PC because of its high space requirement. [sent-289, score-0.3]
87 2) How to handle data sets with millions of unlabeled examples? [sent-292, score-0.329]
[Figure 3 panels: (a) CPU time, (b) number of kernel expansions, (c) balanced loss, and (d) AUC vs. number of unlabeled points, for SLapCVM, LapSVM, and reduced LapSVM.]
93 Figure 3: Results on the extended MIT face data (some abscissas and ordinates are in log scale). [sent-305, score-0.158]
94 For the first issue, we introduce a sparsified manifold regularizer based on the ε-insensitive loss. [sent-306, score-0.27]
95 For the second issue, we integrate manifold regularization with the CVM. [sent-307, score-0.249]
96 Moreover, by avoiding the underlying matrix inversion in the original LapSVM, a sparse solution can also be recovered. [sent-309, score-0.101]
97 Moreover, while the LapSVM can only handle several thousand unlabeled examples, the SLapCVM can handle one million unlabeled examples on the same machine. [sent-311, score-0.703]
98 On one data set, this produces comparable or even better performance than the (supervised) CVM trained on 900K labeled examples. [sent-312, score-0.156]
99 This clearly demonstrates the usefulness of semi-supervised learning when a large amount of unlabeled data can be utilized. [sent-313, score-0.31]
100 Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. [sent-318, score-0.401]
wordName wordTfidf (topN-words)
[('lapsvm', 0.606), ('slapcvm', 0.407), ('cvm', 0.309), ('unlabeled', 0.267), ('manifold', 0.192), ('ue', 0.179), ('labeled', 0.134), ('meb', 0.114), ('reduced', 0.086), ('hk', 0.086), ('qp', 0.083), ('xve', 0.081), ('regularizer', 0.078), ('svm', 0.077), ('dual', 0.075), ('kkt', 0.072), ('yi', 0.072), ('xi', 0.072), ('xue', 0.071), ('sparsi', 0.068), ('kernel', 0.064), ('lasso', 0.062), ('regularization', 0.057), ('cpu', 0.057), ('massive', 0.057), ('core', 0.056), ('laplacian', 0.054), ('ve', 0.05), ('usps', 0.05), ('abscissas', 0.049), ('ordinates', 0.049), ('tsang', 0.049), ('expansions', 0.048), ('bonn', 0.045), ('loss', 0.045), ('sparse', 0.044), ('resultant', 0.043), ('kwok', 0.042), ('svr', 0.042), ('handle', 0.041), ('moreover', 0.039), ('generation', 0.038), ('august', 0.038), ('rkhs', 0.038), ('supervised', 0.037), ('column', 0.036), ('million', 0.035), ('primal', 0.035), ('sindhwani', 0.034), ('examples', 0.033), ('garcke', 0.033), ('jij', 0.033), ('lapsvms', 0.033), ('patns', 0.033), ('auc', 0.032), ('solution', 0.032), ('face', 0.031), ('pc', 0.031), ('transductive', 0.031), ('lk', 0.031), ('patterns', 0.031), ('complexities', 0.03), ('balanced', 0.03), ('minimizer', 0.029), ('px', 0.029), ('extended', 0.029), ('rtner', 0.028), ('faster', 0.026), ('sparser', 0.026), ('hong', 0.026), ('enclosing', 0.026), ('penalty', 0.026), ('inversion', 0.025), ('tp', 0.024), ('semisupervised', 0.024), ('kong', 0.024), ('graph', 0.024), ('norm', 0.023), ('restore', 0.023), ('inferior', 0.023), ('girosi', 0.023), ('training', 0.022), ('germany', 0.022), ('attain', 0.022), ('comparable', 0.022), ('demonstrates', 0.022), ('amount', 0.021), ('representer', 0.021), ('millions', 0.021), ('niyogi', 0.021), ('sparsity', 0.02), ('risk', 0.02), ('ij', 0.02), ('speedup', 0.02), ('figures', 0.02), ('thousand', 0.019), ('unnormalized', 0.019), ('belkin', 0.018), ('diag', 0.018), ('hand', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 104 nips-2006-Large-Scale Sparsified Manifold Regularization
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
2 0.20450212 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
3 0.15397747 150 nips-2006-On Transductive Regression
Author: Corinna Cortes, Mehryar Mohri
Abstract: In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closedform solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
4 0.11096213 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
Author: Wenye Li, Kin-hong Lee, Kwong-sak Leung
Abstract: Kernel-based regularized learning seeks a model in a hypothesis space by minimizing the empirical error and the model’s complexity. Based on the representer theorem, the solution consists of a linear combination of translates of a kernel. This paper investigates a generalized form of representer theorem for kernel-based learning. After mapping predefined features and translates of a kernel simultaneously onto a hypothesis space by a specific way of constructing kernels, we proposed a new algorithm by utilizing a generalized regularizer which leaves part of the space unregularized. Using a squared-loss function in calculating the empirical error, a simple convex solution is obtained which combines predefined features with translates of the kernel. Empirical evaluations have confirmed the effectiveness of the algorithm for supervised learning tasks.
5 0.10383414 169 nips-2006-Relational Learning with Gaussian Processes
Author: Wei Chu, Vikas Sindhwani, Zoubin Ghahramani, S. S. Keerthi
Abstract: Correlation between instances is often modelled via a kernel function using input attributes of the instances. Relational knowledge can further reveal additional pairwise correlations between variables of interest. In this paper, we develop a class of models which incorporates both reciprocal relational information and input attributes using Gaussian process techniques. This approach provides a novel non-parametric Bayesian framework with a data-dependent covariance function for supervised learning tasks. We also apply this framework to semi-supervised learning. Experimental results on several real world data sets verify the usefulness of this algorithm. 1
6 0.10168236 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
7 0.095114157 186 nips-2006-Support Vector Machines on a Budget
8 0.092381567 120 nips-2006-Learning to Traverse Image Manifolds
9 0.082267962 128 nips-2006-Manifold Denoising
10 0.069281347 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
11 0.068127126 65 nips-2006-Denoising and Dimension Reduction in Feature Space
12 0.066722445 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
13 0.064798415 200 nips-2006-Unsupervised Regression with Applications to Nonlinear System Identification
14 0.064121999 60 nips-2006-Convergence of Laplacian Eigenmaps
15 0.062914491 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
16 0.059477933 75 nips-2006-Efficient sparse coding algorithms
17 0.058280911 151 nips-2006-On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts
18 0.055785555 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
19 0.055384517 181 nips-2006-Stability of $K$-Means Clustering
20 0.052361537 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
topicId topicWeight
[(0, -0.168), (1, 0.088), (2, -0.015), (3, 0.097), (4, -0.023), (5, 0.123), (6, -0.086), (7, 0.028), (8, 0.0), (9, 0.007), (10, -0.09), (11, -0.235), (12, -0.042), (13, -0.049), (14, 0.047), (15, 0.106), (16, 0.103), (17, -0.097), (18, -0.139), (19, 0.035), (20, -0.059), (21, -0.228), (22, 0.065), (23, 0.108), (24, -0.038), (25, 0.109), (26, -0.002), (27, -0.044), (28, -0.172), (29, -0.081), (30, -0.062), (31, 0.021), (32, 0.057), (33, -0.043), (34, -0.007), (35, 0.05), (36, 0.0), (37, -0.061), (38, -0.056), (39, 0.086), (40, -0.034), (41, -0.014), (42, 0.024), (43, -0.022), (44, -0.014), (45, -0.03), (46, 0.157), (47, 0.013), (48, -0.031), (49, 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.93825167 104 nips-2006-Large-Scale Sparsified Manifold Regularization
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
2 0.79338658 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
3 0.72893906 150 nips-2006-On Transductive Regression
Author: Corinna Cortes, Mehryar Mohri
Abstract: In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closedform solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
4 0.57499576 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
Author: Chi-hoon Lee, Shaojun Wang, Feng Jiao, Dale Schuurmans, Russell Greiner
Abstract: We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional loglikelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case.
5 0.5488168 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
Author: Xinhua Zhang, Wee S. Lee
Abstract: Semi-supervised learning algorithms have been successfully applied in many applications with scarce labeled data, by utilizing the unlabeled data. One important category is graph based semi-supervised learning algorithms, for which the performance depends considerably on the quality of the graph, or its hyperparameters. In this paper, we deal with the less explored problem of learning the graphs. We propose a graph learning method for the harmonic energy minimization method; this is done by minimizing the leave-one-out prediction error on labeled data points. We use a gradient based method and designed an efficient algorithm which significantly accelerates the calculation of the gradient by applying the matrix inversion lemma and using careful pre-computation. Experimental results show that the graph learning method is effective in improving the performance of the classification algorithm. 1
6 0.44536859 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
7 0.44417667 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
8 0.43934229 120 nips-2006-Learning to Traverse Image Manifolds
9 0.4321107 129 nips-2006-Map-Reduce for Machine Learning on Multicore
10 0.42157772 186 nips-2006-Support Vector Machines on a Budget
11 0.40251249 101 nips-2006-Isotonic Conditional Random Fields and Local Sentiment Flow
12 0.39063567 82 nips-2006-Gaussian and Wishart Hyperkernels
13 0.3837668 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
14 0.37837824 128 nips-2006-Manifold Denoising
15 0.37554279 169 nips-2006-Relational Learning with Gaussian Processes
16 0.34465599 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
17 0.32699016 200 nips-2006-Unsupervised Regression with Applications to Nonlinear System Identification
18 0.29973602 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
19 0.2726225 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods
20 0.26992604 105 nips-2006-Large Margin Component Analysis
topicId topicWeight
[(1, 0.126), (3, 0.417), (7, 0.09), (9, 0.033), (22, 0.056), (44, 0.038), (57, 0.05), (64, 0.013), (65, 0.034), (69, 0.027), (72, 0.01)]
simIndex simValue paperId paperTitle
1 0.94983029 200 nips-2006-Unsupervised Regression with Applications to Nonlinear System Identification
Author: Ali Rahimi, Ben Recht
Abstract: We derive a cost functional for estimating the relationship between highdimensional observations and the low-dimensional process that generated them with no input-output examples. Limiting our search to invertible observation functions confers numerous benefits, including a compact representation and no suboptimal local minima. Our approximation algorithms for optimizing this cost functional are fast and give diagnostic bounds on the quality of their solution. Our method can be viewed as a manifold learning algorithm that utilizes a prior on the low-dimensional manifold coordinates. The benefits of taking advantage of such priors in manifold learning and searching for the inverse observation functions in system identification are demonstrated empirically by learning to track moving targets from raw measurements in a sensor network setting and in an RFID tracking experiment. 1
2 0.9353804 120 nips-2006-Learning to Traverse Image Manifolds
Author: Piotr DollĂĄr, Vincent Rabaud, Serge J. Belongie
Abstract: We present a new algorithm, Locally Smooth Manifold Learning (LSML), that learns a warping function from a point on an manifold to its neighbors. Important characteristics of LSML include the ability to recover the structure of the manifold in sparsely populated regions and beyond the support of the provided data. Applications of our proposed technique include embedding with a natural out-of-sample extension and tasks such as tangent distance estimation, frame rate up-conversion, video compression and motion transfer. 1
3 0.90123582 17 nips-2006-A recipe for optimizing a time-histogram
Author: Hideaki Shimazaki, Shigeru Shinomoto
Abstract: The time-histogram method is a handy tool for capturing the instantaneous rate of spike occurrence. In most of the neurophysiological literature, the bin size that critically determines the goodness of the fit of the time-histogram to the underlying rate has been selected by individual researchers in an unsystematic manner. We propose an objective method for selecting the bin size of a time-histogram from the spike data, so that the time-histogram best approximates the unknown underlying rate. The resolution of the histogram increases, or the optimal bin size decreases, with the number of spike sequences sampled. It is notable that the optimal bin size diverges if only a small number of experimental trials are available from a moderately fluctuating rate process. In this case, any attempt to characterize the underlying spike rate will lead to spurious results. Given a paucity of data, our method can also suggest how many more trials are needed until the set of data can be analyzed with the required resolution. 1
same-paper 4 0.85899556 104 nips-2006-Large-Scale Sparsified Manifold Regularization
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
5 0.59051287 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
Author: Zhenyue Zhang, Jing Wang
Abstract: The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. The modified locally linear embedding (MLLE) proposed in this paper is much stable. It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. MLLE is also compared with the local tangent space alignment (LTSA). Numerical examples are given that show the improvement and efficiency of MLLE. 1
6 0.58255571 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
7 0.5702619 128 nips-2006-Manifold Denoising
8 0.56371218 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
9 0.56254989 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
10 0.559228 121 nips-2006-Learning to be Bayesian without Supervision
11 0.54479259 153 nips-2006-Online Clustering of Moving Hyperplanes
12 0.54173809 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
13 0.53346407 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo
14 0.52915943 95 nips-2006-Implicit Surfaces with Globally Regularised and Compactly Supported Basis Functions
15 0.52619416 107 nips-2006-Large Margin Multi-channel Analog-to-Digital Conversion with Applications to Neural Prosthesis
16 0.52182633 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians
17 0.5143159 108 nips-2006-Large Scale Hidden Semi-Markov SVMs
18 0.51125389 55 nips-2006-Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
19 0.50906402 162 nips-2006-Predicting spike times from subthreshold dynamics of a neuron
20 0.50707847 105 nips-2006-Large Margin Component Analysis