jmlr jmlr2013 jmlr2013-69 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Partha Niyogi
Abstract: Manifold regularization (Belkin et al., 2006) is a geometrically motivated framework for machine learning within which several semi-supervised algorithms have been constructed. Here we try to provide some theoretical understanding of this approach. Our main result is to expose the natural structure of a class of problems on which manifold regularization methods are helpful. We show that for such problems, no supervised learner can learn effectively. On the other hand, a manifold based learner (that knows the manifold or “learns” it from unlabeled examples) can learn with relatively few labeled examples. Our analysis follows a minimax style with an emphasis on finite sample results (in terms of n: the number of labeled examples). These results allow us to properly interpret manifold regularization and related spectral and geometric algorithms in terms of their potential use in semi-supervised learning. Keywords: semi-supervised learning, manifold regularization, graph Laplacian, minimax rates
Reference: text
sentIndex sentText sentNum sentScore
1 Our main result is to expose the natural structure of a class of problems on which manifold regularization methods are helpful. [sent-5, score-0.767]
2 On the other hand, a manifold based learner (that knows the manifold or “learns” it from unlabeled examples) can learn with relatively few labeled examples. [sent-7, score-1.555]
3 These results allow us to properly interpret manifold regularization and related spectral and geometric algorithms in terms of their potential use in semi-supervised learning. [sent-9, score-0.702]
4 Keywords: semi-supervised learning, manifold regularization, graph Laplacian, minimax rates 1. [sent-10, score-0.673]
5 Introduction The last decade has seen a flurry of activity within machine learning on two topics that are the subject of this paper: manifold methods and semi-supervised learning. [sent-11, score-0.599]
6 While manifold methods are generally applicable to a variety of problems, the framework of manifold regularization (Belkin et al. [sent-12, score-1.301]
7 First, manifold regularization is not a single algorithm but rather a collection of algorithms. [sent-16, score-0.726]
8 Second, while many semi-supervised algorithms have been derived from this perspective and many have enjoyed empirical success, there are few theoretical analyses that characterize the class of problems on which manifold regularization approaches are likely to work. [sent-18, score-0.744]
9 Even when the data might have a manifold structure, it is not clear whether learning the manifold is necessary for good performance. [sent-20, score-1.22]
10 … adapted without knowing very much about the manifold in question beyond its dimension. [sent-25, score-0.642]
11 , Lafferty and Wasserman, 2007) to suggest that manifold regularization does not provide any particular advantage. [sent-28, score-0.702]
12 What has been particularly missing in prior research is a crisp theoretical statement that clearly shows the benefits of manifold regularization techniques. [sent-29, score-0.702]
13 This paper provides such a theoretical analysis, and explicates the nature of manifold regularization in the context of semi-supervised learning. [sent-30, score-0.702]
14 This provides for the first time a clear separation between a manifold method and alternatives for a suitably chosen class of problems (problems that have intrinsic manifold structure). [sent-32, score-1.263]
15 To illustrate this conceptual point, we have defined a simple class of problems where the support of the data is simply a one dimensional manifold (the circle) embedded in an ambient Euclidean space. [sent-33, score-0.67]
16 However, it is worth emphasizing that this conceptual point may also obtain in far more general manifold settings. [sent-35, score-0.599]
17 In Section 1.1, we develop the basic minimax framework of analysis that allows us to compare the rates of learning for manifold based semi-supervised learners and fully supervised learners. [sent-41, score-0.666]
18 Following this, in Section 2, we demonstrate a separation between the two kinds of learners by proving an upper bound for the manifold based learner and a lower bound for any alternative learner. [sent-42, score-0.725]
19 In Section 3, we take a broader look at manifold learning and regularization in order to expose some subtle issues around these subjects that have not been carefully considered by the machine learning community. [sent-43, score-0.725]
20 We show how both the classical results of Castelli and Cover (1996, one of the earliest known examples of the power of semi-supervised learning) and the recent results of manifold regularization relate to this general structure. [sent-46, score-0.702]
21 This knowledge may be acquired by a manifold learning procedure through unlabeled examples x_i and having access to an essentially infinite number of them. [sent-56, score-0.751]
22 Every p ∈ P is such that its marginal pX has support on a k-dimensional manifold M ⊂ X. [sent-61, score-0.643]
23 The quantity R(n, P) = inf_A sup_{p∈P} E_z ||A(z) − m_p||_{L2(p_X)} is the best possible rate achieved by any learner that has no knowledge of the manifold M. [sent-86, score-0.725]
24 We will contrast it with a learner that has oracle access endowing it with knowledge of the manifold. [sent-87, score-0.214]
25 Now a manifold based learner A′ is given a collection of labeled examples z = (z_1, . . . , z_n) together with knowledge of the manifold M. [sent-89, score-0.796]
26 It might acquire this knowledge through manifold learning or through oracle access (the limit of infinite amounts of unlabeled data). [sent-94, score-0.831]
27 The minimax rate for such a manifold based learner for the class P_M is given by Q(n, P_M) = inf_{A′} sup_{p∈P_M} E_z ||A′(z, M) − m_p||_{L2(p_X)}. [sent-96, score-0.842]
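For readability, the two quantities being compared can be collected in display form; this is a reconstruction consistent with the fragments above (z denotes the labeled sample and m_p the target function), not a verbatim copy of the paper's displays.

```latex
% Minimax risk of a fully supervised learner A versus a manifold-aware learner A'
% (reconstructed from the surrounding sentences).
\[
R(n,\mathcal{P}) \;=\; \inf_{A}\,\sup_{p\in\mathcal{P}}\;
    \mathbb{E}_{z}\,\bigl\|A(z)-m_p\bigr\|_{L^2(p_X)},
\qquad
Q(n,\mathcal{P}) \;=\; \inf_{A'}\,\sup_{\mathcal{M}}\,\sup_{p\in\mathcal{P}_{\mathcal{M}}}\;
    \mathbb{E}_{z}\,\bigl\|A'(z,\mathcal{M})-m_p\bigr\|_{L^2(p_X)},
\]
% with Q(n, P_M) denoting the analogous quantity when the manifold M is held fixed.
```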
28 This is a class of problems for which knowing the manifold confers an advantage to the learner. [sent-100, score-0.684]
29 There are two main assumptions behind the manifold based approach to semi-supervised learning. [sent-101, score-0.599]
30 This assumption and its corresponding motivation have been articulated many times in papers on manifold methods (see Roweis and Saul, 2000, for example). [sent-105, score-0.599]
31 In manifold regularization, a geometric smoothness penalty is instead imposed. [sent-110, score-0.599]
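As a reminder of what that penalty looks like, the Tikhonov-form objective referred to below as Equation 1 is presumably of the standard manifold regularization form of Belkin et al. (2006); the following is a sketch of that form, not a verbatim copy of the paper's equation.

```latex
% Standard manifold-regularization objective in Tikhonov form (Belkin et al., 2006);
% gamma_A and gamma_I weight the ambient and intrinsic (geometric) penalties.
\[
\hat f \;=\; \arg\min_{f\in\mathcal{H}_K}\;
    \frac{1}{n}\sum_{i=1}^{n} V\bigl(y_i, f(x_i)\bigr)
    \;+\; \gamma_A\,\|f\|_K^2
    \;+\; \gamma_I \int_{\mathcal{M}} \|\nabla_{\mathcal{M}} f\|^2 \, dp_X .
\]
% In the eigenbasis of the manifold Laplacian the intrinsic term penalizes
% high-frequency components, which is the sense in which it enforces geometric smoothness.
```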
32 Let φ_1, φ_2, . . . , be the eigenfunctions of the manifold Laplacian (ordered by frequency). [sent-114, score-0.752]
33 Against this backdrop, one might now consider manifold regularization to get a better understanding of when and why it might be expected to provide good semi-supervised learning. [sent-116, score-0.746]
34 First off, it is worthwhile to clarify what is meant by manifold regularization. [sent-117, score-0.626]
35 While Equation 1 is regularization in the Tikhonov form, one could consider other kinds of model selection principles that are in the spirit of manifold regularization. [sent-132, score-0.702]
36 For example, the method of Belkin and Niyogi (2003) is a version of the method of sieves that may be interpreted as manifold regularization with bandlimited functions where one allows the bandwidth to grow as more and more data becomes available. [sent-133, score-0.77]
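As a concrete illustration of this sieves-style reading, here is a minimal sketch (not the paper's code): the circle is taken as the known manifold, and the target function and the bandwidth schedule p ≈ √n are illustrative choices.

```python
# Minimal sketch of manifold regularization with bandlimited functions on the circle:
# regress labels onto the first 2p+1 Laplacian eigenfunctions, letting p grow with n.
import numpy as np

rng = np.random.default_rng(0)

def eigenbasis(theta, p):
    # Laplacian eigenfunctions on the circle: 1, cos(k*theta), sin(k*theta) for k <= p.
    cols = [np.ones_like(theta)]
    for k in range(1, p + 1):
        cols += [np.cos(k * theta), np.sin(k * theta)]
    return np.column_stack(cols)

target = lambda t: np.sign(np.cos(3 * t) + 0.5 * np.sin(t))   # hypothetical noiseless target

for n in (20, 80, 320):
    theta_train = rng.uniform(0.0, 2.0 * np.pi, n)            # intrinsic coordinate: manifold known
    p = max(1, int(np.sqrt(n)))                                # bandwidth grows with n (one possible schedule)
    coef, *_ = np.linalg.lstsq(eigenbasis(theta_train, p), target(theta_train), rcond=None)
    theta_test = rng.uniform(0.0, 2.0 * np.pi, 5000)
    err = np.mean(np.sign(eigenbasis(theta_test, p) @ coef) != target(theta_test))
    print(f"n={n:4d}  bandwidth p={p:2d}  test error={err:.3f}")
```

Allowing the bandwidth p to grow with n is what lets the sieve eventually capture any sufficiently smooth target on the manifold.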
37 The formalism provides a class of algorithms A′ that have access to labeled examples z and the manifold M from which all the terms in the optimization of Equation 1 can be computed. [sent-135, score-0.718]
38 The graph is viewed as a proxy for the manifold and, in this sense, many graph based approaches to semi-supervised learning (see Zhu, 2008, for a review) may be accommodated within the purview of manifold regularization. [sent-139, score-1.272]
39 The point of these remarks is that manifold regularization combines the perspective of kernel based methods with the perspective of manifold and graph based methods. [sent-140, score-1.364]
40 A Prototypical Example: Embeddings of the Circle into Euclidean Space In this section, we will construct a class of learning problems P that have manifold structure P = ∪M PM and demonstrate a separation between R(n, P ) and Q(n, P ). [sent-144, score-0.641]
41 Thus the learner with knowledge of the manifold learns easily while the learner with no such knowledge cannot learn at all. [sent-147, score-0.897]
42 Now consider the family of such isometric embeddings and let this be the family of one-dimensional submanifolds that we will deal with. [sent-149, score-0.22]
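A minimal sketch of one member of this family follows (illustrative assumptions, not from the paper: the unit circle placed isometrically in a random two-dimensional subspace, with the ambient dimension D = 50 an arbitrary choice).

```python
# One isometric embedding of the circle into R^D: the unit circle in a random 2-plane.
# Arclength on the circle is preserved, so this is one member of the family above.
import numpy as np

rng = np.random.default_rng(1)
D = 50
B, _ = np.linalg.qr(rng.normal(size=(D, 2)))        # orthonormal basis of a random 2-plane

def embed(theta):
    return np.column_stack([np.cos(theta), np.sin(theta)]) @ B.T   # shape (len(theta), D)

theta = rng.uniform(0.0, 2.0 * np.pi, 1000)         # uniform marginal p_X on the circle
X = embed(theta)                                    # ambient coordinates: all a supervised learner sees
print(X.shape)                                      # (1000, 50)
```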
43 2.1 Upper Bound on Q(n, P) Let us begin by noting that if the manifold M is known, the learner knows the class P_M. [sent-172, score-0.805]
44 Then the learner with knowledge of the manifold converges at a fast rate given by Q(n, P) ≤ 2√(3 log(n)/n), and this rate is optimal. [sent-178, score-0.725]
45 Remark 3 If the class H_M is a parametric family of the form ∑_{i=1}^{p} α_i φ_i, where φ_i are the eigenfunctions of the Laplacian, one obtains the same parametric rate. [sent-180, score-0.249]
46 The above in turn is lower bounded by ∑_{l=0}^{n} ∫_{S_l} ||A(z_p(x)) − m_p||_{L2(p_X)} dp_X^n(x), where S_l = {x ∈ X^n | exactly l segments contain data and the links do not}. [sent-220, score-0.199]
47 We now construct a family of intersecting manifolds such that given two points on any manifold in this family, it is difficult to judge (without knowing the manifold) whether these points are near or far in geodesic distance. [sent-231, score-0.761]
48 The class of learning problems consists of probability distributions p such that pX is supported on some manifold in this class. [sent-232, score-0.685]
49 Consider a set of 2^d manifolds, where each manifold has the structure shown in Figure 2. [sent-235, score-0.664]
50 Each manifold has three disjoint subsets: A (loops), B (links), and C (chain) such that M = A ∪ B ∪C. [sent-236, score-0.599]
51 The links connect the loops to the segments as shown in Figure 2 so that one obtains a closed curve corresponding to an embedding of the circle into RD . [sent-242, score-0.294]
52 For each subset S ⊂ {1, . . . , d} one constructs a manifold (we can denote this by M_S) such that the links connect C_i (for i ∈ S) to the “upper half” of the loop and they connect C_j (for j ∈ {1, . . . , d} \ S) to the “lower half”. [sent-246, score-0.649]
53 For manifold M_S, let ∫_A p_X^(S)(x) dx = α_S, where p_X^(S) is the probability density function on the manifold M_S. [sent-251, score-1.22]
54 Thus for each manifold MS , we have the associated class of probability distributions PMS . [sent-254, score-0.685]
55 Now for each such manifold M_S, we pick one probability distribution p^(S) ∈ P_{M_S} such that p^(S)(y = +1|x) = 1 for all x ∈ C_k when k ∈ S, and p^(S)(y = +1|x) = 0 for all x ∈ C_k when k ∉ S. [sent-256, score-0.621]
56 We now prove a technical lemma establishing an inequality that holds when the data lives only on the segments and not on the links that constitute the embedded circle of Construction 1. [sent-267, score-0.258]
57 Thus all the elements of PD agree in their labelling of the segments containing data but disagree in their labelling of segments not containing data. [sent-281, score-0.279]
58 For any p ∈ P and any segment c_k, we say that p “disagrees” with f on c_k if |f(x) − m_p(x)| ≥ 1 on a majority of c_k, that is, ∫_A p_X(x) dx ≥ ∫_{c_k\A} p_X(x) dx, where A = {x ∈ c_k : |f(x) − m_p(x)| ≥ 1}. [sent-288, score-0.285]
59 Therefore, if f and p disagree on c_k, we have ∫_{c_k} (f(x) − m_p(x))² p_X(x) dx ≥ (1/2) ∫_{c_k} p_X(x) dx, which by construction is bounded below by a fixed fraction of (1 − α − β). [sent-289, score-0.206]
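Spelled out, the reconstructed step behind the last two sentences is the following.

```latex
% If p ``disagrees'' with f on the segment c_k, i.e., |f(x) - m_p(x)| >= 1 on a subset
% A of c_k carrying at least half of the p_X-mass of c_k, then
\[
\int_{c_k}\bigl(f(x)-m_p(x)\bigr)^2\,p_X(x)\,dx
\;\ge\; \int_{A} p_X(x)\,dx
\;\ge\; \frac{1}{2}\int_{c_k} p_X(x)\,dx ,
\]
% and in the construction above each segment carries a fixed fraction of the mass
% (1 - alpha - beta), which gives the quoted lower bound.
```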
60 2.3 Discussion Thus we see that knowledge of the manifold can have profound consequences for learning. [sent-295, score-0.599]
61 The proof of the lower bound reflects the intuition that has always been at the root of manifold based methods for semi-supervised learning. [sent-296, score-0.599]
62 We must further have the prior knowledge that the target function varies smoothly along the manifold and so “closeness on the manifold” translates to similarity in function values (or label probabilities). [sent-299, score-0.599]
63 This makes the task of the learner who does not know the manifold difficult: in fact impossible in the sense described in Theorem 4. [sent-301, score-0.747]
64 Thus we may appreciate the more general circumstances under which we might see a separation between manifold methods and alternative methods. [sent-304, score-0.621]
65 Thus if M is taken to be a k-dimensional submanifold of R^N, then one could consider a family of k-dimensional submanifolds of R^N and let P be the naturally associated family of probability distributions that defines a collection of learning problems. [sent-307, score-0.284]
66 For one, thresholding is not necessary: if the class H_M were simply defined as bandlimited functions, that is, functions of the form ∑_{i=1}^{p} α_i φ_i (where φ_i are the eigenfunctions of the Laplacian of M), the result of Theorem 4 holds as well. [sent-312, score-0.284]
67 Both upper and lower bounds are valid also for arbitrary marginal distributions pX (not just uniform) that have support on some manifold M . [sent-320, score-0.665]
68 3.1 Knowing the Manifold and Learning It In the discussion so far, we have implicitly assumed that an oracle can provide perfect information about the manifold in whatever form we choose. [sent-326, score-0.657]
69 Yet the whole issue of knowing the manifold is considerably more subtle than it appears at first blush, and in fact has never been carefully considered by the machine learning community. [sent-328, score-0.642]
70 For example, consider the following oracles that all provide knowledge of the manifold but in different forms. [sent-329, score-0.622]
71 One could know the manifold up to some geometric or topological invariants. [sent-339, score-0.621]
72 Depending upon the kind of oracle access we have, the task of the learner might vary from simple to impossible. [sent-346, score-0.236]
73 Then one may define the point cloud Laplace operator L^t_m as follows: L^t_m f(x) = (1/t) (1/(4πt)^{d/2}) (1/m) ∑_{i=1}^{m} (f(x) − f(x_i)) e^{−||x−x_i||²/(4t)}. The point cloud Laplacian is a random operator that is the natural extension of the graph Laplacian operator to the whole space. [sent-365, score-0.298]
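A minimal sketch of this operator in code (an illustration of the displayed formula, not the paper's implementation; the function and parameter names are ours).

```python
# Point cloud Laplace operator L^t_m evaluated on a function f over a point cloud.
import numpy as np

def point_cloud_laplacian(f, x_query, x_sample, t, d):
    """Evaluate L^t_m f at each row of x_query, given the unlabeled sample x_sample
    (m rows), bandwidth t, and intrinsic dimension d; f maps an (n, D) array to (n,)."""
    m = x_sample.shape[0]
    sq_dists = ((x_query[:, None, :] - x_sample[None, :, :]) ** 2).sum(axis=-1)  # (q, m)
    heat = np.exp(-sq_dists / (4.0 * t))                                          # heat-kernel weights
    diffs = f(x_query)[:, None] - f(x_sample)[None, :]                            # f(x) - f(x_i)
    return (diffs * heat).sum(axis=1) / (m * t * (4.0 * np.pi * t) ** (d / 2.0))
```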
74 The quantity t (similar to a bandwidth) needs to go to zero at a suitable rate relative to m (e.g., so that m t_m^{d+2} → ∞); then there exists a sequence t_m such that the point cloud Laplacian converges to the manifold Laplacian as m → ∞. [sent-374, score-0.712]
75 In other words, the eigenvalues and eigenfunctions of the point cloud Laplacian converge to those of the manifold Laplacian as the number of data points m goes to infinity. [sent-382, score-0.839]
76 These results enable us to present a semi-supervised algorithm that learns the manifold from unlabeled data and uses this knowledge to realize the upper bound of Theorem 2. [sent-383, score-0.743]
77 Let us consider the following kind of manifold regularization based semi-supervised learner. [sent-399, score-0.702]
78 Construct the point cloud Laplacian operator L^t_m from the unlabeled data x. [sent-401, score-0.238]
79 Thus we may compare the two manifold regularization algorithms (an empirical one with unlabeled data and an oracle one that knows the manifold): Â(z, x) = sign(f̂_m) = sign(α̂_m φ_m + β̂_m ψ_m) and A_oracle(z, M) = sign(f̂) = sign(α̂ φ + β̂ ψ). [sent-411, score-0.92]
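A minimal end-to-end sketch of the empirical learner Â(z, x) in this spirit (illustrative assumptions: a graph Laplacian on the unlabeled sample stands in for the point cloud / manifold Laplacian, its two lowest nontrivial eigenvectors stand in for (φ, ψ), and the kernel width and sample sizes are arbitrary choices).

```python
# Graph-Laplacian stand-in for the manifold regularization semi-supervised learner.
import numpy as np

def laplacian_eigenvectors(X, t, k):
    # Unnormalized graph Laplacian with heat-kernel weights; k smallest nontrivial eigenvectors.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (4.0 * t))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:k + 1]                               # column 0 is (roughly) constant

def semi_supervised_predict(X_all, labeled_idx, y_labeled, t=0.05, k=2):
    V = laplacian_eigenvectors(X_all, t, k)               # "learned" eigenfunctions on the sample
    coef, *_ = np.linalg.lstsq(V[labeled_idx], y_labeled, rcond=None)
    return np.sign(V @ coef)                              # predicted labels at every sample point

rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 2.0 * np.pi, 400)
X = np.column_stack([np.cos(theta), np.sin(theta)])       # circle sample (could sit in higher dimension)
y = np.sign(np.cos(theta))                                # target of the form sign(alpha*phi + beta*psi)
labeled = rng.choice(400, size=10, replace=False)         # only a handful of labeled examples
pred = semi_supervised_predict(X, labeled, y[labeled])
print("empirical error:", np.mean(pred != y))
```

Only the ten labeled points touch the labels; the geometry that makes the fit possible comes entirely from the unlabeled sample.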
80 Further, suppose f = αφ + βψ (with α² + β² = 1) and g = αφ_m + βψ_m, where φ, ψ are eigenfunctions of the Laplacian on M while φ_m, ψ_m are eigenfunctions of the point cloud Laplacian as defined in the previous developments. [sent-423, score-0.393]
81 The corollary is Corollary 9 Let P = ∪M PM be a collection of learning problems with the structure described in Section 2, that is, each p ∈ P is such that the marginal pX has support on a submanifold M of RD which corresponds to a particular isometric embedding of the circle into Euclidean space. [sent-436, score-0.298]
82 Yet the semi-supervised manifold regularization algorithm described above (with an infinite amount of unlabeled data) converges at a fast rate as a function of the number of labeled examples. [sent-439, score-0.824]
83 But by imposing such constraints, we can construct classes of learning problems where, for any realistic number of labeled examples n, there is a gap between the performance of a supervised learner and that of the manifold based semi-supervised learner. [sent-444, score-0.802]
84 For an upper bound Q(n), we follow the manifold regularization algorithm of the previous section and note that eigenfunctions of the Laplacian can be estimated for compact manifolds with a curvature bound. [sent-458, score-0.92]
85 The Structure of Semi-supervised Learning It is worthwhile to reflect on why the manifold regularization algorithm is able to display improved performance in semi-supervised learning. [sent-463, score-0.702]
86 The manifold assumption is a device that allows us to link the marginal pX with the conditional p(y|x). [sent-464, score-0.666]
87 Through unlabeled data x, we can learn the manifold M thereby greatly reducing the class of possible conditionals p(y|x) that we need to consider. [sent-465, score-0.835]
88 For a situation like this, knowing the marginal tells us a lot about the conditional and therefore unlabeled data can be useful. [sent-470, score-0.209]
89 In this setting, unlabeled data allows the learner to estimate the marginal q(x). [sent-478, score-0.292]
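A minimal sketch in the spirit of the Castelli and Cover setting referred to above (illustrative assumptions: a two-component Gaussian mixture marginal, scikit-learn's GaussianMixture as the density estimator, and arbitrary sample sizes).

```python
# Unlabeled data estimates the two-component marginal q(x); a few labeled points
# only have to decide which component carries which label.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
means = np.array([[-2.0], [2.0]])

def sample(n):
    z = rng.integers(0, 2, n)                        # component index = true class
    x = rng.normal(means[z, 0], 1.0)[:, None]
    return x, 2 * z - 1                              # labels in {-1, +1}

X_u, _ = sample(5000)                                # unlabeled: estimate the marginal q
X_l, y_l = sample(8)                                 # a few labeled points name the components

gmm = GaussianMixture(n_components=2, random_state=0).fit(X_u)
sign_of = np.ones(2)
for c in range(2):                                   # majority vote of labeled points per component
    mask = gmm.predict(X_l) == c
    if mask.any():
        sign_of[c] = 1.0 if y_l[mask].sum() >= 0 else -1.0

X_test, y_test = sample(20000)
print("test error:", np.mean(sign_of[gmm.predict(X_test)] != y_test))
```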
90 4.2 Manifold Regularization Interpreted In its most general form, manifold regularization encompasses a class of geometrically motivated approaches to learning. [sent-486, score-0.786]
91 In problems that have this general structure, we expect manifold regularization and related algorithms (that use the graph Laplacian or a suitable spectral approximation) to work well. [sent-511, score-0.765]
92 Conclusions We have considered a minimax style framework within which we have investigated the potential role of manifold learning in learning from labeled and unlabeled examples. [sent-515, score-0.805]
93 We demonstrated the natural structure of a class of problems on which knowing the manifold makes a big difference. [sent-516, score-0.684]
94 On such problems, we see that manifold regularization is provably better than any supervised learning algorithm. [sent-517, score-0.732]
95 Our proof clarifies a potential source of confusion in the literature on manifold learning. [sent-518, score-0.599]
96 We see that if data lives on an underlying manifold but this manifold is unknown and belongs to a class of possible smooth manifolds, it is possible that supervised learning (classification and regression problems) may be ineffective, even impossible. [sent-519, score-1.333]
97 In contrast, if the manifold is fixed though unknown, it may be possible to (e. [sent-520, score-0.599]
98 In between these two cases lie various situations that need to be properly explored for a greater understanding of the potential benefits and limitations of manifold methods and the need for manifold learning. [sent-523, score-1.198]
99 Our analysis allows us to see the role of manifold regularization in semi-supervised learning in a clear way. [sent-524, score-0.702]
100 Several algorithms using manifold and associated graph-based methods have seen some empirical success recently. [sent-525, score-0.599]
wordName wordTfidf (topN-words)
[('manifold', 0.599), ('px', 0.384), ('ez', 0.209), ('hm', 0.184), ('laplacian', 0.16), ('eigenfunctions', 0.153), ('iyogi', 0.149), ('sl', 0.126), ('learner', 0.126), ('pm', 0.122), ('unlabeled', 0.122), ('anifold', 0.116), ('sign', 0.106), ('regularization', 0.103), ('heory', 0.096), ('egularization', 0.096), ('ltm', 0.095), ('partha', 0.095), ('belkin', 0.094), ('sobolev', 0.09), ('segments', 0.088), ('cloud', 0.087), ('pd', 0.081), ('castelli', 0.081), ('circle', 0.081), ('submanifold', 0.077), ('ms', 0.073), ('bandlimited', 0.068), ('manifolds', 0.065), ('pn', 0.061), ('oracle', 0.058), ('ck', 0.057), ('family', 0.054), ('links', 0.05), ('conditionals', 0.048), ('labeled', 0.047), ('mikhail', 0.046), ('niyogi', 0.046), ('heat', 0.045), ('marginal', 0.044), ('gm', 0.044), ('knowing', 0.043), ('embeddings', 0.042), ('loops', 0.042), ('geometrically', 0.042), ('class', 0.042), ('arcsin', 0.041), ('isometric', 0.039), ('lives', 0.039), ('knows', 0.038), ('sup', 0.038), ('graph', 0.037), ('minimax', 0.037), ('disagree', 0.035), ('embedding', 0.033), ('cd', 0.033), ('gin', 0.031), ('submanifolds', 0.031), ('spaces', 0.031), ('risk', 0.03), ('supervised', 0.03), ('construction', 0.03), ('access', 0.03), ('zi', 0.03), ('ambient', 0.029), ('operator', 0.029), ('cover', 0.028), ('theorems', 0.028), ('pi', 0.028), ('clarify', 0.027), ('backdrop', 0.027), ('div', 0.027), ('grigoryan', 0.027), ('pms', 0.027), ('zn', 0.027), ('ci', 0.027), ('remarks', 0.026), ('suitable', 0.026), ('noiseless', 0.025), ('si', 0.024), ('wasserman', 0.024), ('correspondingly', 0.024), ('fv', 0.024), ('learn', 0.024), ('collection', 0.024), ('belongs', 0.024), ('oracles', 0.023), ('expose', 0.023), ('labelling', 0.023), ('suitably', 0.023), ('link', 0.023), ('agree', 0.022), ('know', 0.022), ('distributions', 0.022), ('might', 0.022), ('probability', 0.022), ('learns', 0.022), ('thresholding', 0.021), ('concentrated', 0.021), ('diffusion', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses
Author: Partha Niyogi
Abstract: Manifold regularization (Belkin et al., 2006) is a geometrically motivated framework for machine learning within which several semi-supervised algorithms have been constructed. Here we try to provide some theoretical understanding of this approach. Our main result is to expose the natural structure of a class of problems on which manifold regularization methods are helpful. We show that for such problems, no supervised learner can learn effectively. On the other hand, a manifold based learner (that knows the manifold or “learns” it from unlabeled examples) can learn with relatively few labeled examples. Our analysis follows a minimax style with an emphasis on finite sample results (in terms of n: the number of labeled examples). These results allow us to properly interpret manifold regularization and related spectral and geometric algorithms in terms of their potential use in semi-supervised learning. Keywords: semi-supervised learning, manifold regularization, graph Laplacian, minimax rates
2 0.21497381 86 jmlr-2013-Parallel Vector Field Embedding
Author: Binbin Lin, Xiaofei He, Chiyuan Zhang, Ming Ji
Abstract: We propose a novel local isometry based dimensionality reduction method from the perspective of vector fields, which is called parallel vector field embedding (PFE). We first give a discussion on local isometry and global isometry to show the intrinsic connection between parallel vector fields and isometry. The problem of finding an isometry turns out to be equivalent to finding orthonormal parallel vector fields on the data manifold. Therefore, we first find orthonormal parallel vector fields by solving a variational problem on the manifold. Then each embedding function can be obtained by requiring its gradient field to be as close to the corresponding parallel vector field as possible. Theoretical results show that our method can precisely recover the manifold if it is isometric to a connected open subset of Euclidean space. Both synthetic and real data examples demonstrate the effectiveness of our method even if there is heavy noise and high curvature. Keywords: manifold learning, isometry, vector field, covariant derivative, out-of-sample extension
3 0.20445462 34 jmlr-2013-Distance Preserving Embeddings for General n-Dimensional Manifolds
Author: Nakul Verma
Abstract: Low dimensional embeddings of manifold data have gained popularity in the last decade. However, a systematic finite sample analysis of manifold embedding algorithms largely eludes researchers. Here we present two algorithms that embed a general n-dimensional manifold into Rd (where d only depends on some key manifold properties such as its intrinsic dimension, volume and curvature) that guarantee to approximately preserve all interpoint geodesic distances. Keywords: manifold learning, isometric embeddings, non-linear dimensionality reduction, Nash’s embedding theorem
4 0.1747652 96 jmlr-2013-Regularization-Free Principal Curve Estimation
Author: Samuel Gerber, Ross Whitaker
Abstract: Principal curves and manifolds provide a framework to formulate manifold learning within a statistical context. Principal curves define the notion of a curve passing through the middle of a distribution. While the intuition is clear, the formal definition leads to some technical and practical difficulties. In particular, principal curves are saddle points of the mean-squared projection distance, which poses severe challenges for estimation and model selection. This paper demonstrates that the difficulties in model selection associated with the saddle point property of principal curves are intrinsically tied to the minimization of the mean-squared projection distance. We introduce a new objective function, facilitated through a modification of the principal curve estimation approach, for which all critical points are principal curves and minima. Thus, the new formulation removes the fundamental issue for model selection in principal curve estimation. A gradient-descent-based estimator demonstrates the effectiveness of the new formulation for controlling model complexity on numerical experiments with synthetic and real data. Keywords: principal curve, manifold estimation, unsupervised learning, model complexity, model selection
5 0.14175171 59 jmlr-2013-Large-scale SVD and Manifold Learning
Author: Ameet Talwalkar, Sanjiv Kumar, Mehryar Mohri, Henry Rowley
Abstract: This paper examines the efficacy of sampling-based low-rank approximation techniques when applied to large dense kernel matrices. We analyze two common approximate singular value decomposition techniques, namely the Nyström and Column sampling methods. We present a theoretical comparison between these two methods, provide novel insights regarding their suitability for various tasks and present experimental results that support our theory. Our results illustrate the relative strengths of each method. We next examine the performance of these two techniques on the large-scale task of extracting low-dimensional manifold structure given millions of high-dimensional face images. We address the computational challenges of non-linear dimensionality reduction via Isomap and Laplacian Eigenmaps, using a graph containing about 18 million nodes and 65 million edges. We present extensive experiments on learning low-dimensional embeddings for two large face data sets: CMU-PIE (35 thousand faces) and a web data set (18 million faces). Our comparisons show that the Nyström approximation is superior to the Column sampling method for this task. Furthermore, approximate Isomap tends to perform better than Laplacian Eigenmaps on both clustering and classification with the labeled CMU-PIE data set. Keywords: low-rank approximation, manifold learning, large-scale matrix factorization
6 0.087697975 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding
7 0.081797495 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut
8 0.081597671 117 jmlr-2013-Universal Consistency of Localized Versions of Regularized Kernel Methods
9 0.065659843 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach
10 0.060401596 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning
11 0.05966996 14 jmlr-2013-Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators
12 0.056785401 76 jmlr-2013-Nonparametric Sparsity and Regularization
13 0.056477915 8 jmlr-2013-A Theory of Multiclass Boosting
14 0.048604209 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning
15 0.047660094 70 jmlr-2013-Maximum Volume Clustering: A New Discriminative Clustering Approach
16 0.040820487 104 jmlr-2013-Sparse Single-Index Model
17 0.03937567 32 jmlr-2013-Differential Privacy for Functions and Functional Data
18 0.039218798 6 jmlr-2013-A Plug-in Approach to Neyman-Pearson Classification
19 0.038944487 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
20 0.038678508 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
topicId topicWeight
[(0, -0.245), (1, 0.207), (2, 0.08), (3, -0.401), (4, -0.145), (5, 0.037), (6, 0.025), (7, -0.053), (8, -0.048), (9, -0.064), (10, -0.196), (11, 0.078), (12, -0.059), (13, -0.05), (14, 0.027), (15, 0.012), (16, 0.033), (17, 0.097), (18, -0.008), (19, 0.026), (20, -0.158), (21, -0.074), (22, 0.046), (23, 0.065), (24, 0.026), (25, -0.025), (26, 0.071), (27, 0.015), (28, 0.085), (29, 0.011), (30, 0.02), (31, 0.09), (32, -0.091), (33, -0.044), (34, 0.017), (35, -0.044), (36, 0.004), (37, 0.036), (38, -0.12), (39, -0.05), (40, -0.078), (41, -0.041), (42, 0.075), (43, -0.03), (44, -0.043), (45, 0.042), (46, 0.04), (47, 0.002), (48, -0.055), (49, 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.96653199 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses
Author: Partha Niyogi
Abstract: Manifold regularization (Belkin et al., 2006) is a geometrically motivated framework for machine learning within which several semi-supervised algorithms have been constructed. Here we try to provide some theoretical understanding of this approach. Our main result is to expose the natural structure of a class of problems on which manifold regularization methods are helpful. We show that for such problems, no supervised learner can learn effectively. On the other hand, a manifold based learner (that knows the manifold or “learns” it from unlabeled examples) can learn with relatively few labeled examples. Our analysis follows a minimax style with an emphasis on finite sample results (in terms of n: the number of labeled examples). These results allow us to properly interpret manifold regularization and related spectral and geometric algorithms in terms of their potential use in semi-supervised learning. Keywords: semi-supervised learning, manifold regularization, graph Laplacian, minimax rates
2 0.75467527 34 jmlr-2013-Distance Preserving Embeddings for General n-Dimensional Manifolds
Author: Nakul Verma
Abstract: Low dimensional embeddings of manifold data have gained popularity in the last decade. However, a systematic finite sample analysis of manifold embedding algorithms largely eludes researchers. Here we present two algorithms that embed a general n-dimensional manifold into Rd (where d only depends on some key manifold properties such as its intrinsic dimension, volume and curvature) that guarantee to approximately preserve all interpoint geodesic distances. Keywords: manifold learning, isometric embeddings, non-linear dimensionality reduction, Nash’s embedding theorem
3 0.71797085 86 jmlr-2013-Parallel Vector Field Embedding
Author: Binbin Lin, Xiaofei He, Chiyuan Zhang, Ming Ji
Abstract: We propose a novel local isometry based dimensionality reduction method from the perspective of vector fields, which is called parallel vector field embedding (PFE). We first give a discussion on local isometry and global isometry to show the intrinsic connection between parallel vector fields and isometry. The problem of finding an isometry turns out to be equivalent to finding orthonormal parallel vector fields on the data manifold. Therefore, we first find orthonormal parallel vector fields by solving a variational problem on the manifold. Then each embedding function can be obtained by requiring its gradient field to be as close to the corresponding parallel vector field as possible. Theoretical results show that our method can precisely recover the manifold if it is isometric to a connected open subset of Euclidean space. Both synthetic and real data examples demonstrate the effectiveness of our method even if there is heavy noise and high curvature. Keywords: manifold learning, isometry, vector field, covariant derivative, out-of-sample extension
4 0.64204705 96 jmlr-2013-Regularization-Free Principal Curve Estimation
Author: Samuel Gerber, Ross Whitaker
Abstract: Principal curves and manifolds provide a framework to formulate manifold learning within a statistical context. Principal curves define the notion of a curve passing through the middle of a distribution. While the intuition is clear, the formal definition leads to some technical and practical difficulties. In particular, principal curves are saddle points of the mean-squared projection distance, which poses severe challenges for estimation and model selection. This paper demonstrates that the difficulties in model selection associated with the saddle point property of principal curves are intrinsically tied to the minimization of the mean-squared projection distance. We introduce a new objective function, facilitated through a modification of the principal curve estimation approach, for which all critical points are principal curves and minima. Thus, the new formulation removes the fundamental issue for model selection in principal curve estimation. A gradient-descent-based estimator demonstrates the effectiveness of the new formulation for controlling model complexity on numerical experiments with synthetic and real data. Keywords: principal curve, manifold estimation, unsupervised learning, model complexity, model selection
5 0.38624457 59 jmlr-2013-Large-scale SVD and Manifold Learning
Author: Ameet Talwalkar, Sanjiv Kumar, Mehryar Mohri, Henry Rowley
Abstract: This paper examines the efficacy of sampling-based low-rank approximation techniques when applied to large dense kernel matrices. We analyze two common approximate singular value decomposition techniques, namely the Nyström and Column sampling methods. We present a theoretical comparison between these two methods, provide novel insights regarding their suitability for various tasks and present experimental results that support our theory. Our results illustrate the relative strengths of each method. We next examine the performance of these two techniques on the large-scale task of extracting low-dimensional manifold structure given millions of high-dimensional face images. We address the computational challenges of non-linear dimensionality reduction via Isomap and Laplacian Eigenmaps, using a graph containing about 18 million nodes and 65 million edges. We present extensive experiments on learning low-dimensional embeddings for two large face data sets: CMU-PIE (35 thousand faces) and a web data set (18 million faces). Our comparisons show that the Nyström approximation is superior to the Column sampling method for this task. Furthermore, approximate Isomap tends to perform better than Laplacian Eigenmaps on both clustering and classification with the labeled CMU-PIE data set. Keywords: low-rank approximation, manifold learning, large-scale matrix factorization
6 0.35044479 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding
7 0.34899622 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach
8 0.33156776 117 jmlr-2013-Universal Consistency of Localized Versions of Regularized Kernel Methods
9 0.32293323 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut
10 0.29496688 76 jmlr-2013-Nonparametric Sparsity and Regularization
11 0.27590489 14 jmlr-2013-Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators
13 0.23325579 6 jmlr-2013-A Plug-in Approach to Neyman-Pearson Classification
14 0.21825351 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning
15 0.20834099 18 jmlr-2013-Beyond Fano's Inequality: Bounds on the Optimal F-Score, BER, and Cost-Sensitive Risk and Their Implications
16 0.2010396 38 jmlr-2013-Dynamic Affine-Invariant Shape-Appearance Handshape Features and Classification in Sign Language Videos
17 0.19694461 55 jmlr-2013-Joint Harmonic Functions and Their Supervised Connections
18 0.19448705 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning
19 0.19365354 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning
20 0.1926655 70 jmlr-2013-Maximum Volume Clustering: A New Discriminative Clustering Approach
topicId topicWeight
[(0, 0.019), (5, 0.125), (6, 0.031), (10, 0.14), (20, 0.01), (23, 0.038), (53, 0.382), (68, 0.023), (70, 0.021), (75, 0.05), (85, 0.032), (87, 0.021)]
simIndex simValue paperId paperTitle
1 0.79440105 108 jmlr-2013-Stochastic Variational Inference
Author: Matthew D. Hoffman, David M. Blei, Chong Wang, John Paisley
Abstract: We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets. Keywords: Bayesian inference, variational inference, stochastic optimization, topic models, Bayesian nonparametrics
same-paper 2 0.77138895 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses
Author: Partha Niyogi
Abstract: Manifold regularization (Belkin et al., 2006) is a geometrically motivated framework for machine learning within which several semi-supervised algorithms have been constructed. Here we try to provide some theoretical understanding of this approach. Our main result is to expose the natural structure of a class of problems on which manifold regularization methods are helpful. We show that for such problems, no supervised learner can learn effectively. On the other hand, a manifold based learner (that knows the manifold or “learns” it from unlabeled examples) can learn with relatively few labeled examples. Our analysis follows a minimax style with an emphasis on finite sample results (in terms of n: the number of labeled examples). These results allow us to properly interpret manifold regularization and related spectral and geometric algorithms in terms of their potential use in semi-supervised learning. Keywords: semi-supervised learning, manifold regularization, graph Laplacian, minimax rates
3 0.49885219 121 jmlr-2013-Variational Inference in Nonconjugate Models
Author: Chong Wang, David M. Blei
Abstract: Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. In a typical application, mean-field methods approximately compute the posterior with a coordinate-ascent optimization algorithm. When the model is conditionally conjugate, the coordinate updates are easily derived and in closed form. However, many models of interest—like the correlated topic model and Bayesian logistic regression—are nonconjugate. In these models, mean-field methods cannot be directly applied and practitioners have had to develop variational algorithms on a case-by-case basis. In this paper, we develop two generic methods for nonconjugate models, Laplace variational inference and delta method variational inference. Our methods have several advantages: they allow for easily derived variational algorithms with a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world data sets. We studied our methods on the correlated topic model, Bayesian logistic regression, and hierarchical Bayesian logistic regression. Keywords: variational inference, nonconjugate models, Laplace approximations, the multivariate delta method
4 0.49579582 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding
Author: Ery Arias-Castro, Bruno Pelletier
Abstract: Maximum Variance Unfolding is one of the main methods for (nonlinear) dimensionality reduction. We study its large sample limit, providing specific rates of convergence under standard assumptions. We find that it is consistent when the underlying submanifold is isometric to a convex subset, and we provide some simple examples where it fails to be consistent. Keywords: maximum variance unfolding, isometric embedding, U-processes, empirical processes, proximity graphs.
5 0.47230083 86 jmlr-2013-Parallel Vector Field Embedding
Author: Binbin Lin, Xiaofei He, Chiyuan Zhang, Ming Ji
Abstract: We propose a novel local isometry based dimensionality reduction method from the perspective of vector fields, which is called parallel vector field embedding (PFE). We first give a discussion on local isometry and global isometry to show the intrinsic connection between parallel vector fields and isometry. The problem of finding an isometry turns out to be equivalent to finding orthonormal parallel vector fields on the data manifold. Therefore, we first find orthonormal parallel vector fields by solving a variational problem on the manifold. Then each embedding function can be obtained by requiring its gradient field to be as close to the corresponding parallel vector field as possible. Theoretical results show that our method can precisely recover the manifold if it is isometric to a connected open subset of Euclidean space. Both synthetic and real data examples demonstrate the effectiveness of our method even if there is heavy noise and high curvature. Keywords: manifold learning, isometry, vector field, covariant derivative, out-of-sample extension
6 0.46868098 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning
7 0.46707302 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models
8 0.45592338 70 jmlr-2013-Maximum Volume Clustering: A New Discriminative Clustering Approach
9 0.45246035 117 jmlr-2013-Universal Consistency of Localized Versions of Regularized Kernel Methods
10 0.44399816 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
11 0.44220585 33 jmlr-2013-Dimension Independent Similarity Computation
12 0.44203356 34 jmlr-2013-Distance Preserving Embeddings for General n-Dimensional Manifolds
13 0.43728858 32 jmlr-2013-Differential Privacy for Functions and Functional Data
14 0.43650943 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs
17 0.4321548 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering
18 0.43207151 59 jmlr-2013-Large-scale SVD and Manifold Learning
19 0.42993587 96 jmlr-2013-Regularization-Free Principal Curve Estimation
20 0.42813334 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization