jmlr jmlr2013 jmlr2013-34 knowledge-graph by maker-knowledge-mining

34 jmlr-2013-Distance Preserving Embeddings for General n-Dimensional Manifolds


Source: pdf

Author: Nakul Verma

Abstract: Low dimensional embeddings of manifold data have gained popularity in the last decade. However, a systematic finite sample analysis of manifold embedding algorithms largely eludes researchers. Here we present two algorithms that embed a general n-dimensional manifold into Rd (where d only depends on some key manifold properties such as its intrinsic dimension, volume and curvature) that guarantee to approximately preserve all interpoint geodesic distances. Keywords: manifold learning, isometric embeddings, non-linear dimensionality reduction, Nash’s embedding theorem

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 However, a systematic finite sample analysis of manifold embedding algorithms largely eludes researchers. [sent-4, score-0.5]

2 Here we present two algorithms that embed a general n-dimensional manifold into Rd (where d only depends on some key manifold properties such as its intrinsic dimension, volume and curvature) that guarantee to approximately preserve all interpoint geodesic distances. [sent-5, score-0.73]

3 Keywords: manifold learning, isometric embeddings, non-linear dimensionality reduction, Nash’s embedding theorem 1. [sent-6, score-0.569]

4 One typically assumes that points are sampled from an n-dimensional manifold residing in some high-dimensional ambient space RD and analyzes to what extent their low dimensional embedding maintains some important manifold property, say, interpoint geodesic distances. [sent-8, score-0.972]

5 Despite an abundance of manifold embedding algorithms, only a few provide any kind of distance preserving guarantee. [sent-9, score-0.536]

6 Unfortunately any kind of systematic finite sample analysis of manifold embedding algorithms— especially for general classes of manifolds—still largely eludes the manifold learning community. [sent-14, score-0.764]

7 If these manifolds reside in some high dimensional ambient space, we would at least like to embed them in a lower dimensional space (possibly slightly larger than n) while still preserving interpoint geodesic distances. [sent-17, score-0.321]

8 Given a sample X from an underlying n-dimensional manifold M ⊂ RD, and an embedding procedure A : M → Rd that (uses X in training and) maps points from M into some low dimensional space Rd, we define the quality of the embedding A as (1 ± ε)-isometric if it preserves all interpoint geodesic distances up to a factor of (1 ± ε). [sent-19, score-0.772]
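
The (1 ± ε)-isometry quality criterion can be written out as a display; the version below is a plausible formalization inferred from the abstract's "approximately preserve all interpoint geodesic distances" and the DG notation used later in this summary, not a verbatim quote of the paper's definition.

```latex
% Hedged formalization (assumption, not a verbatim quote): an embedding A : M -> R^d is
% (1 +/- eps)-isometric if it preserves all interpoint geodesic distances up to (1 +/- eps),
% with D_G denoting geodesic distance (on M on the left, on the image A(M) in the middle).
\forall\, p, q \in M:\qquad
(1-\varepsilon)\, D_G(p,q) \;\le\; D_G\!\big(A(p), A(q)\big) \;\le\; (1+\varepsilon)\, D_G(p,q).
```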

9 Niyogi et al. (2008) provide a nice characterization of manifold curvature via a notion of manifold condition number that will be useful throughout the text (details later). [sent-26, score-0.562]

10 Perhaps the first algorithmic result for embedding a general n-dimensional manifold is due to Baraniuk and Wakin (2009). [sent-27, score-0.5]

11 They show that an orthogonal linear projection of a well-conditioned n-dimensional manifold M ⊂ RD into a sufficiently high dimensional random subspace is enough to approximately preserve all pairwise geodesic distances. [sent-28, score-0.468]
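
The sketch below illustrates the flavor of this statement: it compares graph-approximated geodesic distances of manifold samples before and after a scaled random linear projection. The helix, the dimensions, the Gaussian/sqrt(d) projection, and the k-NN graph are illustrative stand-ins, not the orthoprojection construction analyzed by Baraniuk and Wakin.

```python
# Hedged sketch: compare graph-approximated geodesic distances of manifold samples
# before and after a scaled random linear projection. All sizes and choices below
# (helix, k-NN graph, Gaussian/sqrt(d) map) are illustrative assumptions.
import numpy as np
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(0)
N, D, d = 300, 100, 40                                   # samples, ambient dim, target dim

t = np.sort(rng.uniform(0, 4 * np.pi, N))
helix = np.stack([np.cos(t), np.sin(t), 0.3 * t], axis=1)
lift = np.linalg.qr(rng.normal(size=(D, 3)))[0]          # isometric lift of R^3 into R^D
X = helix @ lift.T                                       # manifold samples in R^D

def graph_geodesics(P, k=8):
    """Approximate geodesic distances via shortest paths in a symmetric k-NN graph."""
    sq = np.sum(P ** 2, axis=1)
    E = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * P @ P.T, 0.0))
    W = np.full_like(E, np.inf)                          # inf marks "no edge" for scipy
    for i in range(len(P)):
        nn = np.argsort(E[i])[1:k + 1]
        W[i, nn] = E[i, nn]
        W[nn, i] = E[nn, i]
    return shortest_path(W, method="D", directed=False)

Phi = rng.normal(size=(d, D)) / np.sqrt(d)               # scaled random projection
G0, G1 = graph_geodesics(X), graph_geodesics(X @ Phi.T)
mask = np.isfinite(G0) & np.isfinite(G1) & (G0 > 0)
r = G1[mask] / G0[mask]
print("geodesic distortion (5th/50th/95th pct):",
      np.percentile(r, 5), np.median(r), np.percentile(r, 95))
```

The distortion shrinks as d grows; the paper's subsequent correction stage is what removes the residual distortion.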

12 There is no requirement that the underlying manifold is connected, or is globally isometric (or even globally diffeomorphic) to some subset of Rn as is frequently assumed by several manifold embedding algorithms. [sent-44, score-0.833]

13 In addition, unlike spectrum-based embedding algorithms in the literature, our algorithms yield an explicit embedding that cleanly embeds out-of-sample data points, and provide isometry guarantees over the entire manifold (not just the input samples). [sent-45, score-0.779]

14 It is worth noting that the techniques used in our proof are different from what Nash uses in his work; unlike traditional differential-geometric settings, we can only access the underlying manifold through a finite size sample. [sent-47, score-0.323]

15 Isometrically Embedding n-Dimensional Manifolds: Intuition. Given an underlying n-dimensional manifold M ⊂ RD, we shall use ideas from Nash’s embedding (Nash, 1954) to develop our algorithms. [sent-55, score-0.533]

16 2 Correction Stage Since the random projection can contract different parts of the manifold by different amounts, we will apply several corrections—each corresponding to a different local region—to stretch-out and restore the local distances. [sent-66, score-0.368]

17 Note that such a spiral map stretches the length of the tangent vectors by a factor of √(1 + C²), since ‖Ψ′‖ = ‖dΨ/dt‖ = ‖(1, C cos(Ct), −C sin(Ct))‖ = √(1 + C²). [sent-76, score-0.518]
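
A quick numerical check of the stated stretch factor (the constant C below is an arbitrary illustrative value):

```python
# Hedged check of the stated stretch factor: for Psi(t) = (t, sin(Ct), cos(Ct)),
# the derivative (1, C cos(Ct), -C sin(Ct)) has norm sqrt(1 + C^2) at every t,
# so tangent lengths are stretched by exactly that factor. C = 2.5 is arbitrary.
import numpy as np

C = 2.5
Psi = lambda t: np.array([t, np.sin(C * t), np.cos(C * t)])

t0, h = 0.7, 1e-6
dPsi = (Psi(t0 + h) - Psi(t0 - h)) / (2 * h)          # numerical dPsi/dt
print(np.linalg.norm(dPsi), np.sqrt(1 + C ** 2))      # both approximately 2.6926
```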

18 Middle: A low dimensional mapping of the original manifold via, say, a linear projection onto the vertical plane. [sent-81, score-0.393]

19 Different parts of the manifold are contracted by different amounts— distances at the tail-ends are contracted more than the distances in the middle. [sent-82, score-0.526]

20 Applying different Ψ’s at different parts of the manifold has an aggregate effect of creating an approximate isometric embedding. [sent-108, score-0.333]

21 This embeds the manifold in low dimensions but distorts the interpoint geodesic distances. [sent-112, score-0.432]

22 Note that the normals (dotted lines) of a particular length incident at each point of the manifold (solid line) will intersect if the manifold is too curvy. [sent-116, score-0.654]

23 We will conclude that restoring the lengths in all neighborhoods yields a globally consistent approximately isometric embedding of M. [sent-119, score-0.342]

24 Since a local region of an n-dimensional manifold can potentially have up to O(2^cn) overlapping regions, we shall require O(2^cn) additional coordinates to apply the corrections, making the final embedding dimension Õ(2^cn) (where c is an absolute constant). [sent-124, score-0.533]

25 Since locally the manifold spreads across its tangent space, these normals indicate the locally empty regions in the embedded space. [sent-127, score-0.686]

26 Notice that long non-intersecting normals are possible only if the manifold is relatively flat. [sent-144, score-0.364]

27 Since we make use of a random projection for the Embedding Stage, it is essential to have good manifold covers. [sent-152, score-0.333]

28 We will use the notation DG(p, q) to indicate the geodesic distance between points p and q where the underlying manifold is understood from the context, and ‖p − q‖ to indicate the Euclidean distance between points p and q where the ambient space is understood from the context. [sent-158, score-0.41]

29 Then, for any unit vector v̂ in T̂p, we have v̂ · v ≥ 1 − δ, where v is the projection of v̂ onto the tangent space of M at p. [sent-168, score-0.398]

30 (tangent space approximation criterion) The above is an intuitive notion of manifold sampling that can estimate the local tangent spaces. [sent-169, score-0.559]
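
The sketch below illustrates this flavor of criterion (it is not the paper's cover construction): it estimates a tangent space from nearby samples of a sphere lifted into a higher-dimensional space via local PCA, then checks how close unit tangent vectors are to their projections onto the estimate; all sizes and the 0.05 sampling scale are illustrative assumptions.

```python
# Hedged sketch (not the paper's construction): estimate the tangent space of a
# 2-dimensional manifold (a sphere, isometrically lifted into R^20) at a point from
# nearby samples via local PCA, then check a "v . v_hat >= 1 - delta" style criterion.
import numpy as np

rng = np.random.default_rng(1)
D, n = 20, 2
lift = np.linalg.qr(rng.normal(size=(D, 3)))[0]           # isometric lift R^3 -> R^D

def sphere_point(theta, phi):
    return lift @ np.array([np.sin(theta) * np.cos(phi),
                            np.sin(theta) * np.sin(phi),
                            np.cos(theta)])

theta0, phi0 = 1.0, 0.3
nbrs = np.stack([sphere_point(theta0 + a, phi0 + b)
                 for a, b in rng.normal(scale=0.05, size=(60, 2))])

# local PCA: the top-n right singular vectors of the centered neighbourhood span T_hat
_, _, Vt = np.linalg.svd(nbrs - nbrs.mean(axis=0))
T_hat = Vt[:n].T                                          # D x n estimated tangent basis

# true tangent directions at the base point (numerical partial derivatives, normalized)
eps = 1e-6
t1 = (sphere_point(theta0 + eps, phi0) - sphere_point(theta0 - eps, phi0)) / (2 * eps)
t2 = (sphere_point(theta0, phi0 + eps) - sphere_point(theta0, phi0 - eps)) / (2 * eps)
for v in (t1 / np.linalg.norm(t1), t2 / np.linalg.norm(t2)):
    v_hat = T_hat @ (T_hat.T @ v)                         # projection onto the estimate
    print("v . v_hat =", float(v @ v_hat))                # close to 1 for a good estimate
```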

31 Now, since the length of any given curve γ : [a, b] → M is given by ∫ₐᵇ ‖γ′(s)‖ ds, it is vital to study how our embeddings modify the length of the tangent vectors at any point p ∈ M. [sent-302, score-0.443]

32 In order to discuss tangent vectors, we need to introduce the notion of a tangent space Tp M at a particular point p ∈ M. [sent-303, score-0.59]

33 The collection of all such vectors formed by all such curves is a well defined vector space (with origin at p), called the tangent space Tp M. [sent-305, score-0.35]

34 In what follows, we will fix an arbitrary point p ∈ M and a tangent vector v ∈ Tp M and analyze how the various steps of the algorithm modify the length of v. [sent-306, score-0.321]

35 Let Φ be the initial (scaled) random projection map (from RD to Rd ) that may contract distances on M by various amounts, and let Ψ be the subsequent correction map that attempts to restore these distances (as defined in Step 2 for Embedding I or as a sequence of maps in Step 7 for Embedding II). [sent-307, score-0.427]

36 The quantity (∇F) p is precisely the matrix representation of this linear “pushforward” map that sends tangent vectors of M (at p) to the corresponding tangent vectors of N (at F(p)). [sent-314, score-0.737]

37 Point p maps to F(p), tangent vector v maps to (DF) p (v). [sent-325, score-0.367]

38 We will conclude by relating tangent vectors to lengths of curves, showing approximate isometry (Section 5. [sent-328, score-0.43]

39 It is a well-known result (see, e.g., Milnor, 1972) that almost every smooth mapping of an n-dimensional manifold into R^(2n+1) is a differential structure preserving embedding of M. [sent-335, score-0.562]

40 In particular, a projection onto a random subspace (of dimension 2n + 1) constitutes such an embedding with probability 1. [sent-336, score-0.339]

41 This translates to stating that a random projection into R2n+1 is enough to guarantee that Φ doesn’t collapse the lengths of non-zero tangent vectors almost surely. [sent-337, score-0.456]

42 However, due to computational issues, we additionally require that the lengths are bounded away from zero (that is, a statement of the form ‖(DΦ)p(v)‖ ≥ Ω(1)·‖v‖ for all v tangent to M at all points p). [sent-338, score-0.332]

43 If d = Ω(n log(CM/τ)), then with probability at least 1 − 1/poly(n) over the choice of the random projection matrix, we have (a) for all p ∈ M and all tangent vectors v ∈ Tp M, (1/2)‖v‖ ≤ ‖(DΦ)p(v)‖ ≤ (5/6)‖v‖. [sent-344, score-0.419]
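
A numerical illustration of this kind of two-sided bound is sketched below. The scaling (2/3)·sqrt(D/d) applied to a random orthoprojection is an illustrative choice made so that projected tangent lengths concentrate strictly inside [1/2, 5/6] of the original lengths; it is not claimed to be the paper's exact scaling.

```python
# Hedged illustration of a two-sided bound of this type. Phi below is a random
# orthoprojection rescaled by (2/3)*sqrt(D/d); the constant 2/3 is an illustrative
# assumption, picked so that projected tangent lengths land strictly inside
# [1/2, 5/6] of the original lengths for these sizes.
import numpy as np

rng = np.random.default_rng(2)
D, d, n = 1000, 200, 4

Q = np.linalg.qr(rng.normal(size=(D, d)))[0].T            # d x D, orthonormal rows
Phi = (2.0 / 3.0) * np.sqrt(D / d) * Q

T = np.linalg.qr(rng.normal(size=(D, n)))[0]              # a random n-dim "tangent space"
V = T @ rng.normal(size=(n, 2000))                        # many tangent vectors (columns)
ratios = np.linalg.norm(Phi @ V, axis=0) / np.linalg.norm(V, axis=0)
print(ratios.min(), ratios.max())                         # typically inside (1/2, 5/6)
```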

44 Then, a bound on the length of tangent vectors also gives us a bound on the spectrum of ΦFx (recall the definition of Fx from Section 4). [sent-348, score-0.376]

45 Left: Underlying manifold M ⊂ RD with the quantities of interest—a fixed point p and a fixed unit-vector v tangent to M at p. [sent-350, score-0.559]

46 The point p maps to Φp and the tangent vector v maps to u := (DΦ) p (v) = Φv. [sent-352, score-0.367]

47 To understand the action of Ψ on a tangent vector better, we will first consider a simple case of flat manifolds (Section 5. [sent-366, score-0.338]

48 For concreteness, let F be a D × n matrix whose column vectors form some orthonormal basis of the n-flat manifold (in the original space RD ). [sent-380, score-0.382]

49 Then FV forms an orthonormal basis of the n-flat manifold (in RD ) that maps to an orthogonal basis UΣ of the projected n-flat manifold (in Rd ) via the contraction mapping Φ. [sent-382, score-0.697]

50 The individual terms of Ψ^sin(t) := (ψ1^sin(t), ..., ψn^sin(t)) and Ψ^cos(t) := (ψ1^cos(t), ..., ψn^cos(t)) are given as ψi^sin(t) := sin((Ct)i) for i = 1, ..., n, [sent-391, score-0.946]

51 and ψi^cos(t) := cos((Ct)i), where C is now an n × d correction matrix. [sent-394, score-0.327]

52 It turns out that setting C = (Σ^(−2) − I)^(1/2) U^T precisely restores the contraction caused by Φ to the tangent vectors (notice the similarity between this expression and the correction matrix Cx in the general case in Section 4, and our motivating intuition in Section 2). [sent-395, score-0.518]

53 To see this, let v be a vector tangent to the n-flat at some point p (in RD ). [sent-396, score-0.323]

54 The individual components are given by d ψk^sin(t)/dti |t=Φ(p) = +cos((Ct)k) Ck,i and d ψk^cos(t)/dti |t=Φ(p) = −sin((Ct)k) Ck,i, for k = 1, ..., n. [sent-404, score-0.764]

55 In other words, our non-linear correction map Ψ can exactly restore the contraction caused by Φ for any vector tangent to an n-flat manifold. [sent-418, score-0.511]
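
This flat-manifold claim is easy to verify numerically. Assuming, as in the spiral example earlier, that Ψ keeps the original coordinates and appends the sin/cos correction coordinates, the displayed derivatives imply that those extra coordinates contribute ‖Cu‖² to the squared length of a pushed-forward tangent vector, so sqrt(‖u‖² + ‖Cu‖²) should equal ‖v‖ exactly. The particular contraction Φ below (half of a random orthoprojection) is an illustrative choice that keeps every singular value of ΦF strictly below 1.

```python
# Hedged numerical check for an n-flat: with Phi F = U diag(s) V^T and the correction
# matrix C = (Sigma^{-2} - I)^{1/2} U^T, the sin/cos correction coordinates add
# ||C u||^2 to the squared length, so sqrt(||u||^2 + ||C u||^2) should equal ||v||.
import numpy as np

rng = np.random.default_rng(3)
D, d, n = 60, 20, 5

F = np.linalg.qr(rng.normal(size=(D, n)))[0]              # orthonormal basis of the n-flat
Phi = 0.5 * np.linalg.qr(rng.normal(size=(D, d)))[0].T    # d x D contraction map

U, s, Vt = np.linalg.svd(Phi @ F, full_matrices=False)    # Phi F = U diag(s) Vt
C = np.diag(np.sqrt(1.0 / s ** 2 - 1.0)) @ U.T            # n x d correction matrix

v = F @ rng.normal(size=n)                                # a tangent vector of the flat
u = Phi @ v                                               # its contracted image
restored = np.sqrt(np.linalg.norm(u) ** 2 + np.linalg.norm(C @ u) ** 2)
print(np.linalg.norm(v), np.linalg.norm(u), restored)     # restored matches ||v||
```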

56 These bump functions, although important for localization, have an undesirable effect on the stretched length of the tangent vector. [sent-455, score-0.37]
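
For concreteness, the sketch below shows one standard smooth bump construction of the kind such localization weights are built from; the exact bump function used in the paper is not specified in this summary, so the formula is only an illustrative stand-in.

```python
# Hedged sketch of a localization weight: a standard smooth bump that equals 1 at the
# center x, decays smoothly, and vanishes outside the rho-ball around x. This is an
# illustrative assumption, not the paper's weighting function.
import numpy as np

def bump(t, x, rho):
    """Smooth cutoff: positive inside the rho-ball around x, exactly zero outside."""
    r2 = np.sum((t - x) ** 2) / rho ** 2
    return np.e * np.exp(-1.0 / (1.0 - r2)) if r2 < 1.0 else 0.0

x, rho = np.zeros(3), 0.5
for r in (0.0, 0.25, 0.49, 0.6):
    print(r, bump(x + np.array([r, 0.0, 0.0]), x, rho))   # 1.0, ~0.72, ~1e-11, 0.0
```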

57 In this section we assume that the normals η and ν have the following properties: |ηi,j(t) · v| ≤ ε0 and |νi,j(t) · v| ≤ ε0 for all unit-length v tangent to Ψi,j−1(ΦM) at Ψi,j−1(t). [sent-507, score-0.395]

58 Now, as before, representing a tangent vector u = ∑l ul el (such that ‖u‖² ≤ 1) in terms of its basis vectors, it suffices to study how DΨ acts on basis vectors. [sent-510, score-0.347]

59 Combined Effect of Ψ(Φ(M)): We can now analyze the aggregate effect of both our embeddings on the length of an arbitrary unit vector v tangent to M at p. [sent-520, score-0.362]

60 So far we have shown that our embedding approximately preserves the length of a fixed tangent vector at a fixed point. [sent-544, score-0.557]

61 Since the choice of the vector and the point was arbitrary, it follows that our embedding approximately preserves the tangent vector lengths throughout the embedded manifold uniformly. [sent-545, score-0.859]

62 We will now show that preserving the tangent vector lengths implies preserving the geodesic curve lengths. [sent-546, score-0.51]

63 By (1 ± ε)-isometry of tangent vectors, this immediately gives us (1 − ε)L(γ) ≤ L(γ̄) ≤ (1 + ε)L(γ) for any path γ in M and its image γ̄ in the embedding of M. [sent-553, score-0.531]
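
Spelling out the step from tangent-vector isometry to curve-length isometry, as a hedged sketch that uses only the curve-length formula and the pushforward DF quoted earlier in this summary (F denotes the overall embedding and γ̄ := F ∘ γ its image curve):

```latex
% Sketch: lengths of image curves under the embedding F, assuming (as in the preceding
% sentences) (1 +/- eps)-isometry of tangent vectors.
L(\bar\gamma) \;=\; \int_a^b \Big\|\tfrac{d}{ds}\, F(\gamma(s))\Big\|\,ds
             \;=\; \int_a^b \big\|(DF)_{\gamma(s)}\big(\gamma'(s)\big)\big\|\,ds,
\qquad
(1-\varepsilon)\,\|\gamma'(s)\| \;\le\; \big\|(DF)_{\gamma(s)}(\gamma'(s))\big\| \;\le\; (1+\varepsilon)\,\|\gamma'(s)\|.
```

Integrating the two-sided bound over s ∈ [a, b] gives (1 − ε)L(γ) ≤ L(γ̄) ≤ (1 + ε)L(γ), which is the statement quoted in the sentence above.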

64 The correction procedure discussed here can also be readily adapted to create isometric embeddings from any manifold embedding procedure (under some mild conditions). [sent-560, score-0.711]

65 Take any off-the-shelf manifold embedding algorithm A (such as LLE, Laplacian Eigenmaps, etc. [sent-561, score-0.236]

66 Such an algorithm maps an n-dimensional manifold into, say, d dimensions, but does not necessarily guarantee an approximate isometric embedding. [sent-562, score-0.369]

67 We can thus apply the Corrections Stage (either the one discussed in Algorithm I or Algorithm II) to produce an approximate isometric embedding of the given manifold in slightly higher dimensions. [sent-565, score-0.569]

68 In this sense, the correction procedure presented here serves as a universal procedure for approximate isometric manifold embeddings. [sent-566, score-0.434]

69 Lemma 17 (relating nearby tangent vectors—implicit in the proof of Proposition 6. [sent-574, score-0.323]

70 Let u ∈ Tp M be a unit length tangent vector and v ∈ Tq M be its parallel transport along the (shortest) geodesic path to q. [sent-577, score-0.452]

71 Lemma 19 (projection of a section of a manifold onto the tangent space) Pick any p ∈ M and define Mp,r := {q ∈ M : ‖q − p‖ ≤ r}. [sent-582, score-0.593]

72 Let f denote the orthogonal linear projection of M p,r onto the tangent space Tp M. [sent-583, score-0.427]

73 Technically, it is not possible to directly compare two vectors that reside in different tangent spaces. [sent-607, score-0.35]

74 However, since we only deal with manifolds that are immersed in some ambient space, we can treat the tangent spaces as n-dimensional affine subspaces. [sent-608, score-0.378]

75 Figure 6: Plane spanned by vectors q − p and v ∈ Tp M (where v is the projection of q − p onto Tp M), with τ-balls tangent to p. [sent-612, score-0.453]

76 Lemma 21 (relating nearby manifold points to tangent vectors) Pick any point p ∈ M and let q ∈ M (distinct from p) be such that DG (p, q) ≤ τ. [sent-616, score-0.615]

77 Using the trigonometric identity cos θ = 2 cos²(θ/2) − 1, and noting ‖q − p‖² ≥ ‖q′ − p‖², we have ((q − p)/‖q − p‖) · (v/‖v‖) = cos α ≥ cos(θ/2). Now, by applying the cosine rule, we have cos(θ/2) ≥ 1 − ‖q − p‖²/4τ² ≥ 1 − (DG(p, q)/2τ)². [sent-628, score-0.762]

78 Lemma 22 (approximating tangent space by nearby samples) Let 0 < δ ≤ 1. [sent-631, score-0.323]

79 Define the bounded manifold cover as the points fc^(−1)(xi^(c)). [sent-676, score-0.406]

80 Now, for 1 ≤ i ≤ n, noting that DG(fc^(−1)(xi^(c)), fc^(−1)(x0^(c))) ≤ 2 ‖fc^(−1)(xi^(c)) − fc^(−1)(x0^(c))‖ ≤ ρ (cf. [sent-688, score-0.483]

81 Lemma 18), we have that for the vector v̂i^(c) := (fc^(−1)(xi^(c)) − fc^(−1)(x0^(c))) / ‖fc^(−1)(xi^(c)) − fc^(−1)(x0^(c))‖ and its (normalized) projection vi^(c) := (xi^(c) − x0^(c)) / ‖xi^(c) − x0^(c)‖ onto Tc M, ‖v̂i^(c) − vi^(c)‖ ≤ ρ/√2τ (cf. [sent-689, score-0.555]

82 Let ûx0 be the projection of u onto Tx0 M, let θ1 be the angle between vectors u and ûx0, and let θ2 be the angle between ûx0 (at x0) and its parallel transport along the geodesic path to p (see Figure 7). [sent-710, score-0.456]

83 If cos α ≥ 1 − ε1 and cos β ≥ 1 − ε2, then cos(α + β) ≥ 1 − ε1 − ε2 − 2√(ε1 ε2). [sent-723, score-0.452]

84 Proof: Applying the identity sin θ = √(1 − cos²θ) immediately yields sin α ≤ √(2 ε1) and sin β ≤ √(2 ε2). [sent-724, score-0.402]

85 Now, cos(α + β) = cos α cos β − sin α sin β ≥ (1 − ε1)(1 − ε2) − 2√(ε1 ε2) ≥ 1 − ε1 − ε2 − 2√(ε1 ε2). [sent-725, score-0.72]
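
A brute-force numerical sanity check of this inequality, taking the tightest ε's allowed by the hypothesis and angles in [0, π/2], which is the regime in which the lemma is applied here:

```python
# Hedged numerical check of cos(a+b) >= 1 - e1 - e2 - 2*sqrt(e1*e2) whenever
# cos(a) >= 1 - e1 and cos(b) >= 1 - e2, for angles a, b in [0, pi/2].
import numpy as np

rng = np.random.default_rng(4)
a = rng.uniform(0.0, np.pi / 2, 100000)
b = rng.uniform(0.0, np.pi / 2, 100000)
e1, e2 = 1.0 - np.cos(a), 1.0 - np.cos(b)                 # tightest admissible epsilons
violations = np.sum(np.cos(a + b) < 1.0 - e1 - e2 - 2.0 * np.sqrt(e1 * e2) - 1e-12)
print("violations:", int(violations))                      # 0
```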

86 Finally, for any point p ∈ M, a unit vector u tangent to M at p can be approximated arbitrarily well by considering a sequence {pi}i of points (in M) converging to p (in M) such that (pi − p)/‖pi − p‖ converges to u. [sent-747, score-0.323]

87 Definition 3), there exists a unit length vector vn tangent to M (at x) such that |Fx vn · vn | ≥ 1 − δ. [sent-754, score-0.549]

88 Noting that (i) the terms |Ak,x^cos(t)| and |Ak,x^sin(t)| are at most O(α 16^n d/ρ) (see Lemma 27), (ii) |(Cx u)k| ≤ 4, and (iii) ΛΦ(x)(t) ≤ 1, we can pick ω sufficiently large (say, ω ≥ Ω(n α² 16^n √d/ρε)) such that |ζ| ≤ ε/2 (where ε is the isometry constant from our main theorem). [sent-762, score-0.449]

89 Proof: We shall focus on bounding |Ak,x^sin(t)| (the steps for bounding |Ak,x^cos(t)| are similar). [sent-764, score-0.393]

90 Note that |Ak,x^sin(t)| = |∑i=1..d ui sin(ω(Cx t)k) · dΛΦ(x)^(1/2)(t)/dti| ≤ ∑i=1..d |ui| · |dΛΦ(x)^(1/2)(t)/dti| ≤ (∑i=1..d (dΛΦ(x)^(1/2)(t)/dti)²)^(1/2), since ‖u‖ ≤ 1. [sent-765, score-0.701]

91 Pick any x ∈ M and let Fx be any n-dimensional affine space with the property: for any unit vector vx tangent to M at x, and its projection vxF onto Fx , vx · vxF ≥ 1 − δ. [sent-851, score-0.532]

92 Then ‖vr‖² ≤ 2ξ and ‖vF‖ ≥ 1 − ξ, where vF is the projection of v onto Fx and vr is the residual (i.e., vr = v − vF). [sent-855, score-0.341]

93 Let vx (at x) be the parallel transport of v (at p) along the (shortest) geodesic path, via the manifold connection. [sent-860, score-0.448]

94 We bound the individual terms cos α and cos β as follows. [sent-865, score-0.452]

95 Also note that since 1 = ‖v‖² = ‖vF‖² + ‖vr‖², we have (v · vF/‖vF‖)² = ‖vF‖² = 1 − ‖vr‖² ≥ 1 − 2ξ. [sent-869, score-0.357]

96 Computing the Normal Vectors: The success of the second embedding technique crucially depends upon finding (at each iteration step) a pair of mutually orthogonal unit vectors that are normal to the embedding of manifold M (from the previous iteration step) at a given point p. [sent-892, score-0.845]

97 The saving grace comes from noting that the corrections are applied to the n-dimensional manifold Φ(M) that is actually a submanifold of d-dimensional space Rd . [sent-894, score-0.392]

98 Of course, to find two mutually orthogonal normals to a d-manifold N, N itself needs to be embedded in a larger dimensional Euclidean space (although embedding into d + 2 should suffice, for computational reasons we will embed N into Euclidean space of dimension 2d + 3). [sent-898, score-0.426]

99 We can treat the point Φp as part of the bigger ambient manifold N (= Rd, which contains ΦM) and compute the desired normals in a space that contains N itself. [sent-908, score-0.404]
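
A generic numerical recipe with the same flavor is sketched below (it is not the paper's exact procedure): zero-pad the d-dimensional ambient copy of Φ(M) into R^(2d+3), estimate the local tangent directions at a point from nearby embedded samples, and complete them to an orthonormal set so that the remaining vectors are mutually orthogonal unit normals; the flat patch, dimensions, and sampling scale are illustrative assumptions.

```python
# Hedged sketch of a generic recipe (not the paper's exact procedure) for finding two
# mutually orthogonal unit normals to an embedded patch at a point.
import numpy as np

rng = np.random.default_rng(5)
d = 6
Dbig = 2 * d + 3

def embed(y):                                             # R^d -> R^{2d+3} via zero-padding
    return np.concatenate([y, np.zeros(Dbig - d)])

base = rng.normal(size=d)                                 # base point of a 2-dim patch in R^d
T_true = np.linalg.qr(rng.normal(size=(d, 2)))[0]
samples = np.stack([embed(base + T_true @ rng.normal(scale=0.01, size=2))
                    for _ in range(50)])

# local PCA for the tangent directions of the embedded patch near the base point
_, _, Vt = np.linalg.svd(samples - samples.mean(axis=0))
tangent = Vt[:2].T                                        # (2d+3) x 2 tangent basis estimate

# complete to an orthonormal set; columns beyond the tangent block are unit normals
Qfull, _ = np.linalg.qr(np.hstack([tangent, rng.normal(size=(Dbig, 2))]))
eta, nu = Qfull[:, 2], Qfull[:, 3]                        # two mutually orthogonal normals
print(float(np.abs(tangent.T @ eta).max()),               # ~0: eta is normal to the patch
      float(np.abs(tangent.T @ nu).max()),                # ~0: nu is normal to the patch
      float(eta @ nu))                                    # ~0: eta and nu are orthogonal
```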

100 This is a well-known fact (see, e.g., Milnor, 1972), and is the key observation in reducing the embedding size in Whitney’s embedding (Whitney, 1936). [sent-935, score-0.472]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('vf', 0.446), ('tangent', 0.295), ('manifold', 0.264), ('embedding', 0.236), ('fx', 0.232), ('cx', 0.227), ('cos', 0.226), ('erma', 0.138), ('sin', 0.134), ('anifolds', 0.13), ('cxi', 0.13), ('istance', 0.13), ('reserving', 0.13), ('rd', 0.129), ('tp', 0.127), ('vr', 0.119), ('cxt', 0.111), ('mbeddings', 0.111), ('fc', 0.106), ('dg', 0.106), ('geodesic', 0.106), ('correction', 0.101), ('dt', 0.101), ('normals', 0.1), ('contracted', 0.089), ('nash', 0.081), ('vn', 0.076), ('spiral', 0.073), ('isometric', 0.069), ('corrections', 0.069), ('projection', 0.069), ('interpoint', 0.062), ('vi', 0.06), ('ct', 0.06), ('noting', 0.059), ('vxf', 0.057), ('lemma', 0.057), ('vectors', 0.055), ('vx', 0.053), ('ul', 0.052), ('bump', 0.049), ('cm', 0.046), ('pick', 0.046), ('isometry', 0.043), ('stage', 0.043), ('contraction', 0.043), ('sv', 0.043), ('manifolds', 0.043), ('distances', 0.042), ('angle', 0.042), ('pushforward', 0.042), ('rand', 0.041), ('embeddings', 0.041), ('interference', 0.041), ('wlog', 0.041), ('ambient', 0.04), ('ui', 0.038), ('map', 0.037), ('lengths', 0.037), ('cover', 0.036), ('maps', 0.036), ('au', 0.036), ('preserving', 0.036), ('singular', 0.036), ('orthonormal', 0.035), ('stretch', 0.035), ('directional', 0.035), ('restore', 0.035), ('ux', 0.035), ('onto', 0.034), ('ki', 0.034), ('fv', 0.034), ('curvature', 0.034), ('embed', 0.034), ('shall', 0.033), ('bsin', 0.032), ('pcq', 0.032), ('stretches', 0.032), ('df', 0.031), ('ii', 0.029), ('orthogonal', 0.029), ('pi', 0.028), ('let', 0.028), ('clarkson', 0.028), ('nearby', 0.028), ('ei', 0.027), ('niyogi', 0.027), ('embedded', 0.027), ('mapping', 0.026), ('length', 0.026), ('sx', 0.025), ('tc', 0.025), ('applying', 0.025), ('transport', 0.025), ('normal', 0.025), ('balls', 0.024), ('asin', 0.024), ('giesen', 0.024), ('orthogonalization', 0.024), ('restores', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 34 jmlr-2013-Distance Preserving Embeddings for General n-Dimensional Manifolds

Author: Nakul Verma

Abstract: Low dimensional embeddings of manifold data have gained popularity in the last decade. However, a systematic finite sample analysis of manifold embedding algorithms largely eludes researchers. Here we present two algorithms that embed a general n-dimensional manifold into Rd (where d only depends on some key manifold properties such as its intrinsic dimension, volume and curvature) that guarantee to approximately preserve all interpoint geodesic distances. Keywords: manifold learning, isometric embeddings, non-linear dimensionality reduction, Nash’s embedding theorem

2 0.28458217 86 jmlr-2013-Parallel Vector Field Embedding

Author: Binbin Lin, Xiaofei He, Chiyuan Zhang, Ming Ji

Abstract: We propose a novel local isometry based dimensionality reduction method from the perspective of vector fields, which is called parallel vector field embedding (PFE). We first give a discussion on local isometry and global isometry to show the intrinsic connection between parallel vector fields and isometry. The problem of finding an isometry turns out to be equivalent to finding orthonormal parallel vector fields on the data manifold. Therefore, we first find orthonormal parallel vector fields by solving a variational problem on the manifold. Then each embedding function can be obtained by requiring its gradient field to be as close to the corresponding parallel vector field as possible. Theoretical results show that our method can precisely recover the manifold if it is isometric to a connected open subset of Euclidean space. Both synthetic and real data examples demonstrate the effectiveness of our method even if there is heavy noise and high curvature. Keywords: manifold learning, isometry, vector field, covariant derivative, out-of-sample extension

3 0.20445462 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses

Author: Partha Niyogi

Abstract: Manifold regularization (Belkin et al., 2006) is a geometrically motivated framework for machine learning within which several semi-supervised algorithms have been constructed. Here we try to provide some theoretical understanding of this approach. Our main result is to expose the natural structure of a class of problems on which manifold regularization methods are helpful. We show that for such problems, no supervised learner can learn effectively. On the other hand, a manifold based learner (that knows the manifold or “learns” it from unlabeled examples) can learn with relatively few labeled examples. Our analysis follows a minimax style with an emphasis on finite sample results (in terms of n: the number of labeled examples). These results allow us to properly interpret manifold regularization and related spectral and geometric algorithms in terms of their potential use in semi-supervised learning. Keywords: semi-supervised learning, manifold regularization, graph Laplacian, minimax rates

4 0.12092671 96 jmlr-2013-Regularization-Free Principal Curve Estimation

Author: Samuel Gerber, Ross Whitaker

Abstract: Principal curves and manifolds provide a framework to formulate manifold learning within a statistical context. Principal curves define the notion of a curve passing through the middle of a distribution. While the intuition is clear, the formal definition leads to some technical and practical difficulties. In particular, principal curves are saddle points of the mean-squared projection distance, which poses severe challenges for estimation and model selection. This paper demonstrates that the difficulties in model selection associated with the saddle point property of principal curves are intrinsically tied to the minimization of the mean-squared projection distance. We introduce a new objective function, facilitated through a modification of the principal curve estimation approach, for which all critical points are principal curves and minima. Thus, the new formulation removes the fundamental issue for model selection in principal curve estimation. A gradient-descent-based estimator demonstrates the effectiveness of the new formulation for controlling model complexity on numerical experiments with synthetic and real data. Keywords: principal curve, manifold estimation, unsupervised learning, model complexity, model selection

5 0.11312951 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres

Author: Tony Cai, Jianqing Fan, Tiefeng Jiang

Abstract: This paper studies the asymptotic behaviors of the pairwise angles among n randomly and uniformly distributed unit vectors in R p as the number of points n → ∞, while the dimension p is either fixed or growing with n. For both settings, we derive the limiting empirical distribution of the random angles and the limiting distributions of the extreme angles. The results reveal interesting differences in the two settings and provide a precise characterization of the folklore that “all high-dimensional random vectors are almost always nearly orthogonal to each other”. Applications to statistics and machine learning and connections with some open problems in physics and mathematics are also discussed. Keywords: random angle, uniform distribution on sphere, empirical law, maximum of random variables, minimum of random variables, extreme-value distribution, packing on sphere

6 0.10458369 59 jmlr-2013-Large-scale SVD and Manifold Learning

7 0.074345857 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding

8 0.069434039 112 jmlr-2013-Tapkee: An Efficient Dimension Reduction Library

9 0.053897981 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning

10 0.049156569 105 jmlr-2013-Sparsity Regret Bounds for Individual Sequences in Online Linear Regression

11 0.045752682 109 jmlr-2013-Stress Functions for Nonlinear Dimension Reduction, Proximity Analysis, and Graph Drawing

12 0.044283811 90 jmlr-2013-Quasi-Newton Method: A New Direction

13 0.043882262 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach

14 0.042270228 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning

15 0.039061025 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering

16 0.03869009 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models

17 0.035194803 44 jmlr-2013-Finding Optimal Bayesian Networks Using Precedence Constraints

18 0.034870323 78 jmlr-2013-On the Learnability of Shuffle Ideals

19 0.031925913 114 jmlr-2013-The Rate of Convergence of AdaBoost

20 0.031029191 64 jmlr-2013-Lovasz theta function, SVMs and Finding Dense Subgraphs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.218), (1, 0.197), (2, 0.084), (3, -0.438), (4, -0.141), (5, 0.011), (6, 0.003), (7, -0.097), (8, -0.039), (9, 0.035), (10, -0.271), (11, 0.038), (12, -0.106), (13, -0.008), (14, 0.051), (15, -0.05), (16, -0.005), (17, -0.045), (18, -0.048), (19, -0.053), (20, 0.066), (21, 0.027), (22, -0.017), (23, 0.029), (24, -0.005), (25, 0.041), (26, -0.034), (27, 0.046), (28, -0.005), (29, 0.017), (30, 0.101), (31, 0.071), (32, -0.022), (33, 0.053), (34, 0.045), (35, 0.077), (36, 0.027), (37, 0.121), (38, -0.015), (39, 0.055), (40, -0.039), (41, 0.024), (42, 0.035), (43, 0.004), (44, -0.103), (45, -0.047), (46, 0.005), (47, 0.055), (48, -0.048), (49, -0.004)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95785189 34 jmlr-2013-Distance Preserving Embeddings for General n-Dimensional Manifolds

Author: Nakul Verma

Abstract: Low dimensional embeddings of manifold data have gained popularity in the last decade. However, a systematic finite sample analysis of manifold embedding algorithms largely eludes researchers. Here we present two algorithms that embed a general n-dimensional manifold into Rd (where d only depends on some key manifold properties such as its intrinsic dimension, volume and curvature) that guarantee to approximately preserve all interpoint geodesic distances. Keywords: manifold learning, isometric embeddings, non-linear dimensionality reduction, Nash’s embedding theorem

2 0.84513384 86 jmlr-2013-Parallel Vector Field Embedding

Author: Binbin Lin, Xiaofei He, Chiyuan Zhang, Ming Ji

Abstract: We propose a novel local isometry based dimensionality reduction method from the perspective of vector fields, which is called parallel vector field embedding (PFE). We first give a discussion on local isometry and global isometry to show the intrinsic connection between parallel vector fields and isometry. The problem of finding an isometry turns out to be equivalent to finding orthonormal parallel vector fields on the data manifold. Therefore, we first find orthonormal parallel vector fields by solving a variational problem on the manifold. Then each embedding function can be obtained by requiring its gradient field to be as close to the corresponding parallel vector field as possible. Theoretical results show that our method can precisely recover the manifold if it is isometric to a connected open subset of Euclidean space. Both synthetic and real data examples demonstrate the effectiveness of our method even if there is heavy noise and high curvature. Keywords: manifold learning, isometry, vector field, covariant derivative, out-of-sample extension

3 0.73442155 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses

Author: Partha Niyogi

Abstract: Manifold regularization (Belkin et al., 2006) is a geometrically motivated framework for machine learning within which several semi-supervised algorithms have been constructed. Here we try to provide some theoretical understanding of this approach. Our main result is to expose the natural structure of a class of problems on which manifold regularization methods are helpful. We show that for such problems, no supervised learner can learn effectively. On the other hand, a manifold based learner (that knows the manifold or “learns” it from unlabeled examples) can learn with relatively few labeled examples. Our analysis follows a minimax style with an emphasis on finite sample results (in terms of n: the number of labeled examples). These results allow us to properly interpret manifold regularization and related spectral and geometric algorithms in terms of their potential use in semi-supervised learning. Keywords: semi-supervised learning, manifold regularization, graph Laplacian, minimax rates

4 0.44723913 96 jmlr-2013-Regularization-Free Principal Curve Estimation

Author: Samuel Gerber, Ross Whitaker

Abstract: Principal curves and manifolds provide a framework to formulate manifold learning within a statistical context. Principal curves define the notion of a curve passing through the middle of a distribution. While the intuition is clear, the formal definition leads to some technical and practical difficulties. In particular, principal curves are saddle points of the mean-squared projection distance, which poses severe challenges for estimation and model selection. This paper demonstrates that the difficulties in model selection associated with the saddle point property of principal curves are intrinsically tied to the minimization of the mean-squared projection distance. We introduce a new objective function, facilitated through a modification of the principal curve estimation approach, for which all critical points are principal curves and minima. Thus, the new formulation removes the fundamental issue for model selection in principal curve estimation. A gradient-descent-based estimator demonstrates the effectiveness of the new formulation for controlling model complexity on numerical experiments with synthetic and real data. Keywords: principal curve, manifold estimation, unsupervised learning, model complexity, model selection

5 0.36020648 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres

Author: Tony Cai, Jianqing Fan, Tiefeng Jiang

Abstract: This paper studies the asymptotic behaviors of the pairwise angles among n randomly and uniformly distributed unit vectors in R p as the number of points n → ∞, while the dimension p is either fixed or growing with n. For both settings, we derive the limiting empirical distribution of the random angles and the limiting distributions of the extreme angles. The results reveal interesting differences in the two settings and provide a precise characterization of the folklore that “all high-dimensional random vectors are almost always nearly orthogonal to each other”. Applications to statistics and machine learning and connections with some open problems in physics and mathematics are also discussed. Keywords: random angle, uniform distribution on sphere, empirical law, maximum of random variables, minimum of random variables, extreme-value distribution, packing on sphere

6 0.35385171 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding

7 0.33528113 59 jmlr-2013-Large-scale SVD and Manifold Learning

8 0.32950595 112 jmlr-2013-Tapkee: An Efficient Dimension Reduction Library

9 0.25924703 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning

10 0.22732551 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach

11 0.21169415 109 jmlr-2013-Stress Functions for Nonlinear Dimension Reduction, Proximity Analysis, and Graph Drawing

12 0.20561063 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering

13 0.20376319 90 jmlr-2013-Quasi-Newton Method: A New Direction

14 0.20278366 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning

15 0.19908895 44 jmlr-2013-Finding Optimal Bayesian Networks Using Precedence Constraints

16 0.17861521 18 jmlr-2013-Beyond Fano's Inequality: Bounds on the Optimal F-Score, BER, and Cost-Sensitive Risk and Their Implications

17 0.17458595 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning

18 0.17129794 119 jmlr-2013-Variable Selection in High-Dimension with Random Designs and Orthogonal Matching Pursuit

19 0.16522926 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning

20 0.1649422 51 jmlr-2013-Greedy Sparsity-Constrained Optimization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.035), (5, 0.097), (6, 0.032), (10, 0.109), (14, 0.011), (20, 0.012), (23, 0.027), (46, 0.018), (57, 0.403), (68, 0.018), (70, 0.016), (75, 0.045), (85, 0.042), (87, 0.021), (89, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8596409 55 jmlr-2013-Joint Harmonic Functions and Their Supervised Connections

Author: Mark Vere Culp, Kenneth Joseph Ryan

Abstract: The cluster assumption had a significant impact on the reasoning behind semi-supervised classification methods in graph-based learning. The literature includes numerous applications where harmonic functions provided estimates that conformed to data satisfying this well-known assumption, but the relationship between this assumption and harmonic functions is not as well-understood theoretically. We investigate these matters from the perspective of supervised kernel classification and provide concrete answers to two fundamental questions. (i) Under what conditions do semisupervised harmonic approaches satisfy this assumption? (ii) If such an assumption is satisfied then why precisely would an observation sacrifice its own supervised estimate in favor of the cluster? First, a harmonic function is guaranteed to assign labels to data in harmony with the cluster assumption if a specific condition on the boundary of the harmonic function is satisfied. Second, it is shown that any harmonic function estimate within the interior is a probability weighted average of supervised estimates, where the weight is focused on supervised kernel estimates near labeled cases. We demonstrate that the uniqueness criterion for harmonic estimators is sensitive when the graph is sparse or the size of the boundary is relatively small. This sets the stage for a third contribution, a new regularized joint harmonic function for semi-supervised learning based on a joint optimization criterion. Mathematical properties of this estimator, such as its uniqueness even when the graph is sparse or the size of the boundary is relatively small, are proven. A main selling point is its ability to operate in circumstances where the cluster assumption may not be fully satisfied on real data by compromising between the purely harmonic and purely supervised estimators. The competitive stature of the new regularized joint harmonic approach is established. Keywords: harmonic function, joint training, cluster assumption, s

same-paper 2 0.70441782 34 jmlr-2013-Distance Preserving Embeddings for General n-Dimensional Manifolds

Author: Nakul Verma

Abstract: Low dimensional embeddings of manifold data have gained popularity in the last decade. However, a systematic finite sample analysis of manifold embedding algorithms largely eludes researchers. Here we present two algorithms that embed a general n-dimensional manifold into Rd (where d only depends on some key manifold properties such as its intrinsic dimension, volume and curvature) that guarantee to approximately preserve all interpoint geodesic distances. Keywords: manifold learning, isometric embeddings, non-linear dimensionality reduction, Nash’s embedding theorem

3 0.37550667 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding

Author: Ery Arias-Castro, Bruno Pelletier

Abstract: Maximum Variance Unfolding is one of the main methods for (nonlinear) dimensionality reduction. We study its large sample limit, providing specific rates of convergence under standard assumptions. We find that it is consistent when the underlying submanifold is isometric to a convex subset, and we provide some simple examples where it fails to be consistent. Keywords: maximum variance unfolding, isometric embedding, U-processes, empirical processes, proximity graphs.

4 0.36269152 86 jmlr-2013-Parallel Vector Field Embedding

Author: Binbin Lin, Xiaofei He, Chiyuan Zhang, Ming Ji

Abstract: We propose a novel local isometry based dimensionality reduction method from the perspective of vector fields, which is called parallel vector field embedding (PFE). We first give a discussion on local isometry and global isometry to show the intrinsic connection between parallel vector fields and isometry. The problem of finding an isometry turns out to be equivalent to finding orthonormal parallel vector fields on the data manifold. Therefore, we first find orthonormal parallel vector fields by solving a variational problem on the manifold. Then each embedding function can be obtained by requiring its gradient field to be as close to the corresponding parallel vector field as possible. Theoretical results show that our method can precisely recover the manifold if it is isometric to a connected open subset of Euclidean space. Both synthetic and real data examples demonstrate the effectiveness of our method even if there is heavy noise and high curvature. Keywords: manifold learning, isometry, vector field, covariant derivative, out-of-sample extension

5 0.36060938 70 jmlr-2013-Maximum Volume Clustering: A New Discriminative Clustering Approach

Author: Gang Niu, Bo Dai, Lin Shang, Masashi Sugiyama

Abstract: The large volume principle proposed by Vladimir Vapnik, which advocates that hypotheses lying in an equivalence class with a larger volume are more preferable, is a useful alternative to the large margin principle. In this paper, we introduce a new discriminative clustering model based on the large volume principle called maximum volume clustering (MVC), and then propose two approximation schemes to solve this MVC model: a soft-label MVC method using sequential quadratic programming and a hard-label MVC method using semi-definite programming, respectively. The proposed MVC is theoretically advantageous for three reasons. The optimization involved in hard-label MVC is convex, and under mild conditions, the optimization involved in soft-label MVC is akin to a convex one in terms of the resulting clusters. Secondly, the soft-label MVC method possesses a clustering error bound. Thirdly, MVC includes the optimization problems of a spectral clustering, two relaxed k-means clustering and an information-maximization clustering as special limit cases when its regularization parameter goes to infinity. Experiments on several artificial and benchmark data sets demonstrate that the proposed MVC compares favorably with state-of-the-art clustering methods. Keywords: discriminative clustering, large volume principle, sequential quadratic programming, semi-definite programming, finite sample stability, clustering error

6 0.3560788 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning

7 0.35128072 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models

8 0.35030836 52 jmlr-2013-How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis

9 0.34968191 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering

10 0.34883863 59 jmlr-2013-Large-scale SVD and Manifold Learning

11 0.34874609 2 jmlr-2013-A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems

12 0.3482497 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components

13 0.34693307 18 jmlr-2013-Beyond Fano's Inequality: Bounds on the Optimal F-Score, BER, and Cost-Sensitive Risk and Their Implications

14 0.34644878 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization

15 0.34445727 3 jmlr-2013-A Framework for Evaluating Approximation Methods for Gaussian Process Regression

16 0.34338081 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut

17 0.3431083 22 jmlr-2013-Classifying With Confidence From Incomplete Information

18 0.34289336 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems

19 0.34289181 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning

20 0.34206793 29 jmlr-2013-Convex and Scalable Weakly Labeled SVMs