jmlr jmlr2013 jmlr2013-77 knowledge-graph by maker-knowledge-mining

77 jmlr-2013-On the Convergence of Maximum Variance Unfolding


Source: pdf

Author: Ery Arias-Castro, Bruno Pelletier

Abstract: Maximum Variance Unfolding is one of the main methods for (nonlinear) dimensionality reduction. We study its large sample limit, providing specific rates of convergence under standard assumptions. We find that it is consistent when the underlying submanifold is isometric to a convex subset, and we provide some simple examples where it fails to be consistent. Keywords: maximum variance unfolding, isometric embedding, U-processes, empirical processes, proximity graphs.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We find that it is consistent when the underlying submanifold is isometric to a convex subset, and we provide some simple examples where it fails to be consistent. [sent-7, score-0.25]

2 Keywords: maximum variance unfolding, isometric embedding, U-processes, empirical processes, proximity graphs. [sent-8, score-0.158]

3 , yn ∈ Rd that can be isometrically mapped to (or close to) x1 , . [sent-24, score-0.256]

4 (2000) show that, with proper tuning, geodesic distances may be approximated by neighborhood graph distances when the submanifold M is geodesically convex, implying that Isomap asymptotically recovers the isometry when D is convex. [sent-30, score-0.309]

5 Very close in spirit to what we do here, Zha and Zhang (2007) introduce and study a continuum version of Isomap. [sent-32, score-0.169]

6 recover an isometry when the manifold is isometric to a convex domain in some lower-dimensional Euclidean space. [sent-34, score-0.391]

7 To justify HLLE, Donoho and Grimes (2003) show that the null space of the (continuous) Hessian operator yields an isometric embedding. [sent-35, score-0.131]

8 For instance, a number of papers analyze how well the discrete graph Laplacian based on a sample approximates the continuous Laplace-Beltrami operator on a submanifold (Belkin and Niyogi, 2005; von Luxburg et al. [sent-40, score-0.118]

9 However, such convergence results do not guarantee that the algorithm is successful at recovering the isometry when one exists. [sent-43, score-0.212]

10 In Section 4, we consider the solutions to the continuum limit. [sent-52, score-0.169]

11 , yn ∈ R p : ‖yi − yj‖ ≤ ‖xi − xj‖ when ‖xi − xj‖ ≤ r . [sent-73, score-0.328]

12 In that case, δM (x, x′ ) is the length of the shortest continuous path on M starting at x and ending at x′ , and (M, δM ) is a complete metric space, also called a length space in the context of metric geometry (Burago et al. [sent-84, score-0.117]

13 (2000) prove that it holds when M is a compact, smooth and geodesically convex submanifold (e. [sent-93, score-0.155]

14 , yˆn) of Discrete MVU, inf_{f∈S1} max_{1≤i≤n} ‖yˆi − f(xi)‖ → 0, (6) almost surely as n → ∞. [sent-108, score-0.337]
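
For concreteness, the following is a minimal sketch — not taken from the paper — of the Discrete MVU program referred to above: maximize the total squared-distance energy of y1, . . . , yn subject to ‖yi − yj‖ ≤ ‖xi − xj‖ whenever ‖xi − xj‖ ≤ r, written in the usual Gram-matrix (SDP) form. The numpy/cvxpy calls, the helper name discrete_mvu, and the spectral read-out of the embedding are illustrative assumptions; the paper's exact normalization of the energy may differ.

    import numpy as np
    import cvxpy as cp

    def discrete_mvu(X, r):
        # Hedged sketch of Discrete MVU over the Gram matrix K = Y^T Y (K PSD).
        # Assumes the r-neighborhood graph of X is connected, otherwise the
        # program is unbounded.
        n = len(X)
        D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # squared distances
        K = cp.Variable((n, n), PSD=True)
        constraints = [cp.sum(K) == 0]  # center the embedding
        for i in range(n):
            for j in range(i + 1, n):
                if D2[i, j] <= r ** 2:
                    constraints.append(K[i, i] + K[j, j] - 2 * K[i, j] <= D2[i, j])
        # With the centering constraint, sum_{i,j} ||y_i - y_j||^2 = 2 n trace(K).
        cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()
        w, V = np.linalg.eigh(K.value)  # read an embedding off the Gram matrix
        return V[:, ::-1] * np.sqrt(np.maximum(w[::-1], 0.0))

The rows of the returned array play the role of the discrete solution (yˆ1, . . . , yˆn) appearing in (6).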

15 , xn and an edge between xi and xj if ‖xi − xj‖ ≤ r. [sent-119, score-0.098]
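
The r-neighborhood graph just described also drives the Connectivity requirement invoked later (the event Λ(·) and the admissible radii rn). A small sketch — again not from the paper, with the scipy helpers as assumptions of the illustration — for building the graph on a sample and testing connectivity:

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import connected_components

    def r_graph_is_connected(X, r):
        # Edge between x_i and x_j iff 0 < ||x_i - x_j|| <= r.
        D = squareform(pdist(X))
        A = csr_matrix((D <= r) & (D > 0))
        n_components, _ = connected_components(A, directed=False)
        return n_components == 1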

16 The latter comes from the fact that, for any f ∈ F1, E(f) ≤ ∫_{M×M} δM(x, x′)2 µ(dx)µ(dx′) ≤ diam(M)2, where we used (2) in the first inequality, and diam(M) is the intrinsic diameter of M, that is, diam(M) := sup_{x,x′∈M} δM(x, x′). [sent-123, score-0.206]
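
Since µ is a probability measure, E(f) is simply an average of squared distances, and its empirical counterpart for an embedded point cloud is a one-liner. The sketch below is not from the paper; in particular the normalization (1/n^2 over all ordered pairs, rather than a U-statistic over distinct pairs) is an assumption.

    import numpy as np

    def empirical_energy(Y):
        # Average of ||y_i - y_j||^2 over all ordered pairs (i, j); this never
        # exceeds diam(M)^2 when ||y_i - y_j|| <= delta_M(x_i, x_j) for all pairs.
        n = len(Y)
        D2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
        return D2.sum() / n ** 2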

17 rn → 0 in such a way that ∑_{n≥1} P(Λ(λn rn)^c) < ∞, for some sequence λn → 0. [sent-136, score-0.174]

18 2 we derive a quantitative bound on rn that guarantees (7) under additional assumptions. [sent-170, score-0.15]

19 Assuming Λn(η) holds, for all k we have δM(xk, xtk) ≤ η, so that ‖xtk − xtk−1‖ ≤ δM(xtk, xtk−1) ≤ δM(xk, xk−1) + 2η ≤ l1 + 2η ≤ r/2 + 2(r/4) = r. [sent-209, score-0.142]

20 For f ∈ F_{1−c(r)}, and for any i, j such that ‖xi − xj‖ ≤ r, we have ‖f(xi) − f(xj)‖ ≤ (1 − c(r)) δM(xi, xj) ≤ (1 − c(r))(1 + c(‖xi − xj‖)) ‖xi − xj‖. [sent-230, score-0.108]

21 Since the function c is non-decreasing, c(‖xi − xj‖) ≤ c(r), and so ‖f(xi) − f(xj)‖ ≤ (1 − c(r)2)‖xi − xj‖ ≤ ‖xi − xj‖. [sent-231, score-0.108]

22 Consequently, Yn(f) ∈ Yn,r, implying that sup_{Y∈Yn,r} E(Y) ≥ sup_{f∈F_{1−c(r)}} E(Yn(f)). (10) [sent-232, score-0.412]

23 As a result of (9) and (10), we have sup_{Y∈Yn,r} E(Y) − sup_{f∈F1} E(f) ≤ sup_{1−c(r)≤L≤1+6η/r} [ sup_{f∈FL} E(Yn(f)) − sup_{f∈F1} E(f) ]. (11) [sent-233, score-0.824]

24 We have sup_{f∈FL} E(Yn(f)) − sup_{f∈FL} E(f) ≤ sup_{f∈FL} |E(Yn(f)) − E(f)|, and applying the triangle inequality, we arrive at sup_{f∈FL} E(Yn(f)) − sup_{f∈F1} E(f) ≤ sup_{f∈FL} |E(Yn(f)) − E(f)| + |sup_{f∈FL} E(f) − sup_{f∈F1} E(f)|. [sent-234, score-1.882]

25 Since FL = L·F1 and E(Lf) = L2 E(f), we have |sup_{f∈FL} E(f) − sup_{f∈F1} E(f)| ≤ |L2 − 1| sup_{f∈F1} E(f) ≤ |L2 − 1| diam(M)2, and sup_{f∈FL} |E(Yn(f)) − E(f)| = L2 sup_{f∈F1} |E(Yn(f)) − E(f)|. (12) [sent-235, score-1.03]

26 Consequently, sup_{f∈FL} E(Yn(f)) − sup_{f∈F1} E(f) ≤ L2 sup_{f∈F1} |E(Yn(f)) − E(f)| + |L2 − 1| diam(M)2. [sent-236, score-0.618]

27 Reporting this inequality in (11) on the event Λ(η) with η ≤ r/4, we have sup_{Y∈Yn,r} E(Y) − sup_{f∈F1} E(f) ≤ (1 + 6η/r)2 sup_{f∈F1} |E(Yn(f)) − E(f)| + β(r, η)(2 + β(r, η)) diam(M)2, (13) where β(r, η) := max(c(r), 6η/r). [sent-237, score-0.618]

28 Note that f ∈ F1 if, and only if, f − f(x0) ∈ F1^0, and by the fact that E(f + a) = E(f) for any function or vector f and any constant a ∈ R p, we have sup_{f∈F1} |E(Yn(f)) − E(f)| = sup_{f∈F1^0} |E(Yn(f)) − E(f)|. [sent-247, score-0.412]

29 Since F1^0 is equicontinuous and bounded, it is compact for the topology of the supremum norm by the Arzelà-Ascoli Theorem, so that N∞(F1^0, ε) < ∞ for any ε > 0. [sent-254, score-0.111]

30 By (14) and (15), we have |E(Yn(f)) − E(f)| ≤ |E(Yn(f)) − E(Yn(fk))| + |E(Yn(fk)) − E(fk)| + |E(fk) − E(f)| ≤ 8 diam(M) ‖f − fk‖∞ + |E(Yn(fk)) − E(fk)| = 8 diam(M)ε + |E(Yn(fk)) − E(fk)|. [sent-256, score-0.518]

31 Thus, sup_{f∈F1^0} |E(Yn(f)) − E(f)| ≤ 8 diam(M)ε + max{|E(Yn(fk)) − E(fk)| : k = 1, . [sent-257, score-0.354]

32 Therefore, when ε > 0 is fixed, the second term in (16) tends to zero almost surely, and since ε > 0 is arbitrary, we conclude that sup_{f∈F1} |E(Yn(f)) − E(f)| → 0, in probability, as n → ∞. [sent-263, score-0.206]

33 Using (18) in (16), coupled with the union bound, we get that P( sup_{f∈F1} |E(Yn(f)) − E(f)| > 9ε diam(M) ) ≤ N∞(F1^0, ε) · 2 exp( −nε2 / (5 diam(M)2 + 3ε) ). (19) [sent-282, score-0.206]

34 Clearly, the RHS is summable for every ε > 0 fixed, so the convergence in (17) happens in fact with probability one, that is, sup_{f∈F1} |E(Yn(f)) − E(f)| → 0, almost surely, as n → ∞. [sent-283, score-0.239]

35 When Λ(λn rn) holds, by (13), we have sup_{Y∈Yn,rn} E(Y) − sup_{f∈F1} E(f) ≤ (1 + 6λn)2 sup_{f∈F1} |E(Yn(f)) − E(f)| + 3 max(c(rn), 6λn) diam(M)2, while when Λ(λn rn) does not hold, since the energies are bounded by diam(M)2, we have sup_{Y∈Yn,rn} E(Y) − sup_{f∈F1} E(f) ≤ 2 diam(M)2. [sent-286, score-1.204]

36 Combining these inequalities, we deduce that sup_{Y∈Yn,rn} E(Y) − sup_{f∈F1} E(f) ≤ 3 max(c(rn), 6λn) diam(M)2 · 1{Λ(λn rn)} + 2 diam(M)2 · 1{Λ(λn rn)^c} + (1 + 6λn)2 sup_{f∈F1} |E(Yn(f)) − E(f)|. [sent-287, score-0.792]

37 Note that the existence of the interpolating function fˆn holds on Λ(λn rn ) for each fixed n, and that this does not imply the existence of an interpolating sequence ( fˆn )n≥1 . [sent-295, score-0.159]

38 In addition, when rn satisfies the Connectivity requirement, then with probability one, Λ(λn rn )c holds for only finitely many n’s by the Borel-Cantelli lemma, implying that, with probability one, Λ(λn rn ) holds infinitely often. [sent-298, score-0.261]

39 Since F4^0 is equicontinuous and bounded, it is compact for the topology of the sup-norm by the Arzelà-Ascoli Theorem. [sent-300, score-0.109]

40 Indeed, we have E(fˆn) = E(fˆn) − E(Yn(fˆn)) + E(Yn(fˆn)), with |E(fˆn) − E(Yn(fˆn))| ≤ sup_{f∈F1} |E(Yn(f)) − E(f)| → 0, by (17), and E(Yn(fˆn)) = sup_{Y∈Yn,rn} E(Y) → sup_{f∈F1} E(f), by (5), almost surely as n → ∞. [sent-304, score-0.657]

41 Hence, if f∞ = lim_k fˆnk, by continuity of E on F4^0, we have E(f∞) = lim_k E(fˆnk) = sup_{f∈F1} E(f), and given that f∞ ∈ F1^0, we have f∞ ∈ S1^0 by definition. [sent-305, score-0.206]

42 The fact that (fˆn) is compact with all accumulation points in S1^0 implies that inf_{f∈S1^0} ‖fˆn − f‖∞ → 0, (21) and since we have max_{1≤i≤n} ‖yˆi − f(xi)‖ = max_{1≤i≤n} ‖fˆn(xi) − f(xi)‖ ≤ ‖fˆn − f‖∞, this immediately implies (6). [sent-306, score-0.151]

43 Lemma 3 Let (an) be a sequence in a compact metric space with metric δ, that has all its accumulation points in a set A. [sent-308, score-0.199]

44 Quantitative Convergence Bounds. We obtained a general, qualitative convergence result for MVU in the preceding section and now specify some of the supporting arguments to obtain quantitative convergence speeds. [sent-315, score-0.101]

45 And thick sets are a model for noisy data, where the data points are sampled from the vicinity of a submanifold. [sent-328, score-0.135]

46 Important examples of thick sets are tubular neighborhoods of thin sets. [sent-329, score-0.242]

47 Note that any thin set A has positive reach, which bounds its radius of curvature from below. [sent-333, score-0.107]

48 While for any thick set A, ∂A is a thin set without boundary, for any η < ρ(A), B̄(A, η) is a thick set, with boundary having reach ≥ ρ(A) − η. [sent-334, score-0.377]

49 When M is a thin set, we define rM = min( ρ(M⋆), ρ(∂M) ), where by convention ρ(∅) = ∞. [sent-338, score-0.107]

50 Lemma 4 Whether M is a thin or a thick set, (3) is valid with c(r) = (4r/rM)·1{r < rM/2} + 1{r ≥ rM/2}. [sent-341, score-0.242]

51 α > 0 such that µ(B(x, η)) ≥ α vold(B(x, η) ∩ M), ∀x ∈ M, ∀η > 0, (23) where vold denotes the d-dimensional Hausdorff measure and d denotes the Hausdorff dimension of M. [sent-342, score-0.226]

52 Lemma 5 Whether M is thin or thick, there is C > 0 such that, for any η ≤ rM and any x ∈ M, vold (B(x, η) ∩ M) ≥ C ηd . [sent-345, score-0.22]

53 We can now reason as we did for thick sets, but with a twist. [sent-374, score-0.135]

54 Arguing exactly as we did for thick sets, we have that B(c, η/8) ∩ T ⊂ B(a, η/2) ∩ π(K). [sent-378, score-0.135]

55 In addition, since π is 1-Lipschitz on K, we have vold (L) ≥ vold (π(L)) = vold (B(c, η/8) ∩ T ). [sent-381, score-0.339]

56 When (23) is satisfied, and M is either thin or thick, we can provide sharp rates for rn . [sent-383, score-0.194]

57 When M is thick, N(M, η) ≤ C vol p(M) η^{−p}; and when M is thin and 0 ≤ σ < ρ(M), N(B(M, σ), η) ≤ C vold(M) max(σ, η)^{p−d} η^{−p}. [sent-388, score-0.296]

58 Since B(zi, η/2) ∩ B(zj, η/2) = ∅ when i ≠ j, we have vol p(M) ≥ ∑_j vol p(B(zj, η/2) ∩ M) ≥ N Cp (η/2)^p, where Cp is the constant in Lemma 5. [sent-394, score-0.152]

59 Since B(zi, η/8) ∩ B(zj, η/8) = ∅ when i ≠ j, and B(zi, η/8) ⊂ B(M, σ), we have vol p(B(M, σ)) ≥ ∑_j vol p(B(zj, η/8)) = N ωp (η/8)^p. [sent-406, score-0.152]

60 By Weyl's volume formula for tubes (Weyl, 1939), we have vol p(B(M, σ)) ≤ C1 vold(M) σ^{p−d} for a constant C1 depending on p and d. [sent-408, score-0.189]

61 We deduce that any rn ≫ rn† := (log(n)/n)^{1/d} satisfies (7) with any λn → 0 such that λn ≫ rn†/rn. [sent-415, score-0.261]
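
To get a feel for this scaling, the throwaway snippet below (an illustration only: constant factors and the density lower bound α from (23) are ignored) prints rn† = (log(n)/n)^{1/d} for a few sample sizes and intrinsic dimensions; larger d forces the neighborhood radius to shrink much more slowly with n.

    import numpy as np

    for d in (1, 2, 3):
        for n in (10 ** 3, 10 ** 4, 10 ** 5):
            r_dagger = (np.log(n) / n) ** (1.0 / d)
            print(f"d={d}  n={n:>6}  r_n_dagger={r_dagger:.4f}")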

62 Consider the class G of piecewise-constant functions g : M → R p of the form g(x) = yt_j for all x ∈ Q_j ∩ M and such that ‖yt_j − yt_k‖ ≤ C0 ε when Q_j and Q_k are adjacent. [sent-451, score-0.169]

63 For each j, let t_j be such that ‖f(z_j) − yt_j‖ ≤ √p ε and let g be defined by g(x) = yt_j for all x ∈ Q_j. [sent-457, score-0.16]

64 By the triangle inequality, (2) and (3), we have ‖yt_j − yt_k‖ ≤ ‖f(z_j) − f(z_k)‖ + ‖yt_j − f(z_j)‖ + ‖yt_k − f(z_k)‖ ≤ (1 + c(‖z_j − z_k‖)) ‖z_j − z_k‖ + √p ε + √p ε ≤ (1 + c(2√p ε))·2√p ε + 2√p ε =: C0 ε. [sent-459, score-0.334]

65 In particular, if M is thin or thick, we have log N∞(F1^0, η) ≤ C η^{−d}, by Lemma 6 (applied with σ = 0 when M is thin) and Lemma 7. [sent-463, score-0.107]

66 4 Quantitative Convergence Bound. From (19) and Lemma 7, there is a constant C > 0 such that P( sup_{f∈F1} |E(Yn(f)) − E(f)| > C n^{−1/(d+2)} ) ≤ exp(−n^{d/(d+2)}). [sent-466, score-0.206]

67 Theorem 2 Suppose that M is either thin or thick, of dimension d, and that (23) holds. [sent-469, score-0.107]

68 Assume that rn → 0 such that rn ≫ rn† := (log(n)/(α n))^{1/d} and take any an → ∞. [sent-470, score-0.261]

69 Then, with probability one, sup{E(Y) : Y ∈ Yn,rn} − sup{E(f) : f ∈ F1} ≤ an ( rn + rn†/rn + n^{−1/(d+2)} ), for n large enough. [sent-471, score-0.261]

70 We speculate that this convergence rate is not sharp and that the first term in brackets can be replaced by rn^2. [sent-472, score-0.148]

71 We mostly focus on the case where M is isometric to a Euclidean domain. [sent-480, score-0.131]

72 We assume that M is isometric to a compact, connected domain D ⊂ Rd . [sent-482, score-0.131]

73 Then the canonical inclusion ι of D in Rd is not necessarily an isometry between the metric spaces (D, δD) and (Rd, ‖·‖). [sent-485, score-0.196]

74 We start by showing that, in the case where M is isometric to a convex domain, MVU recovers this convex domain modulo a rigid transformation, so that MVU is consistent in that case. [sent-493, score-0.304]

75 And second, that when M is isometric to a domain that is not convex, MVU may not recover this domain. [sent-496, score-0.166]

76 Theorem 3 Suppose that M is isometric to a convex subset D ⊂ Rd with isometry mapping ψ : M → D, and that (23) holds. [sent-502, score-0.32]

77 So sup_{f∈F1} E(f) = E(ψ) = ∫_{D×D} ‖z − z′‖2 (µ ◦ ψ−1)(dz)(µ ◦ ψ−1)(dz′). [sent-506, score-0.206]

78 Hence ψ ∈ S1 , and since E (ζ ◦ ψ) = E (ψ) for any isometry ζ : R p → R p , {ζ ◦ ψ : ζ ∈ Isom(R p )} ⊂ S1 . [sent-507, score-0.151]

79 Consequently E(f) = ∫_{M×M\U} ‖f(x) − f(x′)‖2 µ(dx)µ(dx′) + ∫_U ‖f(x) − f(x′)‖2 µ(dx)µ(dx′) < ∫_{M×M} δM(x, x′)2 µ(dx)µ(dx′) = sup_{f∈F1} E(f). [sent-513, score-0.206]

80 So any function f in F1 which is not an isometry onto its image does not belong to S1. [sent-514, score-0.151]

81 Lastly, since for any isometry f in S1, the map f ◦ ψ−1 : R p → R p is an isometry, there exists some isometry ζ ∈ Isom(R p) such that f = ζ ◦ ψ, and we conclude that {ζ ◦ ψ : ζ ∈ Isom(R p)} = S1. [sent-515, score-0.302]

82 In conclusion, MVU recovers the isometry when the domain D is convex. [sent-516, score-0.192]
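
As a hypothetical end-to-end illustration of this consistency statement (using the sketch helpers discrete_mvu and r_graph_is_connected introduced earlier, which must be in scope), one can sample a half-cylinder — isometric to a flat rectangle, hence to a convex subset of R^2 — and check that the returned configuration is essentially two-dimensional:

    import numpy as np

    rng = np.random.default_rng(0)
    t = rng.uniform(0.0, np.pi, 300)
    h = rng.uniform(0.0, 1.0, 300)
    X = np.column_stack([np.cos(t), np.sin(t), h])  # half-cylinder in R^3
    r = 0.3
    if r_graph_is_connected(X, r):
        Y = discrete_mvu(X, r)
        # Up to a rigid transformation, only two directions should carry energy.
        print(np.linalg.svd(Y - Y.mean(axis=0), compute_uv=False)[:4])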

83 Then as σ → 0, we have sup_{f∈F1,σ} Eσ(f) → sup_{f∈F1} E(f), (24) and sup_{f∈S1,σ^0} inf_{g∈S1^0} sup_{x∈Mσ} inf_{z∈M} ‖f(x) − g(z)‖ → 0. [sent-528, score-0.206]

84 Consider any sequence σm → 0 with σm < ρ(M) for all m ≥ 1, and let fm ∈ S1,σm^0. [sent-531, score-0.229]

85 Then for x ∈ Mσm, we have ‖f⋆(x) − fm(x)‖ ≤ ‖g⋆(π(x)) − gm(π(x))‖ + ‖fm(π(x)) − fm(x)‖ ≤ ‖g⋆ − gm‖∞ + ‖π(x) − x‖ ≤ ‖g⋆ − gm‖∞ + σm, since fm ∈ F1,σm and the segment [π(x), x] ⊂ Mσm. [sent-537, score-1.033]

86 Hence, as functions on Mσm, we have ‖f⋆ − fm‖∞ → 0, that is, sup_{x∈Mσm} ‖f⋆(x) − fm(x)‖ → 0. [sent-539, score-0.602]

87 By (14), again applied to functions on Mσm for a fixed m, we have |Eσm(fm) − Eσm(f⋆)| ≤ 4 diam(Mσm) ‖f⋆ − fm‖∞ ≤ 4 diam(B(M, ρ(M))) ‖f⋆ − fm‖∞ → 0, and since f⋆ does not depend on m and is bounded, we also have Eσm(f⋆) → E(f⋆) = E(g⋆) ≤ sup_{f∈F1} E(f). [sent-540, score-0.8]

88 Then f ∈ F1,σm by composition, so that Eσm(f) ≤ sup_{f∈F1,σm} Eσm(f). [sent-544, score-0.206]

89 On the other hand, Eσm(f) → E(f) = E(g) = sup_{f∈F1} E(f). [sent-545, score-0.206]

90 Hence, there is ε > 0, a sequence σm → 0 and fm ∈ S1,σm^0 such that inf_{g∈S1^0} sup_{x∈Mσm} inf_{z∈M} ‖fm(x) − g(z)‖ ≥ ε, for infinitely many m's. [sent-551, score-0.686]

91 For each m, let gm be the restriction of fm to M. [sent-553, score-0.298]

92 Following the same arguments, we have sup_{x∈Mσm} ‖f⋆(x) − fm(x)‖ → 0. [sent-556, score-0.198]

93 We also see that, necessarily, g⋆ ∈ S1^0, for otherwise the inequality in (26) would be strict and this would imply that (24) does not hold. [sent-557, score-0.206]

94 Hence sup_{x∈Mσm} ‖f⋆(x) − fm(x)‖ ≥ sup_{x∈Mσm} inf_{z∈M} ‖fm(x) − g⋆(z)‖ ≥ inf_{g∈S1^0} sup_{x∈Mσm} inf_{z∈M} ‖fm(x) − g(z)‖. [sent-558, score-0.206]

95 If we now let D = M = Mσ for some 0 < σ < σ0, we see that we do not recover D up to a rigid transformation. [sent-574, score-0.178]

96 By the previous calculations and our choice for a, the identity function ψ is not part of S1,0, since E0(ψ) = (1/π) ∫_{M0} ‖x‖2 dx = F(a)/π < F(1)/π = (1/π) ∫_{M0} ‖φ(x)‖2 dx = E0(φ). [sent-585, score-0.178]

97 Again, there is a numeric constant σ0 > 0 such that, when σ < σ0 , ψ does not maximize the energy Eσ , and we conclude again that if D = M = Mσ , MVU does not recover D up to a rigid transformation. [sent-587, score-0.125]

98 Though we showed that MVU is not always consistent in the sense that it may not recover the domain D up to a rigid transformation, we believe that MVU always flattens the manifold M in this case, meaning that it returns a set S which is a subset of some d-dimensional affine subspace. [sent-595, score-0.127]

99 In Theorem 3, we worked out the case where M is isometric to a convex set. [sent-599, score-0.169]

100 This question is non-trivial even when M is isometric to a circle. [sent-602, score-0.131]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mvu', 0.459), ('diam', 0.4), ('yn', 0.256), ('sup', 0.206), ('fm', 0.198), ('astro', 0.17), ('elletier', 0.17), ('rias', 0.17), ('continuum', 0.169), ('isometry', 0.151), ('nfolding', 0.142), ('thick', 0.135), ('isometric', 0.131), ('aximum', 0.121), ('vold', 0.113), ('thin', 0.107), ('onvergence', 0.101), ('fl', 0.096), ('dx', 0.089), ('rn', 0.087), ('submanifold', 0.081), ('vol', 0.076), ('fk', 0.074), ('xtk', 0.071), ('ytk', 0.071), ('gm', 0.069), ('unfolding', 0.066), ('isom', 0.057), ('rigid', 0.056), ('accumulation', 0.056), ('zha', 0.056), ('dz', 0.056), ('federer', 0.055), ('isomap', 0.054), ('dist', 0.053), ('compact', 0.053), ('lipschitz', 0.053), ('weinberger', 0.052), ('yt', 0.049), ('metric', 0.045), ('gin', 0.044), ('connectivity', 0.043), ('brudnyi', 0.042), ('burago', 0.042), ('grn', 0.042), ('paprotny', 0.042), ('inf', 0.042), ('recovers', 0.041), ('ka', 0.04), ('surely', 0.039), ('convex', 0.038), ('euclidean', 0.037), ('belkin', 0.037), ('discrete', 0.037), ('eigenmaps', 0.037), ('xi', 0.036), ('interpolating', 0.036), ('lip', 0.036), ('bruno', 0.036), ('geodesically', 0.036), ('hlle', 0.036), ('manifold', 0.036), ('bernstein', 0.035), ('quantitative', 0.035), ('dimensionality', 0.035), ('recover', 0.035), ('rd', 0.034), ('segment', 0.034), ('energy', 0.034), ('tan', 0.034), ('zk', 0.033), ('coverings', 0.033), ('convergence', 0.033), ('issn', 0.032), ('let', 0.031), ('lemma', 0.03), ('ky', 0.03), ('reduction', 0.03), ('supremum', 0.03), ('laplacian', 0.029), ('arzel', 0.028), ('equicontinuous', 0.028), ('garcke', 0.028), ('guaranty', 0.028), ('kirszbraun', 0.028), ('lln', 0.028), ('pelletier', 0.028), ('perimeter', 0.028), ('speculate', 0.028), ('supnorm', 0.028), ('packing', 0.028), ('coifman', 0.028), ('zn', 0.028), ('triangle', 0.028), ('shortest', 0.027), ('variance', 0.027), ('ty', 0.027), ('zi', 0.027), ('xn', 0.026), ('rm', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999934 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding

Author: Ery Arias-Castro, Bruno Pelletier

Abstract: Maximum Variance Unfolding is one of the main methods for (nonlinear) dimensionality reduction. We study its large sample limit, providing specific rates of convergence under standard assumptions. We find that it is consistent when the underlying submanifold is isometric to a convex subset, and we provide some simple examples where it fails to be consistent. Keywords: maximum variance unfolding, isometric embedding, U-processes, empirical processes, proximity graphs.

2 0.18603145 86 jmlr-2013-Parallel Vector Field Embedding

Author: Binbin Lin, Xiaofei He, Chiyuan Zhang, Ming Ji

Abstract: We propose a novel local isometry based dimensionality reduction method from the perspective of vector fields, which is called parallel vector field embedding (PFE). We first give a discussion on local isometry and global isometry to show the intrinsic connection between parallel vector fields and isometry. The problem of finding an isometry turns out to be equivalent to finding orthonormal parallel vector fields on the data manifold. Therefore, we first find orthonormal parallel vector fields by solving a variational problem on the manifold. Then each embedding function can be obtained by requiring its gradient field to be as close to the corresponding parallel vector field as possible. Theoretical results show that our method can precisely recover the manifold if it is isometric to a connected open subset of Euclidean space. Both synthetic and real data examples demonstrate the effectiveness of our method even if there is heavy noise and high curvature. Keywords: manifold learning, isometry, vector field, covariant derivative, out-of-sample extension

3 0.089566328 104 jmlr-2013-Sparse Single-Index Model

Author: Pierre Alquier, Gérard Biau

Abstract: Let (X,Y ) be a random pair taking values in R p × R. In the so-called single-index model, one has Y = f ⋆ (θ⋆T X) +W , where f ⋆ is an unknown univariate measurable function, θ⋆ is an unknown vector in Rd , and W denotes a random noise satisfying E[W |X] = 0. The single-index model is known to offer a flexible way to model a variety of high-dimensional real-world phenomena. However, despite its relative simplicity, this dimension reduction scheme is faced with severe complications as soon as the underlying dimension becomes larger than the number of observations (“p larger than n” paradigm). To circumvent this difficulty, we consider the single-index model estimation problem from a sparsity perspective using a PAC-Bayesian approach. On the theoretical side, we offer a sharp oracle inequality, which is more powerful than the best known oracle inequalities for other common procedures of single-index recovery. The proposed method is implemented by means of the reversible jump Markov chain Monte Carlo technique and its performance is compared with that of standard procedures. Keywords: single-index model, sparsity, regression estimation, PAC-Bayesian, oracle inequality, reversible jump Markov chain Monte Carlo method

4 0.087697975 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses

Author: Partha Niyogi

Abstract: Manifold regularization (Belkin et al., 2006) is a geometrically motivated framework for machine learning within which several semi-supervised algorithms have been constructed. Here we try to provide some theoretical understanding of this approach. Our main result is to expose the natural structure of a class of problems on which manifold regularization methods are helpful. We show that for such problems, no supervised learner can learn effectively. On the other hand, a manifold based learner (that knows the manifold or “learns” it from unlabeled examples) can learn with relatively few labeled examples. Our analysis follows a minimax style with an emphasis on finite sample results (in terms of n: the number of labeled examples). These results allow us to properly interpret manifold regularization and related spectral and geometric algorithms in terms of their potential use in semi-supervised learning. Keywords: semi-supervised learning, manifold regularization, graph Laplacian, minimax rates

5 0.074345857 34 jmlr-2013-Distance Preserving Embeddings for General n-Dimensional Manifolds

Author: Nakul Verma

Abstract: Low dimensional embeddings of manifold data have gained popularity in the last decade. However, a systematic finite sample analysis of manifold embedding algorithms largely eludes researchers. Here we present two algorithms that embed a general n-dimensional manifold into Rd (where d only depends on some key manifold properties such as its intrinsic dimension, volume and curvature) that guarantee to approximately preserve all interpoint geodesic distances. Keywords: manifold learning, isometric embeddings, non-linear dimensionality reduction, Nash’s embedding theorem

6 0.054746836 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres

7 0.054254603 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems

8 0.05358588 105 jmlr-2013-Sparsity Regret Bounds for Individual Sequences in Online Linear Regression

9 0.049409386 70 jmlr-2013-Maximum Volume Clustering: A New Discriminative Clustering Approach

10 0.049396474 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning

11 0.046748653 114 jmlr-2013-The Rate of Convergence of AdaBoost

12 0.045751814 24 jmlr-2013-Comment on "Robustness and Regularization of Support Vector Machines" by H. Xu et al. (Journal of Machine Learning Research, vol. 10, pp. 1485-1510, 2009)

13 0.044639558 59 jmlr-2013-Large-scale SVD and Manifold Learning

14 0.044432059 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models

15 0.042526618 96 jmlr-2013-Regularization-Free Principal Curve Estimation

16 0.041381821 97 jmlr-2013-Risk Bounds of Learning Processes for Lévy Processes

17 0.040924091 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning

18 0.040739845 65 jmlr-2013-Lower Bounds and Selectivity of Weak-Consistent Policies in Stochastic Multi-Armed Bandit Problem

19 0.040681481 117 jmlr-2013-Universal Consistency of Localized Versions of Regularized Kernel Methods

20 0.040433981 112 jmlr-2013-Tapkee: An Efficient Dimension Reduction Library


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.202), (1, 0.144), (2, 0.082), (3, -0.174), (4, -0.096), (5, -0.046), (6, -0.01), (7, -0.032), (8, 0.003), (9, -0.054), (10, -0.224), (11, -0.019), (12, -0.014), (13, 0.033), (14, 0.048), (15, -0.029), (16, 0.0), (17, -0.049), (18, 0.083), (19, -0.002), (20, -0.074), (21, 0.196), (22, -0.011), (23, 0.055), (24, -0.106), (25, 0.023), (26, -0.025), (27, -0.111), (28, -0.038), (29, -0.168), (30, -0.164), (31, -0.128), (32, 0.091), (33, 0.062), (34, -0.025), (35, 0.038), (36, -0.008), (37, -0.277), (38, 0.101), (39, 0.099), (40, -0.112), (41, -0.045), (42, -0.136), (43, -0.082), (44, 0.022), (45, 0.04), (46, 0.049), (47, -0.071), (48, 0.068), (49, -0.124)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93584257 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding

Author: Ery Arias-Castro, Bruno Pelletier

Abstract: Maximum Variance Unfolding is one of the main methods for (nonlinear) dimensionality reduction. We study its large sample limit, providing specific rates of convergence under standard assumptions. We find that it is consistent when the underlying submanifold is isometric to a convex subset, and we provide some simple examples where it fails to be consistent. Keywords: maximum variance unfolding, isometric embedding, U-processes, empirical processes, proximity graphs.

2 0.54332554 86 jmlr-2013-Parallel Vector Field Embedding

Author: Binbin Lin, Xiaofei He, Chiyuan Zhang, Ming Ji

Abstract: We propose a novel local isometry based dimensionality reduction method from the perspective of vector fields, which is called parallel vector field embedding (PFE). We first give a discussion on local isometry and global isometry to show the intrinsic connection between parallel vector fields and isometry. The problem of finding an isometry turns out to be equivalent to finding orthonormal parallel vector fields on the data manifold. Therefore, we first find orthonormal parallel vector fields by solving a variational problem on the manifold. Then each embedding function can be obtained by requiring its gradient field to be as close to the corresponding parallel vector field as possible. Theoretical results show that our method can precisely recover the manifold if it is isometric to a connected open subset of Euclidean space. Both synthetic and real data examples demonstrate the effectiveness of our method even if there is heavy noise and high curvature. Keywords: manifold learning, isometry, vector field, covariant derivative, out-of-sample extension

3 0.4949055 104 jmlr-2013-Sparse Single-Index Model

Author: Pierre Alquier, Gérard Biau

Abstract: Let (X,Y ) be a random pair taking values in R p × R. In the so-called single-index model, one has Y = f ⋆ (θ⋆T X) +W , where f ⋆ is an unknown univariate measurable function, θ⋆ is an unknown vector in Rd , and W denotes a random noise satisfying E[W |X] = 0. The single-index model is known to offer a flexible way to model a variety of high-dimensional real-world phenomena. However, despite its relative simplicity, this dimension reduction scheme is faced with severe complications as soon as the underlying dimension becomes larger than the number of observations (“p larger than n” paradigm). To circumvent this difficulty, we consider the single-index model estimation problem from a sparsity perspective using a PAC-Bayesian approach. On the theoretical side, we offer a sharp oracle inequality, which is more powerful than the best known oracle inequalities for other common procedures of single-index recovery. The proposed method is implemented by means of the reversible jump Markov chain Monte Carlo technique and its performance is compared with that of standard procedures. Keywords: single-index model, sparsity, regression estimation, PAC-Bayesian, oracle inequality, reversible jump Markov chain Monte Carlo method

4 0.40140072 24 jmlr-2013-Comment on "Robustness and Regularization of Support Vector Machines" by H. Xu et al. (Journal of Machine Learning Research, vol. 10, pp. 1485-1510, 2009)

Author: Yahya Forghani, Hadi Sadoghi

Abstract: This paper comments on the published work dealing with robustness and regularization of support vector machines (Journal of Machine Learning Research, Vol. 10, pp. 1485-1510, 2009) by H. Xu et al. They proposed a theorem to show that it is possible to relate robustness in the feature space and robustness in the sample space directly. In this paper, we propose a counter example that rejects their theorem. Keywords: kernel, robustness, support vector machine 1. Comment Firstly, it must be stated that Xu et al. (2009) made a good study of robustness and regularization of support vector machines. They proposed the following theorem to show that it is possible to relate robustness in the feature space and robustness in the sample space directly: Theorem (Xu et al., 2009) Suppose that the kernel function has the form k(x, x′ ) = f ( x − x′ ), with f : R+ → R a decreasing function. Denote by H the RKHS space of k(., .) and φ(.) the corresponding feature mapping. Then we have any x ∈ Rn , w ∈ H and c > 0, sup w, φ(x − δ) = δ ≤c δφ H≤ sup √ 2 f (0)−2 f (c) w, φ(x) − δφ . The following counter example rejects the mentioned theorem. However, this theorem is a standalone result in the appendix of the paper of Xu et al. (2009), which is not used anywhere else in the paper of Xu et al. (2009). Thus, the main result and all other results of Xu et al. (2009) are not affected in any way. Counter example. Let φ(.) be the feature mapping of Gaussian kernel function. We have φ(x) H = 1. Let w = φ(x). Therefore, w, φ(x) = w H , and sup w, φ(x − δ) = w δ ≤c ∗. Also at Islamic Azad University, Mashhad Branch, Mashhad, IRAN. c 2013 Yahya Forghani and Hadi Sadoghi. H. (1) F ORGHANI AND S ADOGHI Moreover, δφ δφ H≤ H≤ w, φ(x) − δφ = sup √ 2 f (0)−2 f (c) w, φ(x) + sup √ 2 f (0)−2 f (c) w H δφ + δφ w H H≤ + w H≤ w, δφ = sup √ 2 f (0)−2 f (c) w, δφ = sup √ 2 f (0)−2 f (c) H 2 f (0) − 2 f (c). (2) According to Equation (1) and (2), and since f is a decreasing function, for any c > 0, we have sup w, φ(x − δ) ≤ δ ≤c δφ H≤ sup √ 2 f (0)−2 f (c) w, φ(x) − δφ . End of counter example. The exact spot that the error has been occurred in the mentioned theorem is Equation (19) of the paper of Xu et al. (2009). There it has been claimed that the image of the RKHS feature mapping is dense, which unfortunately is not true. Indeed, because φ(x), φ(x) = K(0) where K(.) is the kernel function, the image of the feature mapping is in a ball of radius K(0). References Huan Xu, Shie Mannor, and Constantine Caramanis. Robustness and regularization of support vector machines. Journal of Machine Learning Research, 10:1485–1510, 2009. 3494

5 0.38791361 34 jmlr-2013-Distance Preserving Embeddings for General n-Dimensional Manifolds

Author: Nakul Verma

Abstract: Low dimensional embeddings of manifold data have gained popularity in the last decade. However, a systematic finite sample analysis of manifold embedding algorithms largely eludes researchers. Here we present two algorithms that embed a general n-dimensional manifold into Rd (where d only depends on some key manifold properties such as its intrinsic dimension, volume and curvature) that guarantee to approximately preserve all interpoint geodesic distances. Keywords: manifold learning, isometric embeddings, non-linear dimensionality reduction, Nash’s embedding theorem

6 0.35443687 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems

7 0.34342158 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses

8 0.30928683 60 jmlr-2013-Learning Bilinear Model for Matching Queries and Documents

9 0.2942557 112 jmlr-2013-Tapkee: An Efficient Dimension Reduction Library

10 0.26492178 67 jmlr-2013-MLPACK: A Scalable C++ Machine Learning Library

11 0.26411679 62 jmlr-2013-Learning Theory Approach to Minimum Error Entropy Criterion

12 0.24625716 2 jmlr-2013-A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems

13 0.24555551 70 jmlr-2013-Maximum Volume Clustering: A New Discriminative Clustering Approach

14 0.24176644 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning

15 0.2346632 117 jmlr-2013-Universal Consistency of Localized Versions of Regularized Kernel Methods

16 0.23445687 81 jmlr-2013-Optimal Discovery with Probabilistic Expert Advice: Finite Time Analysis and Macroscopic Optimality

17 0.23181413 97 jmlr-2013-Risk Bounds of Learning Processes for Lévy Processes

18 0.21927017 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres

19 0.20959893 114 jmlr-2013-The Rate of Convergence of AdaBoost

20 0.20429517 65 jmlr-2013-Lower Bounds and Selectivity of Weak-Consistent Policies in Stochastic Multi-Armed Bandit Problem


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.015), (5, 0.091), (6, 0.021), (9, 0.015), (10, 0.39), (20, 0.011), (23, 0.042), (46, 0.036), (53, 0.025), (68, 0.089), (70, 0.036), (75, 0.027), (85, 0.033), (87, 0.042), (89, 0.013), (94, 0.013), (95, 0.01)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.96825367 96 jmlr-2013-Regularization-Free Principal Curve Estimation

Author: Samuel Gerber, Ross Whitaker

Abstract: Principal curves and manifolds provide a framework to formulate manifold learning within a statistical context. Principal curves define the notion of a curve passing through the middle of a distribution. While the intuition is clear, the formal definition leads to some technical and practical difficulties. In particular, principal curves are saddle points of the mean-squared projection distance, which poses severe challenges for estimation and model selection. This paper demonstrates that the difficulties in model selection associated with the saddle point property of principal curves are intrinsically tied to the minimization of the mean-squared projection distance. We introduce a new objective function, facilitated through a modification of the principal curve estimation approach, for which all critical points are principal curves and minima. Thus, the new formulation removes the fundamental issue for model selection in principal curve estimation. A gradient-descent-based estimator demonstrates the effectiveness of the new formulation for controlling model complexity on numerical experiments with synthetic and real data. Keywords: principal curve, manifold estimation, unsupervised learning, model complexity, model selection

2 0.9470669 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models

Author: Matthew J. Johnson, Alan S. Willsky

Abstract: There is much interest in the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) as a natural Bayesian nonparametric extension of the ubiquitous Hidden Markov Model for learning from sequential and time-series data. However, in many settings the HDP-HMM’s strict Markovian constraints are undesirable, particularly if we wish to learn or encode non-geometric state durations. We can extend the HDP-HMM to capture such structure by drawing upon explicit-duration semi-Markov modeling, which has been developed mainly in the parametric non-Bayesian setting, to allow construction of highly interpretable models that admit natural prior information on state durations. In this paper we introduce the explicit-duration Hierarchical Dirichlet Process Hidden semiMarkov Model (HDP-HSMM) and develop sampling algorithms for efficient posterior inference. The methods we introduce also provide new methods for sampling inference in the finite Bayesian HSMM. Our modular Gibbs sampling methods can be embedded in samplers for larger hierarchical Bayesian models, adding semi-Markov chain modeling as another tool in the Bayesian inference toolbox. We demonstrate the utility of the HDP-HSMM and our inference methods on both synthetic and real experiments. Keywords: Bayesian nonparametrics, time series, semi-Markov, sampling algorithms, Hierarchical Dirichlet Process Hidden Markov Model

3 0.93863857 70 jmlr-2013-Maximum Volume Clustering: A New Discriminative Clustering Approach

Author: Gang Niu, Bo Dai, Lin Shang, Masashi Sugiyama

Abstract: The large volume principle proposed by Vladimir Vapnik, which advocates that hypotheses lying in an equivalence class with a larger volume are more preferable, is a useful alternative to the large margin principle. In this paper, we introduce a new discriminative clustering model based on the large volume principle called maximum volume clustering (MVC), and then propose two approximation schemes to solve this MVC model: A soft-label MVC method using sequential quadratic programming and a hard-label MVC method using semi-definite programming, respectively. The proposed MVC is theoretically advantageous for three reasons. The optimization involved in hard-label MVC is convex, and under mild conditions, the optimization involved in soft-label MVC is akin to a convex one in terms of the resulting clusters. Secondly, the soft-label MVC method possesses a clustering error bound. Thirdly, MVC includes the optimization problems of a spectral clustering, two relaxed k-means clustering and an information-maximization clustering as special limit cases when its regularization parameter goes to infinity. Experiments on several artificial and benchmark data sets demonstrate that the proposed MVC compares favorably with state-of-the-art clustering methods. (A preliminary and shorter version has appeared in Proceedings of 14th International Conference on Artificial Intelligence and Statistics (Niu et al., 2011). The preliminary work was done when GN was studying at Department of Computer Science and Technology, Nanjing University, and BD was studying at Institute of Automation, Chinese Academy of Sciences. A Matlab implementation of maximum volume clustering is available from http://sugiyama-www.cs.titech.ac.jp/∼gang/software.html.) Keywords: discriminative clustering, large volume principle, sequential quadratic programming, semi-definite programming, finite sample stability, clustering error

same-paper 4 0.89065737 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding

Author: Ery Arias-Castro, Bruno Pelletier

Abstract: Maximum Variance Unfolding is one of the main methods for (nonlinear) dimensionality reduction. We study its large sample limit, providing specific rates of convergence under standard assumptions. We find that it is consistent when the underlying submanifold is isometric to a convex subset, and we provide some simple examples where it fails to be consistent. Keywords: maximum variance unfolding, isometric embedding, U-processes, empirical processes, proximity graphs.

5 0.68951899 86 jmlr-2013-Parallel Vector Field Embedding

Author: Binbin Lin, Xiaofei He, Chiyuan Zhang, Ming Ji

Abstract: We propose a novel local isometry based dimensionality reduction method from the perspective of vector fields, which is called parallel vector field embedding (PFE). We first give a discussion on local isometry and global isometry to show the intrinsic connection between parallel vector fields and isometry. The problem of finding an isometry turns out to be equivalent to finding orthonormal parallel vector fields on the data manifold. Therefore, we first find orthonormal parallel vector fields by solving a variational problem on the manifold. Then each embedding function can be obtained by requiring its gradient field to be as close to the corresponding parallel vector field as possible. Theoretical results show that our method can precisely recover the manifold if it is isometric to a connected open subset of Euclidean space. Both synthetic and real data examples demonstrate the effectiveness of our method even if there is heavy noise and high curvature. Keywords: manifold learning, isometry, vector field, covariant derivative, out-of-sample extension

6 0.68787313 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning

7 0.67696083 27 jmlr-2013-Consistent Selection of Tuning Parameters via Variable Selection Stability

8 0.67093033 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering

9 0.66649634 100 jmlr-2013-Similarity-based Clustering by Left-Stochastic Matrix Factorization

10 0.65876365 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses

11 0.65742159 59 jmlr-2013-Large-scale SVD and Manifold Learning

12 0.65508044 46 jmlr-2013-GURLS: A Least Squares Library for Supervised Learning

13 0.65301865 37 jmlr-2013-Divvy: Fast and Intuitive Exploratory Data Analysis

14 0.65005982 29 jmlr-2013-Convex and Scalable Weakly Labeled SVMs

15 0.6495443 42 jmlr-2013-Fast Generalized Subset Scan for Anomalous Pattern Detection

16 0.6451962 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components

17 0.64513665 2 jmlr-2013-A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems

18 0.64497679 41 jmlr-2013-Experiment Selection for Causal Discovery

19 0.63729465 52 jmlr-2013-How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis

20 0.63586122 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut