jmlr jmlr2012 jmlr2012-68 knowledge-graph by maker-knowledge-mining

68 jmlr-2012-Minimax Manifold Estimation


Source: pdf

Author: Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman

Abstract: We find the minimax rate of convergence in Hausdorff distance for estimating a manifold M of dimension d embedded in R^D given a noisy sample from the manifold. Under certain conditions, we show that the optimal rate of convergence is n^{−2/(2+d)}. Thus, the minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which M is embedded. Keywords: manifold learning, minimax estimation

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We find the minimax rate of convergence in Hausdorff distance for estimating a manifold M of dimension d embedded in R^D given a noisy sample from the manifold. [sent-12, score-0.332]

2 Introduction We consider the problem of estimating a manifold M given noisy observations near the manifold. [sent-16, score-0.235]

3 ξ1, . . . , ξn are unobserved variables drawn from a distribution supported on a manifold M with dimension d < D. [sent-24, score-0.235]

4 A manifold M and a distribution for (ξ, Z) induce a distribution Q ≡ QM for Y . [sent-31, score-0.235]
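
To make the sampling model concrete, the sketch below simulates data of this form: ξ_i is drawn uniformly from a unit circle embedded in the first two coordinates of R^D, and Z_i is a perturbation in the normal space at ξ_i with norm at most σ. The circle, the uniform draw on it, and this particular noise law are illustrative assumptions, not necessarily the paper's exact model.

import numpy as np

def sample_near_circle(n, D=5, sigma=0.1, seed=0):
    # xi_i is uniform on the unit circle embedded in the first two coordinates of R^D;
    # Z_i is drawn in the normal space at xi_i, uniformly from a ball of radius sigma.
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    xi = np.zeros((n, D))
    xi[:, 0], xi[:, 1] = np.cos(theta), np.sin(theta)
    tangent = np.zeros((n, D))
    tangent[:, 0], tangent[:, 1] = -np.sin(theta), np.cos(theta)
    g = rng.standard_normal((n, D))
    g -= (g * tangent).sum(axis=1, keepdims=True) * tangent   # remove the tangential component
    g /= np.linalg.norm(g, axis=1, keepdims=True)             # unit vector in the normal space
    radius = sigma * rng.uniform(size=n) ** (1.0 / (D - 1))   # uniform in a (D-1)-ball of radius sigma
    Y = xi + radius[:, None] * g
    return xi, Y

xi, Y = sample_near_circle(n=500)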

5 Given two sets A and B, the Hausdorff distance between A and B is H(A, B) = inf{ε : A ⊂ B ⊕ ε and B ⊂ A ⊕ ε}, where A ⊕ ε = ∪_{x∈A} B_D(x, ε) and B_D(x, ε) is an open ball in R^D centered at x with radius ε. [sent-39, score-0.188]
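
For finite point sets, this definition reduces to the maximum of the two directed distances sup_{a∈A} inf_{b∈B} ∥a − b∥ and sup_{b∈B} inf_{a∈A} ∥a − b∥. The sketch below (an illustration in numpy, not code from the paper) computes H(A, B) for two point clouds:

import numpy as np

def hausdorff_distance(A, B):
    # A and B are finite point clouds of shape (m, D) and (k, D).
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # (m, k) pairwise distances
    d_AB = dists.min(axis=1).max()  # sup over A of the distance to B
    d_BA = dists.min(axis=0).max()  # sup over B of the distance to A
    return max(d_AB, d_BA)

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 0.5]])
print(hausdorff_distance(A, B))  # 1.118..., the distance from (1, 0) to (0, 0.5)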

6 We are interested in the minimax risk R_n(Q) = inf_{M̂} sup_{Q∈Q} E_Q[H(M̂, M)], where the infimum is over all estimators M̂. [sent-40, score-0.14]

7 Theorem 2 Under assumptions (A1)-(A4) given in Section 2, there exists an estimator M̂ such that, for all large n, sup_{Q∈Q} E_Q[H(M̂, M)] ≤ C_2 (log n / n)^{2/(2+d)} for some C_2 > 0. [sent-51, score-0.141]

8 But the estimator constructed in the proof of that theorem is not practical, and so in Section 5 we construct a very simple estimator M̂ such that sup_{Q∈Q} E_Q[H(M̂, M)] ≤ C (log n / n)^{1/D}. [sent-54, score-0.219]

9 This is slower than the minimax rate, but the estimator is computationally very simple and requires no knowledge of d or the smoothness of M. [sent-55, score-0.198]
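
The simple estimator itself is only alluded to in this summary. The sketch below is a Devroye-Wise-style union-of-balls set estimator in that spirit, with radius r_n = C (log n / n)^{1/D}; the constant C and this exact form are illustrative assumptions, not necessarily the paper's construction.

import numpy as np

def union_of_balls(Y, C=1.0):
    # Y is the sample, an array of shape (n, D).  The estimated set is
    # M_hat = union_i B(Y_i, r_n) with r_n = C * (log n / n)^(1/D);
    # we return r_n together with a membership test for M_hat.
    n, D = Y.shape
    r_n = C * (np.log(n) / n) ** (1.0 / D)

    def contains(x):
        # x belongs to M_hat iff it is within r_n of some sample point.
        return np.linalg.norm(Y - x, axis=1).min() <= r_n

    return r_n, contains

Y = np.random.default_rng(0).standard_normal((200, 3))  # placeholder sample in R^3
r_n, contains = union_of_balls(Y)
print(r_n, contains(Y[0]))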

10 1 Related Work There is a vast literature on manifold estimation. [sent-57, score-0.235]

11 We are interested instead in actually estimating the manifold itself. [sent-60, score-0.235]

12 In the literature on computational geometry, observations are called noisy if they depart from the underlying manifold in a very specific way: the observations have to be close to the manifold but not too close to each other. [sent-63, score-0.47]

13 We let BD (x, r) denote a D-dimensional open ball centered at x with radius r. [sent-74, score-0.193]

14 The uniform measure on a manifold M is denoted by µM . [sent-77, score-0.235]

15 Figure 1: The condition number ∆(M) of a manifold is the largest number κ such that the normals to the manifold do not cross as long as they are not extended beyond κ. [sent-88, score-0.504]

16 The plot on the left shows a one-dimensional manifold (a curve) and some normals of length r < κ. [sent-89, score-0.269]

17 The plot on the right shows the same manifold and some normals of length r > κ. [sent-90, score-0.269]

18 Model Assumptions In this section we describe all the assumptions on the manifold and on the underlying distributions. [sent-97, score-0.235]

19 At each u ∈ M let Tu M denote the tangent space to M and let Tu⊥ M be the normal space. [sent-102, score-0.244]

20 Note that if M is a sphere then ∆(M) is just the radius of the sphere and if M is a linear space then ∆(M) = ∞. [sent-114, score-0.208]
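
As a quick numerical illustration of this fact (not from the paper): for a circle of radius r, the inward normals at any two points all meet at the center, at distance exactly r from the curve, so normal segments shorter than r never cross. The sketch below intersects two normal lines and recovers that distance:

import numpy as np

def normal_intersection_distance(r, theta1, theta2):
    # Points on a circle of radius r and their unit inward normals.
    p1 = r * np.array([np.cos(theta1), np.sin(theta1)])
    p2 = r * np.array([np.cos(theta2), np.sin(theta2)])
    n1, n2 = -p1 / r, -p2 / r
    # Solve p1 + t1*n1 = p2 + t2*n2 for (t1, t2); the intersection is the center,
    # so t1 should equal r for any two distinct, non-antipodal points.
    t1, _ = np.linalg.solve(np.column_stack([n1, -n2]), p2 - p1)
    return t1

print(normal_intersection_distance(r=2.0, theta1=0.3, theta2=1.7))  # 2.0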

21 The angle between two tangent spaces T_p and T_q is defined to be angle(T_p, T_q) = cos^{−1}( min_{u∈T_p} max_{v∈T_q} |⟨u − p, v − q⟩| ), where ⟨u, v⟩ is the usual inner product in R^D. [sent-119, score-0.247]
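
When the two tangent spaces are represented as linear subspaces with orthonormal bases, this kind of min-max angle can be read off from principal angles: the cosine of the largest principal angle is the smallest singular value of the product of the two bases. The sketch below uses that identification; it is an illustration under that assumption, not code from the paper.

import numpy as np

def largest_principal_angle(A, B):
    # A and B hold basis vectors of two subspaces in their columns.
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for span(B)
    s = np.clip(np.linalg.svd(Qa.T @ Qb, compute_uv=False), -1.0, 1.0)
    return np.arccos(s.min())  # smallest singular value gives the largest angle

A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])                  # the xy-plane in R^3
B = np.array([[1.0, 0.0], [0.0, np.cos(0.4)], [0.0, np.sin(0.4)]])  # tilted by 0.4 radians
print(largest_principal_angle(A, B))  # approximately 0.4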

22 Lemma 3 Let M ⊂ K be a manifold and suppose that ∆(M) = κ > 0. [sent-123, score-0.235]

23 Any smooth density bounded away from 0 on the manifold will lead to similar results. [sent-152, score-0.318]

24 Recall that the essential supremum and essential infimum of qM are defined by ess sup_{y∈A} qM(y) = inf{ a ∈ R : ν_D({y : qM(y) > a} ∩ A) = 0 } and ess inf_{y∈A} qM(y) = sup{ a ∈ R : ν_D({y : qM(y) < a} ∩ A) = 0 }. [sent-166, score-0.256]

25 Lemma 4 There exist constants 0 < C_* ≤ C^* < ∞, depending only on κ and d, such that C_* ≤ inf_{M∈M} ess inf_{y∈S_M} qM(y) ≤ sup_{M∈M} ess sup_{y∈S_M} qM(y) ≤ C^*. [sent-170, score-0.256]

26 Consider the two spheres of radius κ tangent to M at y in the direction of the line between x and y. [sent-177, score-0.22]

27 To see this, note that because M is a manifold and ∆(M) ≥ κ, it follows that near y, M may be locally parameterized as a smooth function f = ( f1 , . [sent-185, score-0.287]

28 The two spheres are tangent to M at y and have radius κ. [sent-198, score-0.22]

29 The proof of the lower bound is similar to the upper bound except for the following changes: let U0 denote all u ∈ U such that the radius of B ∩ Lσ (u) is at least ε/2. [sent-202, score-0.194]

30 By a result of Niyogi et al. (2006), Vol(B_D(y, r) ∩ M) ≥ (1 − r²ε²/(4κ²))^{d/2} r^d ε^d ω_d, and the latter is larger than 2^{−d/2} r^d ε^d ω_d for all small ε. [sent-206, score-0.16]

31 We conclude this section by recording all the assumptions in Theorems 1 and 2: (A1) The manifold M is d-dimensional and is contained in a compact set K ⊂ RD with d < D. [sent-209, score-0.235]

32 This creates a new manifold M_0′ such that H(M_0, M_0′) = γ. [sent-260, score-0.235]

33 We will roll a sphere of radius κ to get a smooth manifold M as in Figure 3. [sent-262, score-0.432]

34 Figure 3: A sphere of radius κ is pushed upwards into the plane M_0 (panel A). [sent-277, score-0.176]

35 A sphere is then rolled around the manifold (panel C) to produce a smooth manifold M1 (panel D). [sent-279, score-0.585]

36 The sequence {S_n} in Theorem 7 is called a sieve and the estimator p∗ is called a sieve-maximum likelihood estimator. [sent-319, score-0.216]

37 Proof By the triangle inequality, h(p, p̂) ≤ h(p, p∗) + h(p̂, p∗) = h(p, p∗) + h(p̂, u_t/∫u_t), where p∗ = u_t/∫u_t for some t. [sent-329, score-0.156]

38 In light of the above result, we define the modified maximum likelihood sieve estimator p̂ to be any p ∈ P such that ℓ ≤ p ≤ u. [sent-340, score-0.216]

39 For simplicity, in the rest of the paper, we refer to the modified sieve estimator p̂ simply as the maximum likelihood estimator (mle). [sent-341, score-0.31]

40 Establish the rate of convergence of the mle in Hellinger distance, using the bracketing entropy and Theorem 7. [sent-355, score-0.158]

41 Relate the Hausdorff distance to the Hellinger distance and hence establish the rate of convergence an of the mle in Hausdorff distance. [sent-357, score-0.174]
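
These steps measure error in Hellinger distance h. With the convention h²(p, q) = (1/2)∫(√p − √q)², it can be approximated for densities evaluated on a common grid as in the sketch below (an illustration only; the paper's normalization constant may differ):

import numpy as np

def hellinger(p, q, dx):
    # p and q are densities evaluated on a common grid with spacing dx.
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx)

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
p = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)            # N(0, 1) density
q = np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2 * np.pi)    # N(1, 1) density
print(hellinger(p, q, dx))  # about 0.343, i.e., sqrt(1 - exp(-1/8))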

42 Conclude that the true manifold is contained, with high probability, in M_n = {M ∈ M(κ) : H(M, M̂) ≤ a_n} (Lemma 14). [sent-360, score-0.235]

43 To improve the pilot estimator, we need to control the relationship between Hellinger and Hausdorff distance and thus need to work over small sets on which the manifold cannot vary too greatly. [sent-363, score-0.354]

44 Hence, we cover the pilot estimator with long, thin slabs R1 , . [sent-364, score-0.304]

45 We define a slab R_j to be the union of fibers of size b = σ + a_n within one of the spheres: R_j = ∪_{x∈ℷ_j} L_b(x, M̂). [sent-372, score-0.172]

46 The entropy of the set of distributions within a slab is very small. [sent-385, score-0.204]

47 Because the entropy is small, the maximum likelihood estimator within a slab converges fairly quickly in Hellinger distance. [sent-388, score-0.328]

48 The reason for getting a preliminary estimator and then covering the estimator with thin slabs is that, within a slab, there is a tight relationship between Hellinger distance and Hausdorff distance. [sent-399, score-0.361]

49 5 Step 2a: Computing the Entropy of Q To compute the entropy of Q we start by constructing a finite net of manifolds to cover M (κ). [sent-418, score-0.23]

50 Our strategy is to show that within each of these cubes, the manifold is the graph of a smooth function. [sent-431, score-0.287]

51 In thinking about the manifold as (locally) the graph of a smooth function, it helps to be able to translate easily between the natural coordinates in K and the domain-range coordinates of the function. [sent-433, score-0.375]

52 Each frame is associated with a relabeling of the coordinates so that the d “domain” coordinates are listed first and the D − d “range” coordinates last. [sent-441, score-0.132]

53 That is, Fjk is defined by a one-to-one correspondence between x ∈ C j and (u, v) ∈ π jk (x) where u ∈ Rd and v ∈ RD−d and π jk (x1 , . [sent-442, score-0.31]

54 We define domain(Fjk ) = {u ∈ Rd : ∃v ∈ RD−d such that (u, v) ∈ Fjk }, and let G jk denote the class of functions defined on domain(Fjk ) whose second derivative (i. [sent-458, score-0.222]

55 We will prove the theorem by establishing the following claims. [sent-462, score-0.141]

56 . . . , u_d) → π_jk^{−1}((u, f(u))) for some function f on A, and (ii) this function lies in G_jk. [sent-473, score-0.282]

57 M is in one-to-one correspondence with a subset of G = ∏_{j=1}^{J} ∏_k G_jk. [sent-475, score-0.169]

58 This implies that the function identified in (i) has uniformly bounded second derivative and thus lies in the corresponding G jk . [sent-485, score-0.141]

59 Then in each domain(F_jk), there is a point u such that C_j ∩ π_jk^{−1}(u × R^{D−d}) intersects M in at least two points, call them a_k and b_k. [sent-492, score-0.239]

60 By construction ∥a_k − b_k∥ ≤ √(D − d) · κ/c, and hence by choosing c large enough (making the cubes small), part 3 of Lemma 3 tells us that d_M(a_k, b_k) ≤ 2√(D − d) · κ/c. [sent-493, score-0.143]

61 max_{p,q∈C_j∩M} cos(angle(T_p M, T_q M)) ≥ 1 − O(1/c). For large enough c, the maximum angle between tangent vectors can be made smaller than π/3. [sent-496, score-0.247]

62 By part 2 of Lemma 3, for any point z along a geodesic between a_k and b_k, cos(angle(T_{a_k} M, T_z M)) ≥ 1 − 2√(D − d)/c. [sent-498, score-0.144]

63 It follows that there is a point in C_j ∩ M and a tangent vector v_k at that point such that angle(v_k, b_k − a_k) = O(1/√c). [sent-499, score-0.145]

64 For each cube C_j, let k_j^* be the smallest k such that M ∩ C_j is the graph of a function φ_jk ∈ G_jk as in Claim 1. [sent-509, score-0.363]

65 Part 1 of Lemma 3 shows that each M ∈ M has curvature (second fundamental form) bounded above by 1/κ, so each G jk satisfies Birman and Solomjak’s conditions. [sent-519, score-0.184]

66 Because all the G_jk’s are disjoint, simple counting arguments show that N(γ, G, L_∞) = ∏_{j=1}^{J} ∏_k N(γ, G_jk, L_∞), where J is the number of cubes defined above. [sent-521, score-0.282]

67 But because all such functions have an extension in G jk , a covering of G jk also covers these functions defined on restricted domains. [sent-524, score-0.323]

68 . . . , K}, let G_jk^γ be a minimal L_∞ cover of G_jk by γ/2 balls; specifically, we assume that G_jk^γ is the set of centers of these balls. [sent-532, score-0.543]

69 For each g_jk ∈ G_jk, define f_jk(u) = π_jk^{−1}((u, g_jk(u))). [sent-533, score-0.31]

70 For every j, choose one such f_jk, and define a set M′ = ∪_j (C_j ∩ range(f_{j k_j})), which is a union of manifolds with boundary that have curvature bounded by 1/κ. [sent-534, score-0.605]

71 By construction, B is tangent to ∂S_1 at x′ and tangent to ∂S_2 at y′. The ball has radius γ/2 = (1/2) min{κ − σ, b} < κ − σ. [sent-618, score-0.276]

72 We conclude that, with high probability, the true manifold M is contained in the set M_n = {M ∈ M(κ) : H(M, M̂) ≤ a_n}. [sent-632, score-0.235]

73 9 Step 3: Cover With Slabs Now we cover the pilot estimator M̂ with (possibly overlapping) slabs. [sent-634, score-0.212]

74 Let x and x be as given in the statement of the lemma and let θ = angle(Tx M, Tx M). [sent-657, score-0.158]

75 Hence, u and v each make an angle greater than π/8 with respect to the x-axis. [sent-665, score-0.165]

76 Consider the two circles C1 and C2 tangent to M at x with radius κ where C1 lies below v and C2 lies above v. [sent-666, score-0.164]

77 Note that angle(z − x, u) is larger than the angle between q − x and the x-axis, which is arctan√(C_1 − 1) ≡ α > 0. [sent-677, score-0.165]

78 Because x2 − x1 lies in Lb (x) and is consequently orthogonal to Tx M, there must exist a point on the geodesic whose angle with Tx M equals π/2, contradicting part 1. [sent-707, score-0.246]

79 Thus the slabs cover M, and M cuts across R_j in a function-like way. [sent-732, score-0.131]

80 Proof By Lemma 16, M ∩ R_j lies in a slab of size a_n orthogonal to ℷ_j. [sent-751, score-0.172]

81 Because the angle between the two manifolds on this set must be no more than π/4 and because an > δn , the manifold M cannot intersect both the “top” and “bottom” surfaces of the slab. [sent-752, score-0.531]

82 Let yi be the closest point in the net to y and let θ j be the closest angle in the net to angle(Ty M, Tx M). [sent-772, score-0.426]

83 Because the angle between M and Mi j is strictly less than π/4 (part 1 of Lemma 15) and the slab R j has radius δn , it follows that H(M, Mi j ) ≤ C1 γ + δnC2 γ ≤ Cγ. [sent-773, score-0.419]

84 For each M_ij ∈ Net(γ) let q_ij be the corresponding density and define u_ij and ℓ_ij by u_ij(y) = q_ij(y) + Cε² · I(y ∈ M_ij ⊕ (σ + ε²)) / V(M_ij ⊕ (σ + ε²)) and ℓ_ij(y) = q_ij(y) − Cε² · I(y ∈ M_ij ⊕ (σ − ε²)). [sent-776, score-0.312]

85 Let M ∈ Mn and let Mi j be the element of the net closest to M. [sent-778, score-0.171]

86 Let M be the manifold corresponding to q and let M j = M ∩ R j. [sent-788, score-0.316]

87 Since H_[](ε, Q_{n,j}, h) ≤ log(C(1/ε)), solving the equation H_[](ε, Q_{n,j}, h) = m_n ε² we get ε_{m_n} ≥ C√(log m_n / m_n) = (log n/n)^{2/(2(2+d))} = δ_n. [sent-801, score-0.228]

88 This follows since g_1 and g_2 are smooth, they both lie in a slab of size a_n around ℷ_j, and the angle between the tangent of g_j(x) and ℷ_j is bounded by π/4. [sent-809, score-0.158]

89 Create a modified manifold M_2′ such that M_2′ differs from M_1′ over ℷ′ by a γ/2 shift orthogonal to ℷ_j and such that M_2′ is otherwise equal to M_1′. [sent-810, score-0.235]

90 Let y be a point in S and let Λ(y) ≤ σ be its distance from the boundary ∂S. [sent-849, score-0.16]

91 Let v be a point on the manifold closest to y and let y∗ be the point on the segment joining y to v such that ||y − y∗ || = ε/2. [sent-852, score-0.347]

92 Let Oy be the ball of radius ∆ − σ tangent to y such that Oy ⊂ Sc . [sent-858, score-0.194]

93 Conclusion and Open Questions We have established that the optimal rate for estimating a smooth manifold in Hausdorff distance is n^{−2/(2+d)}. [sent-880, score-0.327]

94 We also allow the distribution along the manifold to be any smooth density bounded away from 0. [sent-885, score-0.318]

95 The reason is that manifold estimation is similar to (and in fact, more difficult than) nonparametric regression with measurement error. [sent-898, score-0.235]

96 Now define the following d-dimensional manifold in R^D: M_0 = {(u, v, 0_{D−d−1}) : (u, v) ∈ F_0} = {(u, a(u), 0_{D−d−1}) : u ∈ B_d(0, 1 + κ)} ∪ {(u, −a(u), 0_{D−d−1}) : u ∈ B_d(0, 1 + κ)}, where a(u) = κ if ∥u∥ ≤ 1 and a(u) = √(κ² − (∥u∥ − 1)²) if 1 < ∥u∥ ≤ 1 + κ. [sent-917, score-0.235]

97 The manifold M_0 has no boundary and, by construction, ∆(M_0) ≥ κ. [sent-918, score-0.274]

98 Now define a second manifold that coincides with M0 but has a small perturbation. [sent-919, score-0.235]

99 Note that ∆(M1 ) ≥ κ since the perturbation is obtained using portions of spheres of radius κ. [sent-921, score-0.138]

100 Smooth manifold reconstruction from noisy and non-uniform approximation with guarantees. [sent-966, score-0.235]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('bd', 0.388), ('qm', 0.244), ('manifold', 0.235), ('hausdorff', 0.216), ('hellinger', 0.211), ('erdinelli', 0.185), ('erone', 0.185), ('slab', 0.172), ('qn', 0.171), ('angle', 0.165), ('lb', 0.159), ('enovese', 0.158), ('inimax', 0.158), ('anifold', 0.142), ('jk', 0.141), ('fjk', 0.132), ('wasserman', 0.117), ('stimation', 0.116), ('mn', 0.114), ('um', 0.103), ('tx', 0.101), ('manifolds', 0.1), ('estimator', 0.094), ('sieve', 0.092), ('slabs', 0.092), ('vol', 0.083), ('tangent', 0.082), ('radius', 0.082), ('geodesic', 0.081), ('let', 0.081), ('rd', 0.08), ('pilot', 0.079), ('ty', 0.079), ('bracketing', 0.079), ('niyogi', 0.079), ('lemma', 0.077), ('tube', 0.071), ('bers', 0.068), ('eq', 0.066), ('tp', 0.066), ('dey', 0.066), ('sphere', 0.063), ('tq', 0.061), ('net', 0.059), ('minimax', 0.057), ('genovese', 0.056), ('spheres', 0.056), ('claim', 0.053), ('smooth', 0.052), ('mi', 0.048), ('sup', 0.047), ('mle', 0.047), ('hence', 0.047), ('qi', 0.046), ('cuevas', 0.045), ('ess', 0.045), ('rodr', 0.045), ('coordinates', 0.044), ('dm', 0.043), ('curvature', 0.043), ('lebesgue', 0.043), ('tu', 0.041), ('perpendicular', 0.041), ('covering', 0.041), ('distance', 0.04), ('expandable', 0.04), ('hess', 0.04), ('isabella', 0.04), ('verdinelli', 0.04), ('ut', 0.039), ('boundary', 0.039), ('cover', 0.039), ('ber', 0.038), ('sm', 0.037), ('geometry', 0.036), ('inf', 0.036), ('intersects', 0.035), ('partly', 0.034), ('marco', 0.034), ('normals', 0.034), ('cam', 0.033), ('bk', 0.033), ('pn', 0.032), ('entropy', 0.032), ('panel', 0.031), ('density', 0.031), ('closest', 0.031), ('ui', 0.031), ('proof', 0.031), ('plane', 0.031), ('intersect', 0.031), ('submanifolds', 0.031), ('birman', 0.031), ('likelihood', 0.03), ('ball', 0.03), ('ak', 0.03), ('na', 0.029), ('noise', 0.028), ('correspondence', 0.028), ('cn', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999923 68 jmlr-2012-Minimax Manifold Estimation

Author: Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman

Abstract: We find the minimax rate of convergence in Hausdorff distance for estimating a manifold M of dimension d embedded in RD given a noisy sample from the manifold. Under certain conditions, we show that the optimal rate of convergence is n−2/(2+d) . Thus, the minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which M is embedded. Keywords: manifold learning, minimax estimation

2 0.13118693 50 jmlr-2012-Human Gesture Recognition on Product Manifolds

Author: Yui Man Lui

Abstract: Action videos are multidimensional data and can be naturally represented as data tensors. While tensor computing is widely used in computer vision, the geometry of tensor space is often ignored. The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space which yields a very discriminating structure for action recognition. We characterize data tensors as points on a product manifold and model it statistically using least squares regression. To this aim, we factorize a data tensor relating to each order of the tensor using Higher Order Singular Value Decomposition (HOSVD) and then impose each factorized element on a Grassmann manifold. Furthermore, we account for underlying geometry on manifolds and formulate least squares regression as a composite function. This gives a natural extension from Euclidean space to manifolds. Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. Our method exploits appearance and motion without explicitly modeling the shapes and dynamics. We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. Experimental results reveal that not only does the proposed method perform well on the standard benchmark data sets, but also it generalizes well on the one-shot-learning gesture challenge. Furthermore, it is based on a simple statistical model and the intrinsic geometry of tensor space. Keywords: gesture recognition, action recognition, Grassmann manifolds, product manifolds, one-shot-learning, kinect data

3 0.11189838 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming

Author: Garvesh Raskutti, Martin J. Wainwright, Bin Yu

Abstract: Sparse additive models are families of d-variate functions with the additive decomposition f ∗ = ∑ j∈S f j∗ , where S is an unknown subset of cardinality s ≪ d. In this paper, we consider the case where each univariate component function f j∗ lies in a reproducing kernel Hilbert space (RKHS), and analyze a method for estimating the unknown function f ∗ based on kernels combined with ℓ1 -type convex regularization. Working within a high-dimensional framework that allows both the dimension d and sparsity s to increase with n, we derive convergence rates in the L2 (P) and L2 (Pn ) norms over the class Fd,s,H of sparse additive models with each univariate function f j∗ in the unit ball of a univariate RKHS with bounded kernel function. We complement our upper bounds by deriving minimax lower bounds on the L2 (P) error, thereby showing the optimality of our method. Thus, we obtain optimal minimax rates for many interesting classes of sparse additive models, including polynomials, splines, and Sobolev classes. We also show that if, in contrast to our univariate conditions, the d-variate function class is assumed to be globally bounded, then much √ faster estimation rates are possible for any sparsity s = Ω( n), showing that global boundedness is a significant restriction in the high-dimensional setting. Keywords: sparsity, kernel, non-parametric, convex, minimax

4 0.10317044 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices

Author: Uri Shalit, Daphna Weinshall, Gal Chechik

Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches to minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low-rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is costly to compute, and so is the projection operator that approximates it, we describe another retraction that can be computed efficiently. It has run time and memory complexity of O ((n + m)k) for a rank-k matrix of dimension m × n, when using an online procedure with rank-one gradients. We use this algorithm, L ORETA, to learn a matrix-form similarity measure over pairs of documents represented as high dimensional vectors. L ORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained on pre-selected features using the same memory requirements. We further adapt L ORETA to learn positive semi-definite low-rank matrices, providing an online algorithm for low-rank metric learning. L ORETA also shows consistent improvement over standard weakly supervised methods in a large (1600 classes and 1 million images, using ImageNet) multi-label image classification task. Keywords: low rank, Riemannian manifolds, metric learning, retractions, multitask learning, online learning

5 0.089523792 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning

Author: Sangkyun Lee, Stephen J. Wright

Abstract: Iterative methods that calculate their steps from approximate subgradient directions have proved to be useful for stochastic learning problems over large and streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term, the solution often lies on a lowdimensional manifold of parameter space along which the regularizer is smooth. (When an ℓ1 regularizer is used to induce sparsity in the solution, for example, this manifold is defined by the set of nonzero components of the parameter vector.) This paper shows that a regularized dual averaging algorithm can identify this manifold, with high probability, before reaching the solution. This observation motivates an algorithmic strategy in which, once an iterate is suspected of lying on an optimal or near-optimal manifold, we switch to a “local phase” that searches in this manifold, thus converging rapidly to a near-optimal point. Computational results are presented to verify the identification property and to illustrate the effectiveness of this approach. Keywords: regularization, dual averaging, partly smooth manifold, manifold identification

6 0.085463397 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables

7 0.082958445 80 jmlr-2012-On Ranking and Generalization Bounds

8 0.081985265 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression

9 0.080219612 91 jmlr-2012-Plug-in Approach to Active Learning

10 0.077379242 29 jmlr-2012-Consistent Model Selection Criteria on High Dimensions

11 0.06423644 15 jmlr-2012-Algebraic Geometric Comparison of Probability Distributions

12 0.053381942 17 jmlr-2012-An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity

13 0.052671857 99 jmlr-2012-Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise

14 0.047275957 76 jmlr-2012-Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics

15 0.046602231 20 jmlr-2012-Analysis of a Random Forests Model

16 0.046566557 3 jmlr-2012-A Geometric Approach to Sample Compression

17 0.045930266 13 jmlr-2012-Active Learning via Perfect Selective Classification

18 0.043746199 109 jmlr-2012-Stability of Density-Based Clustering

19 0.042278461 14 jmlr-2012-Activized Learning: Transforming Passive to Active with Improved Label Complexity

20 0.040426087 51 jmlr-2012-Integrating a Partial Model into Model Free Reinforcement Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.206), (1, 0.1), (2, -0.095), (3, -0.051), (4, -0.024), (5, -0.13), (6, 0.01), (7, -0.126), (8, 0.185), (9, -0.175), (10, 0.107), (11, 0.073), (12, -0.159), (13, 0.101), (14, -0.02), (15, 0.084), (16, -0.187), (17, -0.093), (18, -0.066), (19, -0.224), (20, 0.245), (21, 0.039), (22, 0.104), (23, 0.185), (24, -0.023), (25, -0.022), (26, -0.036), (27, 0.007), (28, 0.053), (29, -0.023), (30, -0.072), (31, -0.064), (32, 0.062), (33, -0.056), (34, 0.074), (35, -0.102), (36, -0.058), (37, -0.114), (38, -0.085), (39, 0.124), (40, 0.052), (41, 0.018), (42, 0.044), (43, 0.1), (44, -0.025), (45, 0.058), (46, 0.022), (47, 0.055), (48, -0.061), (49, 0.004)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93585622 68 jmlr-2012-Minimax Manifold Estimation

Author: Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman

Abstract: We find the minimax rate of convergence in Hausdorff distance for estimating a manifold M of dimension d embedded in RD given a noisy sample from the manifold. Under certain conditions, we show that the optimal rate of convergence is n−2/(2+d) . Thus, the minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which M is embedded. Keywords: manifold learning, minimax estimation

2 0.54408979 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices

Author: Uri Shalit, Daphna Weinshall, Gal Chechik

Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches to minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low-rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is costly to compute, and so is the projection operator that approximates it, we describe another retraction that can be computed efficiently. It has run time and memory complexity of O ((n + m)k) for a rank-k matrix of dimension m × n, when using an online procedure with rank-one gradients. We use this algorithm, L ORETA, to learn a matrix-form similarity measure over pairs of documents represented as high dimensional vectors. L ORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained on pre-selected features using the same memory requirements. We further adapt L ORETA to learn positive semi-definite low-rank matrices, providing an online algorithm for low-rank metric learning. L ORETA also shows consistent improvement over standard weakly supervised methods in a large (1600 classes and 1 million images, using ImageNet) multi-label image classification task. Keywords: low rank, Riemannian manifolds, metric learning, retractions, multitask learning, online learning

3 0.52602786 50 jmlr-2012-Human Gesture Recognition on Product Manifolds

Author: Yui Man Lui

Abstract: Action videos are multidimensional data and can be naturally represented as data tensors. While tensor computing is widely used in computer vision, the geometry of tensor space is often ignored. The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space which yields a very discriminating structure for action recognition. We characterize data tensors as points on a product manifold and model it statistically using least squares regression. To this aim, we factorize a data tensor relating to each order of the tensor using Higher Order Singular Value Decomposition (HOSVD) and then impose each factorized element on a Grassmann manifold. Furthermore, we account for underlying geometry on manifolds and formulate least squares regression as a composite function. This gives a natural extension from Euclidean space to manifolds. Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. Our method exploits appearance and motion without explicitly modeling the shapes and dynamics. We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. Experimental results reveal that not only does the proposed method perform well on the standard benchmark data sets, but also it generalizes well on the one-shot-learning gesture challenge. Furthermore, it is based on a simple statistical model and the intrinsic geometry of tensor space. Keywords: gesture recognition, action recognition, Grassmann manifolds, product manifolds, one-shot-learning, kinect data

4 0.43270472 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning

Author: Sangkyun Lee, Stephen J. Wright

Abstract: Iterative methods that calculate their steps from approximate subgradient directions have proved to be useful for stochastic learning problems over large and streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term, the solution often lies on a lowdimensional manifold of parameter space along which the regularizer is smooth. (When an ℓ1 regularizer is used to induce sparsity in the solution, for example, this manifold is defined by the set of nonzero components of the parameter vector.) This paper shows that a regularized dual averaging algorithm can identify this manifold, with high probability, before reaching the solution. This observation motivates an algorithmic strategy in which, once an iterate is suspected of lying on an optimal or near-optimal manifold, we switch to a “local phase” that searches in this manifold, thus converging rapidly to a near-optimal point. Computational results are presented to verify the identification property and to illustrate the effectiveness of this approach. Keywords: regularization, dual averaging, partly smooth manifold, manifold identification

5 0.3934575 15 jmlr-2012-Algebraic Geometric Comparison of Probability Distributions

Author: Franz J. Király, Paul von Bünau, Frank C. Meinecke, Duncan A.J. Blythe, Klaus-Robert Müller

Abstract: We propose a novel algebraic algorithmic framework for dealing with probability distributions represented by their cumulants such as the mean and covariance matrix. As an example, we consider the unsupervised learning problem of finding the subspace on which several probability distributions agree. Instead of minimizing an objective function involving the estimated cumulants, we show that by treating the cumulants as elements of the polynomial ring we can directly solve the problem, at a lower computational cost and with higher accuracy. Moreover, the algebraic viewpoint on probability distributions allows us to invoke the theory of algebraic geometry, which we demonstrate in a compact proof for an identifiability criterion. Keywords: computational algebraic geometry, approximate algebra, unsupervised Learning

6 0.38401788 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables

7 0.35768405 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming

8 0.33154535 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression

9 0.32037547 91 jmlr-2012-Plug-in Approach to Active Learning

10 0.31341919 29 jmlr-2012-Consistent Model Selection Criteria on High Dimensions

11 0.30758637 3 jmlr-2012-A Geometric Approach to Sample Compression

12 0.29053199 17 jmlr-2012-An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity

13 0.27038577 80 jmlr-2012-On Ranking and Generalization Bounds

14 0.24511561 59 jmlr-2012-Linear Regression With Random Projections

15 0.23250826 20 jmlr-2012-Analysis of a Random Forests Model

16 0.23037826 109 jmlr-2012-Stability of Density-Based Clustering

17 0.21476707 99 jmlr-2012-Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise

18 0.19691667 69 jmlr-2012-Mixability is Bayes Risk Curvature Relative to Log Loss

19 0.19627917 39 jmlr-2012-Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications

20 0.18491197 111 jmlr-2012-Structured Sparsity and Generalization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(21, 0.038), (26, 0.045), (27, 0.013), (29, 0.045), (35, 0.028), (41, 0.398), (49, 0.018), (69, 0.035), (75, 0.036), (77, 0.023), (79, 0.015), (81, 0.015), (92, 0.116), (96, 0.085)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.68720752 68 jmlr-2012-Minimax Manifold Estimation

Author: Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman

Abstract: We find the minimax rate of convergence in Hausdorff distance for estimating a manifold M of dimension d embedded in RD given a noisy sample from the manifold. Under certain conditions, we show that the optimal rate of convergence is n−2/(2+d) . Thus, the minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which M is embedded. Keywords: manifold learning, minimax estimation

2 0.37264228 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning

Author: Sangkyun Lee, Stephen J. Wright

Abstract: Iterative methods that calculate their steps from approximate subgradient directions have proved to be useful for stochastic learning problems over large and streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term, the solution often lies on a lowdimensional manifold of parameter space along which the regularizer is smooth. (When an ℓ1 regularizer is used to induce sparsity in the solution, for example, this manifold is defined by the set of nonzero components of the parameter vector.) This paper shows that a regularized dual averaging algorithm can identify this manifold, with high probability, before reaching the solution. This observation motivates an algorithmic strategy in which, once an iterate is suspected of lying on an optimal or near-optimal manifold, we switch to a “local phase” that searches in this manifold, thus converging rapidly to a near-optimal point. Computational results are presented to verify the identification property and to illustrate the effectiveness of this approach. Keywords: regularization, dual averaging, partly smooth manifold, manifold identification

3 0.37019628 73 jmlr-2012-Multi-task Regression using Minimal Penalties

Author: Matthieu Solnon, Sylvain Arlot, Francis Bach

Abstract: In this paper we study the kernel multiple ridge regression framework, which we refer to as multitask regression, using penalization techniques. The theoretical analysis of this problem shows that the key element appearing for an optimal calibration is the covariance matrix of the noise between the different tasks. We present a new algorithm to estimate this covariance matrix, based on the concept of minimal penalty, which was previously used in the single-task regression framework to estimate the variance of the noise. We show, in a non-asymptotic setting and under mild assumptions on the target function, that this estimator converges towards the covariance matrix. Then plugging this estimator into the corresponding ideal penalty leads to an oracle inequality. We illustrate the behavior of our algorithm on synthetic examples. Keywords: multi-task, oracle inequality, learning theory

4 0.36842498 82 jmlr-2012-On the Necessity of Irrelevant Variables

Author: David P. Helmbold, Philip M. Long

Abstract: This work explores the effects of relevant and irrelevant boolean variables on the accuracy of classifiers. The analysis uses the assumption that the variables are conditionally independent given the class, and focuses on a natural family of learning algorithms for such sources when the relevant variables have a small advantage over random guessing. The main result is that algorithms relying predominately on irrelevant variables have error probabilities that quickly go to 0 in situations where algorithms that limit the use of irrelevant variables have errors bounded below by a positive constant. We also show that accurate learning is possible even when there are so few examples that one cannot determine with high confidence whether or not any individual variable is relevant. Keywords: feature selection, generalization, learning theory

5 0.36819637 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting

Author: Matus Telgarsky

Abstract: Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing: • Weak learnability aids the whole loss family: for any ε > 0, O (ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum; • The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O (ln(1/ε)); • Arbitrary instances may be decomposed into the above two, granting rate O (1/ε), with a matching lower bound provided for the logistic loss. Keywords: boosting, convex analysis, weak learnability, coordinate descent, maximum entropy

6 0.36816451 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming

7 0.36799261 34 jmlr-2012-Dynamic Policy Programming

8 0.36773074 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches

9 0.36700574 111 jmlr-2012-Structured Sparsity and Generalization

10 0.36434048 80 jmlr-2012-On Ranking and Generalization Bounds

11 0.36291245 13 jmlr-2012-Active Learning via Perfect Selective Classification

12 0.36232805 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

13 0.3611224 117 jmlr-2012-Variable Selection in High-dimensional Varying-coefficient Models with Global Optimality

14 0.36086601 7 jmlr-2012-A Multi-Stage Framework for Dantzig Selector and LASSO

15 0.36081946 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods

16 0.35997888 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class

17 0.35984367 15 jmlr-2012-Algebraic Geometric Comparison of Probability Distributions

18 0.35963768 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression

19 0.35929528 103 jmlr-2012-Sampling Methods for the Nyström Method

20 0.3570323 118 jmlr-2012-Variational Multinomial Logit Gaussian Process