nips nips2010 nips2010-232 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hariharan Narayanan, Sanjoy Mitter
Abstract: The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume and curvature is unavoidable. Whether the known lower bound of O((k + log(1/δ))/ε²) for the sample complexity of Empirical Risk minimization on k-means applied to data in a unit ball of arbitrary dimension is tight has been an open question since 1997 [3]. Here ε is the desired bound on the error and δ is a bound on the probability of failure. We improve the best currently known upper bound [14] of O((k² + log(1/δ))/ε²) to O((k/ε²) min(k, log⁴(k/ε)/ε²) + log(1/δ)/ε²). Based on these results, we devise a simple algorithm for k-means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Sample complexity of testing the manifold hypothesis Hariharan Narayanan∗ Laboratory for Information and Decision Systems EECS, MIT Cambridge, MA 02139 har@mit.edu [sent-1, score-0.491]
2 Abstract The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. [sent-3, score-0.711]
3 In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. [sent-4, score-0.411]
4 Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which data lie. [sent-5, score-0.778]
5 We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. [sent-6, score-0.469]
6 For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume and curvature is unavoidable. [sent-7, score-0.785]
7 Whether the known lower bound of O((k + log(1/δ))/ε²) for the sample complexity of Empirical Risk minimization on k-means applied to data in a unit ball of arbitrary dimension is tight has been an open question since 1997 [3]. [sent-8, score-0.674]
8 Here ε is the desired bound on the error and δ is a bound on the probability of failure. [sent-9, score-0.296]
9 We improve the best currently known upper bound [14] of O((k² + log(1/δ))/ε²) to O((k/ε²) min(k, log⁴(k/ε)/ε²) + log(1/δ)/ε²). [sent-10, score-0.335]
10 Based on these results, we devise a simple algorithm for k-means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension. [sent-11, score-0.591]
11 1 Introduction We are increasingly confronted with very high dimensional data in speech signals, images, gene-expression data, and other sources. [sent-12, score-0.112]
12 Manifold Learning can be loosely defined to be a collection of methodologies that are motivated by the belief that this hypothesis (henceforth called the manifold hypothesis) is true. [sent-13, score-0.44]
13 The sample complexity of classification is known to be independent of the ambient dimension [15] under the manifold hypothesis (assuming the decision boundary is a manifold as well), thus obviating the curse of dimensionality. [sent-15, score-1.173]
14 A recent empirical study [6] of a large number of 3 × 3 images, represented as points in R9, revealed that they approximately lie on a two dimensional manifold known as the Klein bottle. (∗ Research supported by grant CCF-0836720. Figure 1: Fitting a torus to data.) [sent-16, score-0.524]
15 On the other hand, knowledge that the manifold hypothesis is false with regard to certain data would give us reason to exercise caution in applying algorithms from manifold learning and would provide an incentive for further study. [sent-18, score-0.732]
16 It is thus of considerable interest to know whether given data lie in the vicinity of a low dimensional manifold. [sent-19, score-0.159]
17 We obtain uniform bounds relating the empirical squared loss and the true squared loss over a class F consisting of manifolds whose dimensions, volumes and curvatures are bounded, in Theorems 1 and 2. [sent-22, score-0.466]
18 These bounds imply upper bounds on the sample complexity of Empirical Risk Minimization (ERM) that are independent of the ambient dimension, exponential in the intrinsic dimension, polynomial in the curvature and almost linear in the volume. [sent-23, score-0.828]
19 We improve the best currently known upper bound [14] on the sample complexity of Empirical Risk minimization on k-means applied to data in a unit ball of arbitrary dimension from O((k² + log(1/δ))/ε²) to O((k/ε²) min(k, log⁴(k/ε)/ε²) + log(1/δ)/ε²). [sent-27, score-0.691]
20 Whether the known lower bound of O((k + log(1/δ))/ε²) is tight has been an open question since 1997 [3]. [sent-28, score-0.226]
21 Here ε is the desired bound on the error and δ is a bound on the probability of failure. [sent-29, score-0.296]
22 One technical contribution of this paper is the use of dimensionality reduction via random projections in the proof of Theorem 5 to bound the Fat-Shattering dimension of a function class, elements of which roughly correspond to the squared distance to a low dimensional manifold. [sent-30, score-0.509]
23 The application of the probabilistic method involves a projection onto a low dimensional random subspace. [sent-31, score-0.112]
24 This is then followed by arguments of a combinatorial nature involving the VC dimension of halfspaces, and the Sauer-Shelah Lemma applied with respect to the low dimensional subspace. [sent-32, score-0.254]
25 While random projections have frequently been used in machine learning algorithms, for example in [2, 7], to our knowledge, they have not been used as a tool to bound the complexity of a function class. [sent-33, score-0.287]
26 We illustrate the algorithmic utility of our uniform bound by devising an algorithm for k−means and a convex programming algorithm for fitting a piecewise linear curve of bounded length. [sent-34, score-0.398]
27 For a fixed error threshold and length, the dependence on the ambient dimension is linear, which is optimal since this is the complexity of reading the input. [sent-35, score-0.465]
28 2 Connections and Related work In the context of curves, [10] proposed “Principal Curves”, where it was suggested that a natural curve that may be fit to a probability distribution is one where every point on the curve is the center of mass of all those points to which it is the nearest point. [sent-36, score-0.222]
29 A different definition of a principal curve was proposed by [12], where they attempted to find piecewise linear curves of bounded length which minimize the expected squared distance to a random point from a distribution. [sent-37, score-0.365]
30 That paper studies the decay of the error rate as the number of samples tends to infinity, but does not analyze the dependence of the error rate on the ambient dimension and the bound on the length. [sent-38, score-0.534]
31 We address this in a more general setup in Theorem 4, and obtain sample complexity bounds that are independent of the ambient dimension, and depend linearly on the bound on the length. [sent-39, score-0.585]
32 It has been an open question since 1997 [3] whether the known lower bound of O((k + log(1/δ))/ε²) for the sample complexity of Empirical Risk minimization on k-means applied to data in a unit ball of arbitrary dimension is tight. [sent-41, score-0.724]
33 Here ε is the desired bound on the error and δ is a bound on the probability of failure. [sent-42, score-0.346]
34 The best currently known upper bound is O((k² + log(1/δ))/ε²) and is based on Rademacher complexities. [sent-43, score-0.223]
35 We improve this bound to O((k/ε²) min(k, log⁴(k/ε)/ε²) + log(1/δ)/ε²), using an argument that bounds the Fat-Shattering dimension of the appropriate function class using random projections and the Sauer-Shelah Lemma. [sent-44, score-0.459]
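Written out with explicit fractions and with universal constants suppressed, the three k-means sample-complexity bounds compared in the preceding sentences can be summarized as below (a LaTeX sketch in the notation of the abstract, where ε is the error tolerance and δ the failure probability):

```latex
\begin{align*}
\text{lower bound of [3] (open since 1997):} \quad & \Omega\!\left(\frac{k}{\epsilon^{2}} + \frac{1}{\epsilon^{2}}\log\frac{1}{\delta}\right) \\
\text{previous upper bound [14]:} \quad & O\!\left(\frac{k^{2}}{\epsilon^{2}} + \frac{1}{\epsilon^{2}}\log\frac{1}{\delta}\right) \\
\text{upper bound of this paper:} \quad & O\!\left(\frac{k}{\epsilon^{2}}\min\!\left(k,\;\frac{\log^{4}\frac{k}{\epsilon}}{\epsilon^{2}}\right) + \frac{1}{\epsilon^{2}}\log\frac{1}{\delta}\right)
\end{align*}
```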
36 Generalizations of principal curves to parameterized principal manifolds in certain regularized settings have been studied in [18]. [sent-45, score-0.283]
37 There, the sample complexity was related to the decay of eigenvalues of a Mercer kernel associated with the regularizer. [sent-46, score-0.169]
38 When the manifold to be fit is a set of k points (k−means), we obtain a bound on the sample complexity s that is independent of m and depends at most linearly on k, which also leads to an approximation algorithm with additive error, based on sub-sampling. [sent-47, score-0.75]
39 If one allows a multiplicative error of 4 in addition to an additive error of ε, a statement of this nature has been proven by Ben-David (Theorem 7, [5]). [sent-48, score-0.088]
40 3 Upper bounds on the sample complexity of Empirical Risk Minimization In the remainder of the paper, C will always denote a universal constant which may differ across the paper. [sent-49, score-0.238]
41 For any submanifold M contained in, and probability distribution P supported on, the unit ball B in Rm, let L(M, P) := ∫ d(M, x)² dP(x). [sent-50, score-0.284]
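As an illustration of L(M, P) and its empirical counterpart, the sketch below treats the special case where the candidate manifold is a finite set of k points (the k-means setting); the function names and the use of NumPy are illustrative choices, not part of the paper.

```python
import numpy as np

def squared_distance_to_set(x, centers):
    """d(M, x)^2 when M is a finite point set: min_i ||x - c_i||^2."""
    diffs = centers - x                      # shape (k, m)
    return np.min(np.sum(diffs * diffs, axis=1))

def empirical_loss(samples, centers):
    """Empirical analogue of L(M, P): average squared distance of the
    samples to the candidate set of centers."""
    return np.mean([squared_distance_to_set(x, centers) for x in samples])

# toy usage: 200 points in the unit ball of R^50, 3 candidate centers
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # map points with norm > 1 onto the sphere
C = X[:3]
print(empirical_loss(X, C))
```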
42 Given i.i.d. samples x = {x1, . . . , xs} from P, a tolerance ε and a class of manifolds F, Empirical Risk Minimization (ERM) outputs a manifold Merm(x) ∈ F such that (1/s) Σ_{i=1}^s d(xi, Merm)² ≤ ε/2 + inf_{N∈F} (1/s) Σ_{i=1}^s d(xi, N)². [sent-56, score-0.574]
43 Given error parameters ε, δ, and a rule A that outputs a manifold in F when provided with a set of samples, we define the sample complexity s = s(ε, δ, A) to be the least number such that for any probability distribution P in the unit ball, if the result of A applied to a set of at least s i.i.d. [sent-57, score-0.651]
44 random samples from P is N, then P[L(N, P) < inf_{M∈F} L(M, P) + ε] > 1 − δ. [sent-59, score-0.139]
45 3.1 Bounded intrinsic curvature Let M be a Riemannian manifold and let p ∈ M. [sent-61, score-0.685]
46 The cut locus of p is the set of cut points of M. [sent-65, score-0.174]
47 The injectivity radius is the minimum taken over all points of the distance between the point and its cut locus. [sent-66, score-0.404]
48 Let Gi = Gi (d, V, λ, ι) be the family of all isometrically embedded, complete Riemannian submanifolds of B having dimension less or equal to d, induced d−dimensional volume less or equal to V , sectional curvature less or equal to λ and injectivity radius greater or equal to ι. [sent-68, score-0.907]
49 If s ≥ C((Uint/ε²) min(Uint, log⁴(Uint/ε)/ε²) + (1/ε²) log(1/δ)), and x = {x1, . [sent-72, score-0.112]
50 . . , xs} are i.i.d. points from P, then P[L(Merm(x), P) − inf_{M∈Gi} L(M, P) < ε] > 1 − δ. [sent-77, score-0.209]
51 The proof of this theorem is deferred to Section 4. [sent-78, score-0.157]
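To make the shape of the Theorem 1 threshold concrete, the helper below evaluates a quantity of the form C((U/ε²) min(U, log⁴(U/ε)/ε²) + log(1/δ)/ε²); the universal constant C is unspecified in the theorem, so it is left as a parameter, and the placement of the logarithmic factors follows the reconstruction above rather than a verified formula.

```python
import math

def sample_bound(U, eps, delta, C=1.0):
    """Sketch of the threshold s >= C( (U/eps^2) * min(U, log^4(U/eps)/eps^2)
    + log(1/delta)/eps^2 ), with C treated as a free parameter."""
    first = (U / eps**2) * min(U, math.log(U / eps) ** 4 / eps**2)
    second = math.log(1.0 / delta) / eps**2
    return C * (first + second)

# e.g. a covering-number proxy U = 1000, error tolerance 0.1, failure probability 0.01
print(f"{sample_bound(1000, 0.1, 0.01):.3g}")
```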
52 3.2 Bounded extrinsic curvature We will consider submanifolds of B that have the property that around each of them, for any radius r < τ, the boundary of the set of all points within a distance r is smooth. [sent-80, score-0.612]
53 This class of submanifolds has appeared in the context of manifold learning [16, 15]. [sent-81, score-0.479]
54 Let Ge = Ge(d, V, τ) be the family of Riemannian submanifolds of B having dimension ≤ d, volume ≤ V and condition number ≤ 1/τ. [sent-86, score-0.339]
55 If s ≥ C((Uext/ε²) min(Uext, log⁴(Uext/ε)/ε²) + (1/ε²) log(1/δ)), and x = {x1, . [sent-90, score-0.112]
56 . . , xs} are i.i.d. points from P, then P[L(Merm(x), P) − inf_{M∈Ge} L(M, P) < ε] > 1 − δ. (1) [sent-95, score-0.209]
57 4 Relating bounded curvature to covering number In this subsection, we note that bounds on the dimension, volume, sectional curvature and injectivity radius suffice to ensure that such manifolds can be covered by relatively few ε-balls. [sent-96, score-0.612]
58 Let V_p^M(r) be the volume of a ball of radius r in M centered around a point p. [sent-97, score-0.393]
59 Let M be a complete Riemannian manifold and assume that r is not greater than the injectivity radius ι. [sent-100, score-0.624]
60 Let K_M denote the sectional curvature of M and let λ > 0 be a constant. [sent-101, score-0.333]
61 As a consequence of Theorem 3, we obtain an upper bound of V C^d on the number of disjoint sets of the form M ∩ B_{ε/32}(p) that can be packed in M. [sent-105, score-0.173]
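The covering numbers appearing here can be illustrated on a point cloud: a greedy procedure that keeps adding centers until every data point is within ε of some center yields an upper bound on the size of an ε-net. This is only a sketch of the notion of covering, not the geometric argument behind Theorem 3.

```python
import numpy as np

def greedy_epsilon_net(points, eps):
    """Greedily select centers so that every point is within eps of a center.
    The number of chosen centers upper-bounds the covering number of the data."""
    remaining = points.copy()
    centers = []
    while len(remaining) > 0:
        c = remaining[0]
        centers.append(c)
        dists = np.linalg.norm(remaining - c, axis=1)
        remaining = remaining[dists > eps]
    return np.array(centers)

# points sampled near a circle (a 1-dimensional manifold) embedded in R^20
rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, size=500)
X = np.zeros((500, 20))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
X += 0.01 * rng.normal(size=X.shape)
print(len(greedy_epsilon_net(X, 0.2)))  # grows roughly like 1/eps for a curve, not like 1/eps^20
```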
62 5 Class of manifolds with a bounded covering number In this section, we show that uniform bounds relating the empirical squared loss and the expected squared loss can be obtained for a class of manifolds whose covering number at a different scale has a specified upper bound. [sent-113, score-0.649]
63 Let G be any family of subsets of B such that for all r > 0 every element M ∈ G can be covered using open Euclidean balls of radius r centered around U(1/r) points; let this set be ΛM(r). [sent-115, score-0.496]
64 A priori, it is unclear if sup_{M∈G} |(1/s) Σ_{i=1}^s d(xi, M)² − EP d(x, M)²|  (2) is a random variable, since the supremum of a set of random variables is not always a random variable (although if the set is countable this is true). [sent-117, score-0.131]
65 If s ≥ C((U(16/ε)/ε²) min(U(16/ε), log⁴(U(16/ε)/ε)/ε²) + (1/ε²) log(1/δ)), then P[sup_{M∈G} |(1/s) Σ_{i=1}^s d(xi, M)² − EP d(x, M)²| < ε] > 1 − δ. [sent-123, score-0.198]
66 Let c(g, ε) := {c1, . . . , ck} be a set of k := U(16/ε) points in g ⊆ B, such that g is covered by the union of balls of radius ε/16 centered at these points. [sent-128, score-0.535]
67 Thus, for any point x ∈ B, d²(x, g) ≤ (ε/16 + d(x, c(g, ε)))²  (5)  ≤ ε²/256 + (ε/8) min_i ‖x − ci‖ + d(x, c(g, ε))².  (6) [sent-129, score-0.12]
68 Since min_i ‖x − ci‖ is less than or equal to 2, the last expression is less than ε/2 + d(x, c(g, ε))². [sent-130, score-0.12]
69 The factor of 2^{−1/2} is necessitated by the fact that we wish the image of a point in the unit ball to also belong to the unit ball. [sent-146, score-0.243]
70 Given c(g, ε) = {c1, . . . , ck} and a point x ∈ B, let f_c(x) := d(x, c(g, ε))². [sent-150, score-0.326]
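The map Φ and the vectors c̃_i are only partially reproduced above, so the sketch below uses one concrete choice, Φ(x) = (x, 1)/√2 and c̃_i = (−√2 c_i, ‖c_i‖²/√2), under which f_c(x) = ‖x‖² + 2 min_i Φ(x)·c̃_i, i.e. f_c is, up to the common ‖x‖² term, a minimum of k linear functions of the lifted point Φ(x). The exact scaling used in the paper may differ; this is a numerical check of the identity, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 30, 5
x = rng.normal(size=m); x /= 2 * np.linalg.norm(x)           # a point inside the unit ball
C = rng.normal(size=(k, m)); C /= 2 * np.linalg.norm(C, axis=1, keepdims=True)

def f_c(x, C):
    return np.min(np.sum((C - x) ** 2, axis=1))               # d(x, c)^2 = min_i ||x - c_i||^2

Phi = np.append(x, 1.0) / np.sqrt(2)                          # lift x to R^{m+1}
C_tilde = np.hstack([-np.sqrt(2) * C,                         # lifted centers (illustrative scaling)
                     np.sum(C * C, axis=1, keepdims=True) / np.sqrt(2)])

lhs = f_c(x, C)
rhs = np.dot(x, x) + 2 * np.min(C_tilde @ Phi)
print(abs(lhs - rhs) < 1e-9)                                  # the identity holds numerically
```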
71 For all x1, . . . , xs, sup_{f_c∈G} |(1/s) Σ_{i=1}^s f_c(xi) − EP f_c(x)|  (7)  ≤ |(1/s) Σ_{i=1}^s ‖xi‖² − EP ‖x‖²| + 2 sup_{f_c∈G} |(1/s) Σ_{i=1}^s min_i Φ(xi)·c̃i − EP min_i Φ(x)·c̃i|.  (8) [sent-158, score-1.212]
72 By Hoeffding's inequality, P[|(1/s) Σ_{i=1}^s ‖xi‖² − EP ‖x‖²| > ε/4] < 2e^{−(ε²/8)s}, which is less than δ/2. [sent-159, score-0.255]
73 By Theorem 5, P[sup_{f_c∈G} |(1/s) Σ_{i=1}^s min_i Φ(xi)·c̃i − EP min_i Φ(x)·c̃i| < ε/4] ≥ 1 − δ/2; therefore, P[sup_{f_c∈G} |(1/s) Σ_{i=1}^s f_c(xi) − EP f_c(x)| < ε] ≥ 1 − δ. [sent-160, score-1.225]
74 6 Bounding the Fat-Shattering dimension using random projections The core of the uniform bounds in Theorems 1 and 2 is the following uniform bound on the minimum of k linear functions on a ball in Rm . [sent-163, score-0.622]
75 Independent of m, if s ≥ C((k/ε²) min(log⁴(k/ε)/ε², k) + (1/ε²) log(1/δ)), then P[sup_{F∈F} |(1/s) Σ_{i=1}^s F(xi) − EP F(x)| < ε] > 1 − δ. (10) [sent-169, score-0.198]
76 It has been open since 1997 [3] whether the known lower bound of C(k/ε² + (1/ε²) log(1/δ)) on the sample complexity s is tight. [sent-170, score-0.395]
77 Theorem 5 in [14] uses Rademacher complexities to obtain an upper bound of O(k²/ε² + (1/ε²) log(1/δ)). [sent-171, score-0.223]
78 Theorem 5 improves this to C((k/ε²) min(log⁴(k/ε)/ε², k) + (1/ε²) log(1/δ))  (12) by putting together (11) with a bound (13) obtained using the Fat-Shattering dimension. [sent-173, score-0.238]
79 We would like to use VC theory to bound u, but doing so directly leads to a linear dependence on the ambient dimension m. [sent-179, score-0.446]
80 In order to circumvent this difficulty, for g := C log(u + k)/ε², we consider a g-dimensional random linear subspace and the image under an appropriately scaled orthogonal projection R of the points x1, . [sent-180, score-0.07]
81 By a well-known theorem of [1], a bound of C(k/ε²) log²(k/ε) on fat_F(ε/24) implies the bound in (13) on the sample complexity, which implies Theorem 5. [sent-193, score-0.418]
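The dimension-reduction step can be illustrated with a generic Gaussian random projection: for g on the order of log(N)/ε², inner products of N points in the unit ball are preserved up to an additive error of about ε with high probability. This Johnson-Lindenstrauss-style sketch stands in for, but is not identical to, the scaled orthogonal projection R used in the proof.

```python
import numpy as np

def random_projection(points, g, rng):
    """Project points in R^m to R^g with a scaled Gaussian matrix so that
    inner products are approximately preserved (exactly, in expectation)."""
    m = points.shape[1]
    R = rng.normal(size=(g, m)) / np.sqrt(g)
    return points @ R.T

rng = np.random.default_rng(3)
N, m, eps = 50, 2000, 0.2
X = rng.normal(size=(N, m))
X /= np.linalg.norm(X, axis=1, keepdims=True)          # points on the unit sphere
g = int(np.ceil(np.log(N) / eps**2))                    # projected dimension ~ log(N)/eps^2
Y = random_projection(X, g, rng)
err = np.max(np.abs(X @ X.T - Y @ Y.T))
print(g, err)                                           # additive distortion of inner products
```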
82 7 Minimax lower bounds on the sample complexity Let K be a universal constant whose value will be fixed throughout this section. [sent-194, score-0.238]
83 In this section, we will state lower bounds on the number of samples needed for the minimax decision rule for learning from high dimensional data, with high probability, a manifold with a squared loss that is within ε of the optimal. [sent-195, score-0.676]
84 . . . , e_{K^{2d} k}} and S be the surface of the ball of radius 1 in Rm. [sent-200, score-0.288]
85 The following theorem shows that no algorithm can produce a nearly optimal manifold with high probability unless it uses a number of samples that depends linearly on volume, exponentially on intrinsic dimension and polynomially on the curvature. [sent-219, score-0.775]
86 Let A be an arbitrary algorithm that takes as input a set of data points x = {x1 , . [sent-223, score-0.07]
87 If ε + 2δ < (1/3)(1/(2√2) − τ)², then inf_P P[L(MA(x), P) − inf_{M∈F} L(M, P) < ε] < 1 − δ, where P ranges over all distributions supported on B and x1, . [sent-227, score-0.278]
88 and Theorem 3 that F is a class of manifolds such that each manifold in F is contained in the union of K^{3d/2} k m-dimensional balls of radius τ, and {M1, . [sent-236, score-0.733]
89 (The reason why we have K^{3d/2} rather than K^{5d/4} as in the statement of the theorem is that the parameters of Gi(d, V, τ) are intrinsic, and to transfer to the extrinsic setting of the last sentence, one needs some leeway. [sent-240, score-0.153]
90 Then, inf_P P_x[L(MA(x), P) − inf_{M∈F} L(M, P) < ε] ≤ E_{Pr} P_x[L(MA(x), Pr) − inf_{M∈F} L(M, Pr) < ε] = E_x P_{Pr}[L(MA(x), Pr) − inf_{M∈F} L(M, Pr) < ε | x] = E_x P_{Pr}[L(MA(x), Pr) < ε | x]. [sent-256, score-0.556]
91 We first prove a lower bound on inf_x E_r[L(MA(x), Pr) | x]. [sent-257, score-0.265]
92 . . . , xk, the probability that x_{k+1} lies on a given sphere Sj is equal to 0 if one of x1, . [sent-266, score-0.165]
93 . . , xk lies on Sj and 1/(K^{2d} k − k̂) otherwise, where k̂ ≤ k is the number of spheres in {Si} which contain at least one point among x1, . [sent-269, score-0.22]
94 K^{3d/2} k balls of radius τ; let their centers be y1, . . . However, it is easy to check that for any dimension m, the cardinality of the set Sy of all Si that have a nonempty intersection with the balls of radius 1/(2√2) centered around y1, . [sent-280, score-0.794]
95 Finally, we observe that it is not possible for E_x P_{Pr}[L(MA(x), Pr) < ε | x] to be more than 1 − δ if inf_x E_{Pr}[L(MA(x), Pr) | x] > ε + 2δ, because L(MA(x), Pr) is bounded above by 2. [sent-289, score-0.191]
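The counting behind the lower bound can be illustrated numerically: if a distribution spreads its mass uniformly over N disjoint pieces, then s samples touch at most s of them, so any manifold built from the sample alone must miss mass on the pieces it never saw. The simulation below only illustrates this coupon-counting step, with N standing in for the many spheres of the construction; it is not the proof.

```python
import numpy as np

rng = np.random.default_rng(6)
N, s, trials = 10_000, 500, 200        # N pieces, s samples per trial
seen = [len(np.unique(rng.integers(0, N, size=s))) for _ in range(trials)]
frac_unseen = 1.0 - np.mean(seen) / N
print(f"average fraction of pieces never sampled: {frac_unseen:.3f}")
# with s << N almost all pieces are unseen, so an additive error proportional
# to the unseen mass is unavoidable for any rule that only sees the sample
```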
96 Supposing that a dot product between two vectors xi, xj can be computed using m operations, the total cost of sampling and then exhaustively solving k-means on the sample is Õ(m s k^s log n). [sent-292, score-0.215]
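A minimal sketch of the sub-sampling idea for k-means follows: draw a small sample, exhaustively try every assignment of the sample points to k clusters, and return the centers of the best assignment, which are then evaluated on the full data. The brute-force enumeration mirrors the "exhaustively solving k-means on the sample" step described above and is practical only for very small s; all names and constants are illustrative.

```python
import itertools
import numpy as np

def kmeans_cost(X, centers):
    """Mean squared distance of the rows of X to their nearest center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.mean(np.min(d, axis=1) ** 2)

def exhaustive_kmeans(sample, k):
    """Try all k^s assignments of the s sample points to k clusters and
    return the centers minimizing the empirical squared loss on the sample."""
    s = len(sample)
    best_cost, best_centers = np.inf, None
    for labels in itertools.product(range(k), repeat=s):
        labels = np.array(labels)
        centers = np.array([sample[labels == j].mean(axis=0) if np.any(labels == j)
                            else sample[0] for j in range(k)])
        cost = kmeans_cost(sample, centers)
        if cost < best_cost:
            best_cost, best_centers = cost, centers
    return best_centers

rng = np.random.default_rng(4)
X = rng.normal(size=(10000, 100))                      # n points in a high-dimensional space
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
sample = X[rng.choice(len(X), size=8, replace=False)]  # sample size chosen independently of m
centers = exhaustive_kmeans(sample, k=2)
print(kmeans_cost(X, centers))                         # additive-error estimate on the full data
```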
97 This curve, with probability 1 − δ, achieves a mean squared error of less than ε more than the optimum. [sent-297, score-0.113]
98 Given (z1, . . . , zs), output the curve obtained by joining zi to zi+1 for each i by a straight line segment. [sent-319, score-0.123]
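To make the objects in the last two sentences concrete, the sketch below evaluates, for a candidate tuple (z1, . . . , zs), the length of the piecewise linear curve obtained by joining consecutive zi and the mean squared distance of data points to that curve. It implements only the evaluation of the objective and the length constraint, not the family of convex programs used to search over tuples.

```python
import numpy as np

def point_to_segment_sq(x, a, b):
    """Squared distance from point x to the segment [a, b]."""
    ab, ax = b - a, x - a
    denom = np.dot(ab, ab)
    t = 0.0 if denom == 0 else np.clip(np.dot(ax, ab) / denom, 0.0, 1.0)
    proj = a + t * ab
    return float(np.dot(x - proj, x - proj))

def polyline_length(Z):
    """Total length of the curve joining z_i to z_{i+1} by straight segments."""
    return float(np.sum(np.linalg.norm(np.diff(Z, axis=0), axis=1)))

def mean_sq_dist_to_polyline(X, Z):
    """Mean over data points of the squared distance to the piecewise linear curve."""
    costs = [min(point_to_segment_sq(x, Z[i], Z[i + 1]) for i in range(len(Z) - 1))
             for x in X]
    return float(np.mean(costs))

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 40)) * 0.05
X[:, 0] += np.linspace(-0.5, 0.5, 300)                        # data near a segment in R^40
Z = np.zeros((4, 40)); Z[:, 0] = np.linspace(-0.5, 0.5, 4)    # a candidate 3-segment curve
print(polyline_length(Z), mean_sq_dist_to_polyline(X, Z))
```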
99 On the sample complexity of learning smooth cuts on a manifold. [sent-374, score-0.169]
100 Finding the homology of submanifolds with high confidence from random samples. [sent-378, score-0.184]
wordName wordTfidf (topN-words)
[('manifold', 0.342), ('fc', 0.213), ('curvature', 0.193), ('ambient', 0.178), ('xk', 0.165), ('radius', 0.157), ('uext', 0.156), ('uint', 0.156), ('ep', 0.15), ('dimension', 0.142), ('inf', 0.139), ('submanifolds', 0.137), ('ball', 0.131), ('bound', 0.126), ('edif', 0.125), ('injectivity', 0.125), ('merm', 0.125), ('ppr', 0.125), ('balls', 0.123), ('ma', 0.118), ('pr', 0.115), ('dimensional', 0.112), ('manifolds', 0.111), ('intrinsic', 0.103), ('complexity', 0.101), ('theorem', 0.098), ('fatf', 0.093), ('sectional', 0.093), ('vpm', 0.093), ('riemannian', 0.093), ('sup', 0.086), ('eigenmaps', 0.085), ('minimax', 0.084), ('xs', 0.081), ('risk', 0.078), ('ci', 0.077), ('curve', 0.076), ('covered', 0.074), ('rm', 0.074), ('points', 0.07), ('gi', 0.07), ('bounds', 0.069), ('squared', 0.069), ('sample', 0.068), ('ge', 0.067), ('ck', 0.066), ('cim', 0.062), ('min', 0.062), ('principal', 0.06), ('projections', 0.06), ('volume', 0.06), ('deferred', 0.059), ('ek', 0.057), ('unit', 0.056), ('piecewise', 0.056), ('shatter', 0.055), ('exhaustively', 0.055), ('spheres', 0.055), ('mitter', 0.055), ('extrinsic', 0.055), ('means', 0.054), ('curves', 0.052), ('cut', 0.052), ('bounded', 0.052), ('focs', 0.051), ('gunnar', 0.05), ('methodologies', 0.05), ('submanifold', 0.05), ('narayanan', 0.05), ('log', 0.05), ('open', 0.05), ('cd', 0.05), ('relating', 0.049), ('hypothesis', 0.048), ('let', 0.047), ('uniform', 0.047), ('upper', 0.047), ('polynomially', 0.047), ('vicinity', 0.047), ('homology', 0.047), ('zs', 0.047), ('sy', 0.047), ('covering', 0.047), ('centered', 0.045), ('countable', 0.045), ('error', 0.044), ('linearly', 0.043), ('erm', 0.043), ('mini', 0.043), ('partha', 0.043), ('sanjoy', 0.043), ('xi', 0.042), ('ex', 0.041), ('algorithmic', 0.041), ('si', 0.04), ('outputs', 0.04), ('fitting', 0.039), ('ys', 0.039), ('tuples', 0.039), ('vc', 0.039)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
Author: Hariharan Narayanan, Sanjoy Mitter
Abstract: The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume and curvature is unavoidable. Whether the known lower bound of O((k + log(1/δ))/ε²) for the sample complexity of Empirical Risk minimization on k-means applied to data in a unit ball of arbitrary dimension is tight has been an open question since 1997 [3]. Here ε is the desired bound on the error and δ is a bound on the probability of failure. We improve the best currently known upper bound [14] of O((k² + log(1/δ))/ε²) to O((k/ε²) min(k, log⁴(k/ε)/ε²) + log(1/δ)/ε²). Based on these results, we devise a simple algorithm for k-means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension. 1
2 0.20131718 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
Author: Arvind Agarwal, Samuel Gerber, Hal Daume
Abstract: We present a novel method for multitask learning (MTL) based on manifold regularization: assume that all task parameters lie on a manifold. This is the generalization of a common assumption made in the existing literature: task parameters share a common linear subspace. One proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes across tasks which can be learnt independently. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets. 1
3 0.18856619 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
Author: Sivan Sabato, Nathan Srebro, Naftali Tishby
Abstract: We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L2 regularization: We introduce the γ-adapted-dimension, which is a simple function of the spectrum of a distribution’s covariance matrix, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the γ-adapted-dimension of the source distribution. We conclude that this new quantity tightly characterizes the true sample complexity of large-margin classification. The bounds hold for a rich family of sub-Gaussian distributions. 1
4 0.17667313 220 nips-2010-Random Projection Trees Revisited
Author: Aman Dhesi, Purushottam Kar
Abstract: The Random Projection Tree (RPTree) structures proposed in [1] are space partitioning data structures that automatically adapt to various notions of intrinsic dimensionality of data. We prove new results for both the RPTree-Max and the RPTree-Mean data structures. Our result for RPTree-Max gives a near-optimal bound on the number of levels required by this data structure to reduce the size of its cells by a factor s ≥ 2. We also prove a packing lemma for this data structure. Our final result shows that low-dimensional manifolds have bounded Local Covariance Dimension. As a consequence we show that RPTree-Mean adapts to manifold dimension as well.
5 0.17269517 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
Author: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches for minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is hard to compute, and so is the projection operator that approximates it, we describe another second-order retraction that can be computed efficiently, with run time and memory complexity of O ((n + m)k) for a rank-k matrix of dimension m × n, given rank-one gradients. We use this algorithm, LORETA, to learn a matrix-form similarity measure over pairs of documents represented as high dimensional vectors. LORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained over pre-selected features using the same memory requirements. LORETA also showed consistent improvement over standard methods in a large (1600 classes) multi-label image classification task. 1
6 0.15972881 243 nips-2010-Smoothness, Low Noise and Fast Rates
7 0.11920076 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
8 0.10900967 222 nips-2010-Random Walk Approach to Regret Minimization
9 0.10487509 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
10 0.10469947 223 nips-2010-Rates of convergence for the cluster tree
11 0.098670855 29 nips-2010-An Approximate Inference Approach to Temporal Optimization in Optimal Control
12 0.097016767 205 nips-2010-Permutation Complexity Bound on Out-Sample Error
13 0.09390381 27 nips-2010-Agnostic Active Learning Without Constraints
14 0.087272897 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
15 0.083268695 63 nips-2010-Distributed Dual Averaging In Networks
16 0.08312013 250 nips-2010-Spectral Regularization for Support Estimation
17 0.083023295 134 nips-2010-LSTD with Random Projections
18 0.081245176 178 nips-2010-Multivariate Dyadic Regression Trees for Sparse Learning Problems
19 0.079434745 202 nips-2010-Parallelized Stochastic Gradient Descent
20 0.078358233 233 nips-2010-Scrambled Objects for Least-Squares Regression
topicId topicWeight
[(0, 0.239), (1, 0.031), (2, 0.181), (3, 0.055), (4, 0.108), (5, 0.065), (6, -0.007), (7, -0.053), (8, -0.054), (9, 0.024), (10, 0.087), (11, -0.138), (12, -0.061), (13, -0.086), (14, -0.008), (15, 0.012), (16, 0.15), (17, -0.128), (18, -0.002), (19, 0.0), (20, 0.003), (21, -0.093), (22, 0.008), (23, -0.064), (24, 0.02), (25, -0.01), (26, -0.27), (27, -0.089), (28, -0.019), (29, 0.0), (30, 0.025), (31, 0.057), (32, 0.105), (33, -0.04), (34, -0.068), (35, -0.129), (36, 0.092), (37, -0.02), (38, -0.06), (39, -0.051), (40, 0.152), (41, 0.015), (42, 0.005), (43, -0.022), (44, -0.09), (45, -0.118), (46, 0.093), (47, 0.047), (48, -0.032), (49, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.9398306 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
Author: Hariharan Narayanan, Sanjoy Mitter
Abstract: The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume and curvature is unavoidable. Whether the known lower bound of O((k + log(1/δ))/ε²) for the sample complexity of Empirical Risk minimization on k-means applied to data in a unit ball of arbitrary dimension is tight has been an open question since 1997 [3]. Here ε is the desired bound on the error and δ is a bound on the probability of failure. We improve the best currently known upper bound [14] of O((k² + log(1/δ))/ε²) to O((k/ε²) min(k, log⁴(k/ε)/ε²) + log(1/δ)/ε²). Based on these results, we devise a simple algorithm for k-means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension. 1
2 0.87575072 220 nips-2010-Random Projection Trees Revisited
Author: Aman Dhesi, Purushottam Kar
Abstract: The Random Projection Tree (RPTree) structures proposed in [1] are space partitioning data structures that automatically adapt to various notions of intrinsic dimensionality of data. We prove new results for both the RPTree-Max and the RPTree-Mean data structures. Our result for RPTree-Max gives a near-optimal bound on the number of levels required by this data structure to reduce the size of its cells by a factor s ≥ 2. We also prove a packing lemma for this data structure. Our final result shows that low-dimensional manifolds have bounded Local Covariance Dimension. As a consequence we show that RPTree-Mean adapts to manifold dimension as well.
3 0.67051655 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
Author: Sivan Sabato, Nathan Srebro, Naftali Tishby
Abstract: We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L2 regularization: We introduce the γ-adapted-dimension, which is a simple function of the spectrum of a distribution’s covariance matrix, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the γ-adapted-dimension of the source distribution. We conclude that this new quantity tightly characterizes the true sample complexity of large-margin classification. The bounds hold for a rich family of sub-Gaussian distributions. 1
4 0.57861924 243 nips-2010-Smoothness, Low Noise and Fast Rates
Author: Nathan Srebro, Karthik Sridharan, Ambuj Tewari
Abstract: We establish an excess risk bound of Õ(H R_n² + √(H L*) R_n) for ERM with an H-smooth loss function and a hypothesis class with Rademacher complexity R_n, where L* is the best risk achievable by the hypothesis class. For typical hypothesis classes where R_n = √(R/n), this translates to a learning rate of Õ(RH/n) in the separable (L* = 0) case and Õ(RH/n + √(L* RH/n)) more generally. We also provide similar guarantees for online and stochastic convex optimization of a smooth non-negative objective. 1
5 0.57689297 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
Author: Tim Rogers, Chuck Kalish, Joseph Harrison, Xiaojin Zhu, Bryan R. Gibson
Abstract: When the distribution of unlabeled data in feature space lies along a manifold, the information it provides may be used by a learner to assist classification in a semi-supervised setting. While manifold learning is well-known in machine learning, the use of manifolds in human learning is largely unstudied. We perform a set of experiments which test a human’s ability to use a manifold in a semisupervised learning task, under varying conditions. We show that humans may be encouraged into using the manifold, overcoming the strong preference for a simple, axis-parallel linear boundary. 1
6 0.56606132 205 nips-2010-Permutation Complexity Bound on Out-Sample Error
7 0.54243928 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
8 0.54025459 191 nips-2010-On the Theory of Learnining with Privileged Information
9 0.53970569 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
10 0.53078723 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
11 0.52440888 233 nips-2010-Scrambled Objects for Least-Squares Regression
12 0.48405221 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
13 0.47568029 222 nips-2010-Random Walk Approach to Regret Minimization
14 0.47551647 134 nips-2010-LSTD with Random Projections
15 0.4632057 75 nips-2010-Empirical Risk Minimization with Approximations of Probabilistic Grammars
16 0.45702255 178 nips-2010-Multivariate Dyadic Regression Trees for Sparse Learning Problems
17 0.43928745 202 nips-2010-Parallelized Stochastic Gradient Descent
18 0.43635914 248 nips-2010-Sparse Inverse Covariance Selection via Alternating Linearization Methods
19 0.42095834 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
20 0.41162509 226 nips-2010-Repeated Games against Budgeted Adversaries
topicId topicWeight
[(13, 0.029), (27, 0.024), (30, 0.611), (35, 0.012), (45, 0.113), (50, 0.027), (52, 0.022), (60, 0.028), (77, 0.027), (90, 0.04)]
simIndex simValue paperId paperTitle
same-paper 1 0.94689173 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
Author: Hariharan Narayanan, Sanjoy Mitter
Abstract: The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume and curvature is unavoidable. Whether the known lower bound of O((k + log(1/δ))/ε²) for the sample complexity of Empirical Risk minimization on k-means applied to data in a unit ball of arbitrary dimension is tight has been an open question since 1997 [3]. Here ε is the desired bound on the error and δ is a bound on the probability of failure. We improve the best currently known upper bound [14] of O((k² + log(1/δ))/ε²) to O((k/ε²) min(k, log⁴(k/ε)/ε²) + log(1/δ)/ε²). Based on these results, we devise a simple algorithm for k-means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension. 1
2 0.91859174 264 nips-2010-Synergies in learning words and their referents
Author: Mark Johnson, Katherine Demuth, Bevan Jones, Michael J. Black
Abstract: This paper presents Bayesian non-parametric models that simultaneously learn to segment words from phoneme strings and learn the referents of some of those words, and shows that there is a synergistic interaction in the acquisition of these two kinds of linguistic information. The models themselves are novel kinds of Adaptor Grammars that are an extension of an embedding of topic models into PCFGs. These models simultaneously segment phoneme sequences into words and learn the relationship between non-linguistic objects to the words that refer to them. We show (i) that modelling inter-word dependencies not only improves the accuracy of the word segmentation but also of word-object relationships, and (ii) that a model that simultaneously learns word-object relationships and word segmentation segments more accurately than one that just learns word segmentation on its own. We argue that these results support an interactive view of language acquisition that can take advantage of synergies such as these. 1
3 0.86973292 283 nips-2010-Variational Inference over Combinatorial Spaces
Author: Alexandre Bouchard-côté, Michael I. Jordan
Abstract: Since the discovery of sophisticated fully polynomial randomized algorithms for a range of #P problems [1, 2, 3], theoretical work on approximate inference in combinatorial spaces has focused on Markov chain Monte Carlo methods. Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference [4]. Variational inference would appear to provide an appealing alternative, given the success of variational methods for graphical models [5]; unfortunately, however, it is not obvious how to develop variational approximations for combinatorial objects such as matchings, partial orders, plane partitions and sequence alignments. We propose a new framework that extends variational inference to a wide range of combinatorial spaces. Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset [6]. 1
4 0.76518196 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems
Author: Ronny Luss, Saharon Rosset, Moni Shahar
Abstract: A new algorithm for isotonic regression is presented based on recursively partitioning the solution space. We develop efficient methods for each partitioning subproblem through an equivalent representation as a network flow problem, and prove that this sequence of partitions converges to the global solution. These network flow problems can further be decomposed in order to solve very large problems. Success of isotonic regression in prediction and our algorithm's favorable computational properties are demonstrated through simulated examples as large as 2 × 10⁵ variables and 10⁷ constraints.
5 0.73043716 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities
Author: Tian Lan, Yang Wang, Weilong Yang, Greg Mori
Abstract: We propose a discriminative model for recognizing group activities. Our model jointly captures the group activity, the individual person actions, and the interactions among them. Two new types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. Different from most of the previous latent structured models which assume a predefined structure for the hidden layer, e.g. a tree structure, we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. Our experimental results demonstrate that by inferring this contextual information together with adaptive structures, the proposed model can significantly improve activity recognition performance. 1
6 0.69918168 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
7 0.65434718 220 nips-2010-Random Projection Trees Revisited
8 0.57367504 222 nips-2010-Random Walk Approach to Regret Minimization
9 0.56261134 75 nips-2010-Empirical Risk Minimization with Approximations of Probabilistic Grammars
10 0.54227597 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
11 0.53142953 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
12 0.52812493 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
13 0.52461582 285 nips-2010-Why are some word orders more common than others? A uniform information density account
14 0.51781952 221 nips-2010-Random Projections for $k$-means Clustering
15 0.49901918 274 nips-2010-Trading off Mistakes and Don't-Know Predictions
16 0.4973616 155 nips-2010-Learning the context of a category
17 0.49499908 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case
18 0.49309847 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
19 0.49087891 233 nips-2010-Scrambled Objects for Least-Squares Regression
20 0.48580933 288 nips-2010-Worst-case bounds on the quality of max-product fixed-points