nips nips2010 nips2010-220 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Aman Dhesi, Purushottam Kar
Abstract: The Random Projection Tree (RPT REE) structures proposed in [1] are space partitioning data structures that automatically adapt to various notions of intrinsic dimensionality of data. We prove new results for both the RPT REE -M AX and the RPT REE -M EAN data structures. Our result for RPT REE -M AX gives a nearoptimal bound on the number of levels required by this data structure to reduce the size of its cells by a factor s ≥ 2. We also prove a packing lemma for this data structure. Our final result shows that low-dimensional manifolds have bounded Local Covariance Dimension. As a consequence we show that RPT REE -M EAN adapts to manifold dimension as well.
Reference: text
sentIndex sentText sentNum sentScore
1 in Abstract The Random Projection Tree (RPT REE) structures proposed in [1] are space partitioning data structures that automatically adapt to various notions of intrinsic dimensionality of data. [sent-7, score-0.171]
2 Our result for RPT REE -M AX gives a nearoptimal bound on the number of levels required by this data structure to reduce the size of its cells by a factor s ≥ 2. [sent-9, score-0.214]
3 We also prove a packing lemma for this data structure. [sent-10, score-0.198]
4 As a consequence we show that RPT REE -M EAN adapts to manifold dimension as well. [sent-12, score-0.226]
5 For example, [4] and [5] require knowledge about the intrinsic dimensionality of the manifold whereas [3] requires a sampling of points that is “sufficiently” dense with respect to some manifold parameters. [sent-23, score-0.256]
6 Their data structures are akin to the k-d tree structure and offer guaranteed reduction in the size of the cells after a bounded number of levels. [sent-25, score-0.184]
7 1 Contributions The RPT REE structures are new entrants in a large family of space partitioning data structures such as k-d trees [11], BBD trees [12], BAR trees [13] and several others (see [14] for an overview). [sent-30, score-0.226]
8 Space Partitioning Guarantee : There exists a bound L(s), s ≥ 2 on the number of levels one has to go down before all descendants of a node of size ∆ are of size ∆/s or less. [sent-32, score-0.269]
9 The size of a cell is variously defined as the length of the longest side of the cell (for box-shaped cells), radius of the cell, etc. [sent-33, score-0.691]
10 Packing Guarantee : Given a fixed ball B of radius R and a size parameter r, there exists a bound on the number of disjoint cells of the tree that are of size greater than r and intersect B. [sent-37, score-0.58]
11 Such bounds are usually arrived at by first proving a bound on the aspect ratio for cells of the tree. [sent-38, score-0.187]
12 We first present a bound on the number of levels required for size reduction by any given factor in an RPT REE -M AX. [sent-41, score-0.172]
13 Instead we prove a weaker result that can nevertheless be exploited to give a packing lemma of the kind mentioned above. [sent-45, score-0.198]
14 More specifically, given a ball B, we prove an aspect ratio bound for the smallest cell in the RPT REE -M AX that completely contains B. [sent-46, score-0.484]
15 By showing that low-dimensional manifolds have bounded local covariance dimension, we show its adaptability to the manifold dimension as well. [sent-49, score-0.292]
16 Our result demonstrates the robustness of the notion of manifold dimension - a notion that is able to connect to a geometric notion of dimensionality such as the doubling dimension (proved in [1]) as well as a statistical notion such as Local Covariance Dimension (this paper). [sent-50, score-0.433]
17 We will denote by B(x, r) a closed ball of radius r centered at x. [sent-53, score-0.379]
18 2 The RPT REE -M AX structure The RPT REE -M AX structure adapts to the doubling dimension of data (see definition below). [sent-55, score-0.222]
19 Since low-dimensional manifolds have low doubling dimension (see [1] Theorem 22) hence the structure adapts to manifold dimension as well. [sent-56, score-0.433]
20 The doubling dimension of a set S ⊂ RD is the smallest integer d such that for any ball B(x, r) ⊂ RD , the set B(x, r) ∩ S can be covered by 2d balls of radius r/2. [sent-58, score-0.764]
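To make this definition concrete, here is a small illustrative sketch (our own code, not from the paper) that greedily covers B(x, r) ∩ S with balls of radius r/2 centered at data points; log2 of the cover size is then a crude empirical proxy for the doubling dimension at that center and scale (a data-centered greedy cover can overestimate the exponent by a small constant factor). The function name and greedy strategy are our choices.

    import numpy as np

    def doubling_dimension_estimate(S, x, r):
        # Restrict S to the ball B(x, r).
        pts = S[np.linalg.norm(S - x, axis=1) <= r]
        # Greedily cover the restricted points with balls of radius r/2 centered
        # at data points; the greedy cover can only over-count, so log2(#balls)
        # is a rough upper estimate of the doubling dimension at this scale.
        centers = 0
        uncovered = pts
        while len(uncovered) > 0:
            c = uncovered[0]
            centers += 1
            uncovered = uncovered[np.linalg.norm(uncovered - c, axis=1) > r / 2.0]
        return np.log2(max(centers, 1))

    # Example: data lying near a 2-dimensional flat in R^50 gives a small estimate.
    rng = np.random.default_rng(0)
    S = np.hstack([rng.standard_normal((2000, 2)), 1e-3 * rng.standard_normal((2000, 48))])
    print(doubling_dimension_estimate(S, S[0], r=1.0))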
21 The RPT REE -M AX algorithm is presented for data imbedded in RD having doubling dimension d. [sent-59, score-0.187]
22 Since it is difficult to get the exact value of the radius of a data set, the algorithm settles for a constant factor approximation to the value by choosing an arbitrary data point x ∈ C and using the estimate ∆̃ = max({‖x − y‖ : y ∈ C}). [sent-61, score-0.304]
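For concreteness, here is a minimal Python sketch of one such cell split, including the ∆̃ estimate just described. The split rule itself -- project onto a random unit direction and cut at a jittered median, with jitter width proportional to ∆̃/√D -- is our paraphrase of the rule in [1]; the jitter constant below is an illustrative assumption and should be checked against [1] rather than read off this summary.

    import numpy as np

    def radius_estimate(C):
        # Constant-factor approximation described above: pick an arbitrary
        # point x in the cell and return max over y of ||x - y||.
        x = C[0]
        return np.max(np.linalg.norm(C - x, axis=1))

    def rptree_max_split(C, rng, jitter_const=6.0):
        # One jittered random-projection split in the spirit of [1].
        # jitter_const is an assumed constant, not taken from this summary.
        n, D = C.shape
        delta_tilde = radius_estimate(C)
        v = rng.standard_normal(D)
        v /= np.linalg.norm(v)                      # random unit direction
        proj = C @ v
        jitter = rng.uniform(-1.0, 1.0) * jitter_const * delta_tilde / np.sqrt(D)
        threshold = np.median(proj) + jitter        # jittered median cut
        return C[proj <= threshold], C[proj > threshold]

    rng = np.random.default_rng(1)
    cell = rng.standard_normal((500, 20))
    left, right = rptree_max_split(cell, rng)
    print(len(left), len(right))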
23 Pick any cell C in the RPT REE -M AX; suppose that S ∩ C has doubling dimension ≤ d. [sent-65, score-0.358]
24 Then with probability at least 1/2 (over the randomization in constructing the subtree rooted at C), every descendant C ′ more than c1 d log d levels below C has radius(C ′ ) ≤ radius(C)/2. [sent-66, score-0.22]
25 Let us consider extensions of this result to bound the number of levels it takes for the size of all descendants to go down by a factor s > 2. [sent-68, score-0.292]
26 Starting off in a cell C of radius ∆, we are assured of a reduction in size by a factor of 2 after c1 d log d levels. [sent-70, score-0.579]
27 Hence all 2^(c1 d log d) nodes at this level have radius ∆/2 or less. [sent-71, score-0.314]
28 For any δ > 0, with probability at least 1 − δ, every descendant C ′ which is more than c1 d log d + log(1/δ) levels below C has radius(C ′ ) ≤ radius(C)/2. [sent-77, score-0.189]
29 This gives us a way to boost the confidence and do the following: go down L = c1 d log d + 2 levels from C to get the radius of all the 2^(c1 d log d + 2) descendants down to ∆/2 with confidence 1 − 1/4. [sent-78, score-0.577]
30 Afterward, go an additional L′ = c1 d log d + L + 2 levels from each of these descendants so that for any cell at level L, the probability of it having a descendant of radius > ∆/4 after L′ levels is less than 1/(4 · 2^L). [sent-79, score-0.895]
31 Hence conclude with confidence at least 1 − 1/4 − (1/(4 · 2^L)) · 2^L ≥ 1/2 that all descendants of C after 2L + c1 d log d + 2 levels have radius ≤ ∆/4. [sent-80, score-0.41]
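The doubling argument above amounts to the following recursion for the level bound L(s) (a schematic restatement; the +2 terms come from the confidence-boosting step and the constants are those of Theorem 2):

    L(2) = c1 d log d + 2,        L(s) ≤ 2 L(s/2) + c1 d log d + 2,

which unrolls to L(s) ≤ (s − 1)(c1 d log d + 2) = O(s · d log d) for s a power of two -- the linear-in-s dependence recorded in the statement that follows and improved to poly-logarithmic in s by Theorem 4.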
32 For any s ≥ 2, with probability at least 1−1/4, every descendant C ′ which is more than c2 ·s·d log d levels below C has radius(C ′ ) ≤ radius(C)/s. [sent-83, score-0.189]
33 The first result we prove in the next section is a bound on the number of levels that is poly-logarithmic in the size reduction factor s. [sent-87, score-0.207]
34 Pick any cell C in the RPT REE -M AX; suppose that S ∩ C has doubling dimension ≤ d. [sent-91, score-0.358]
35 Then for any s ≥ 2, with probability at least 1 − 1/4 (over the randomization in constructing the subtree rooted at C), for every descendant C ′ which is more than c3 · log s · d log sd levels below C, we have radius(C ′ ) ≤ radius(C)/s. [sent-92, score-0.323]
36 Compared to this, data structures such as [12] give deterministic guarantees for such a reduction in D log s levels which can be shown to be optimal (see [1] for an example). [sent-93, score-0.195]
37 Moving on with the proof, let us consider a cell C of radius ∆ in the RPT REE -M AX that contains a dataset S having doubling dimension ≤ d. [sent-95, score-0.639]
38 Then for any ǫ > 0, a repeated application of Definition 1 shows that S can be covered using at most 2^(d log(1/ǫ)) balls of radius ǫ∆. [sent-96, score-0.482]
39 We will cover S ∩ C using balls of radius ∆/(960s√d) so that O((sd)^d) balls would suffice. [sent-97, score-0.683]
40 Figure 1 (caption): Balls B1 and B2 are of radius ∆/(960s√d) and their centers are ∆/s − ∆/(960s√d) apart; the three kinds of splits with respect to such a pair (good, bad and neutral) are illustrated. [sent-99, score-0.649]
41 If random splits separate data from all such pairs of balls, i.e. [sent-100, score-0.241]
42 for no pair does any cell contain data from both balls of the pair, then each resulting cell would only contain data from pairs whose centers are closer than ∆/s − ∆/(960s√d). [sent-102, score-0.652]
43 Thus the radius of each such cell would be at most ∆/s. [sent-103, score-0.476]
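Spelling out the count (logarithms base 2, constants absorbed into the O(·)): with ǫ = 1/(960s√d) the doubling-dimension bound gives at most

    2^(d log(1/ǫ))  =  (1/ǫ)^d  =  (960s√d)^d  =  (O(sd))^d

covering balls, which is the O((sd)^d) figure quoted above.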
44 We fix such a pair of balls, calling them B1 and B2. [sent-104, score-0.222]
45 A split in the RPT REE -M AX is said to be good with respect to this pair if it sends points inside B1 to one child of the cell in the RPT REE -M AX and points inside B2 to the other, bad if it sends points from both balls to both children and neutral otherwise (See Figure 1). [sent-105, score-0.682]
46 Let B = B(x, δ) be a ball contained inside an RPT REE -M AX cell of radius ∆ that contains a dataset S of doubling dimension d. [sent-107, score-0.786]
47 Let us say that a random split splits this ball if the split separates the data set S ∩ B into two parts. [sent-108, score-0.316]
48 Then a random split of the cell splits B with probability at most 3δ√d/∆. [sent-109, score-0.34]
49 Let B1 and B2 be a pair of balls as described above contained in the cell C that contains data of doubling dimension d. [sent-112, score-0.605]
50 Then a random split of the cell is a good split with respect to this pair with probability at least 1/(56s). [sent-113, score-0.41]
51 Let B1 and B2 be a pair of balls as described above contained in the cell C that contains data of doubling dimension d. [sent-118, score-0.605]
52 Then a random split of the cell is a bad split with respect to this pair with probability at most 1/(320s). [sent-119, score-0.455]
53 We instead make a simple observation that the probability of a bad split is upper bounded by the probability that one of the balls is split since for any two events A and B, P [A ∩ B] ≤ min{P [A] , P [B]}. [sent-123, score-0.479]
54 What we will prove is that starting with a pair of balls in a cell C, the probability that some cell k levels below has data from both the balls is exponentially small in k. [sent-126, score-0.945]
55 Thus, after going down enough levels we can take a union bound over all pairs of balls whose centers are well separated (which are O((sd)^(2d)) in number) and conclude the proof. [sent-127, score-0.361]
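To make the union-bound step explicit (a schematic calculation; ρ below is a placeholder for the per-level probability that a fixed well-separated pair survives unseparated):

    P[some pair survives k levels]  ≤  (O(sd))^(2d) · ρ^k  ≤  1/4   once   k  ≥  (2d log(O(sd)) + log 4) / log(1/ρ)  =  O(d log(sd))   for constant ρ < 1.

The difficulty, addressed by the inductive argument that follows, is that in a large cell the good-split probability guaranteed by Theorem 7 is only 1/(56s), so ρ is not a constant until the cells have already shrunk.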
56 (of Theorem 5) Consider a cell C of radius ∆ in the RPT REE -M AX and fix a pair of balls contained inside C with radii ∆/(960s√d) and centers separated by at least ∆/s − ∆/(960s√d). [sent-129, score-0.804]
57 Let p_i^j denote the probability that a cell i levels below C has a descendant j levels below itself that contains data points from both the balls. [sent-130, score-0.432]
58 However, using this result would require us to go down k = Ω(sd log(sd)) levels before p_0^k = O(1/(sd)^(2d)), which results in a bound that is worse (by a factor logarithmic in s) than the one given by Theorem 4. [sent-137, score-0.216]
59 This can be attributed to the small probability of a good split for a tiny pair of balls in large cells. [sent-138, score-0.327]
60 However, here we are completely neglecting the fact that as we go down the levels, the radii of cells go down as well and good splits become more frequent. [sent-139, score-0.253]
61 Indeed, setting s = 2 in Theorems 7 and 8 tells us that if the pair of balls were to be contained in a cell of radius ∆/(s/2) then the good and bad split probabilities are at least 1/112 and at most 1/640 respectively. [sent-140, score-0.857]
62 This paves the way for an inductive argument: assume that with probability > 1 − 1/4, in L(s) levels, the size of all descendants goes down by a factor s. [sent-141, score-0.188]
63 Denote by p_g^l the probability of a good split in a cell at depth l and by p_b^l the corresponding probability of a bad split. [sent-142, score-0.493]
64 Set l∗ = L(s/2) and let E be the event that the radius of every cell at level l∗ is less than ∆/(s/2). [sent-143, score-0.476]
65 Then, p_g^(l∗) ≥ P[good split in C′ | E] · P[E] ≥ (1/112) · (1 − 1/4) ≥ 1/150 and p_b^(l∗) = P[bad split in C′ | E] · P[E] + P[bad split in C′ | ¬E] · P[¬E] ≤ (1/640) · 1 + (1/640) · (1/4) ≤ 1/512. [sent-145, score-0.359]
66 4 A packing lemma for RPT REE -M AX In this section we prove a probabilistic packing lemma for RPT REE -M AX. [sent-150, score-0.361]
67 Given any fixed ball B(x, R) ⊂ RD, with probability greater than 1/2 (where the randomization is over the construction of the RPT REE -M AX), the number of disjoint RPT REE -M AX cells of radius greater than r that intersect B is at most (R/r)^(O(d log d log(dR/r))). [sent-152, score-0.589]
68 We will prove the result in two steps: first of all we will show that with high probability, the ball B will be completely inscribed in an RPT REE -M AX cell C of radius no more than O(Rd√d log d). [sent-155, score-0.71]
69 Thus the number of disjoint cells of radius at least r that intersect this ball is bounded by the number of descendants of C of radius at least r. [sent-156, score-0.628]
70 1 An effective aspect ratio bound for RPT REE -M AX cells In this section we prove an upper bound on the radius of the smallest RPT REE -M AX cell that completely contains a given ball B of radius R. [sent-159, score-1.156]
71 Figure 2 (caption): Balls Bi are of radius ∆/(512√d) and their centers are ∆/2 from the center of B; a split that separates B from some Bi is useful, one that splits B itself is useless. We proceed with the proof by first [sent-162, score-0.524]
72 showing that the probability that B will be split before it lands up in a cell of radius ∆/2 is at most a quantity inversely proportional to ∆. [sent-163, score-0.605]
73 We consider balls of radius ∆/(512√d) surrounding B at a distance of ∆/2 (see Figure 2). [sent-166, score-0.482]
74 These balls are made to cover the annulus centered at B of mean radius ∆/2 and thickness ∆/(512√d) – clearly d^O(d) balls suffice. [sent-167, score-0.683]
75 Without loss of generality assume that the centers of all these balls lie in C. [sent-168, score-0.26]
76 Notice that if B gets separated from all these balls without getting split in the process then it will lie in a cell of radius < ∆/2. [sent-169, score-0.785]
77 Using a proof technique similar to that used in Lemma 7 we can show that the probability of a useful split is at least 1/192 whereas Lemma 6 tells us that the probability of a useless split is at most 3R√d/∆. [sent-171, score-0.235]
78 There exists a constant c5 such that the probability of a ball of radius R in a cell of radius ∆ getting split before it lands up in a cell of radius ∆/2 is at most c5 · Rd√d log d / ∆. [sent-173, score-1.493]
79 There exists a constant c6 such that with probability > 1 − 1/4, a given (fixed) ball B of radius R will be completely inscribed in an RPT REE -M AX cell C of radius no more than c6 · Rd√d log d. [sent-177, score-0.972]
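Theorem 12 follows from Lemma 11 by summing the splitting probability over the dyadic scales the ball passes through on its way down; schematically (the constant 8 below is illustrative):

    sum over ∆ = ∆_min, 2∆_min, 4∆_min, ...  of  c5 · Rd√d log d / ∆   ≤   2 · c5 · Rd√d log d / ∆_min   ≤   1/4   provided   ∆_min ≥ 8 · c5 · Rd√d log d,

so a choice like c6 = 8 c5 already gives the stated confidence of 1 − 1/4.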
80 (of Theorem 10) Given a ball B of radius R, Theorem 12 shows that with probability at least 3/4, B will lie in a cell C of radius at most R′ = O(Rd√d log d). [sent-180, score-0.923]
81 Hence all cells of radius at least r that intersect this ball must be either descendants or ancestors of C. [sent-181, score-0.585]
82 Since we want an upper bound on the largest number of such disjoint cells, it suffices to count the number of descendants of C of radius no less than r. [sent-182, score-0.436]
83 We know from Theorem 5 that with probability at least 3/4, in O(log(R′/r) · d log(dR′/r)) levels the radius of all cells must go below r. [sent-183, score-0.502]
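Putting the two steps together gives the shape of the Theorem 10 bound (a schematic count; the O(·) absorbs the constants of Theorem 5 and of R′):

    number of descendants of C of radius ≥ r  ≤  2^(O(log(R′/r) · d log(dR′/r)))  =  (R′/r)^(O(d log(dR′/r)))  =  (R/r)^(O(d log d · log(dR/r))),

using R′ = O(Rd√d log d) in the last step.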
84 5 Local covariance dimension of a smooth manifold The second variant of RPT REE, namely RPT REE -M EAN, adapts to the local covariance dimension (see definition below) of data. [sent-187, score-0.389]
85 Informally, the guarantee is of the following kind: given data that has small local covariance dimension, in expectation a data point in a cell of radius r in the RPT REE -M EAN will be contained in a cell of radius c7 · r in the next level, for some constant c7 < 1. [sent-189, score-1.054]
86 We will prove that a d-dimensional Riemannian submanifold M of RD has bounded local covariance dimension thus proving that RPT REE -M EAN adapts to manifold dimension as well. [sent-192, score-0.412]
87 A set S ⊂ RD has local covariance dimension (d, ǫ, r) if there exists an isometry M of RD under which the set S when restricted to any ball of radius r has a covariance matrix for which some d diagonal elements contribute a (1 − ǫ) fraction of its trace. [sent-194, score-0.561]
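An isometry of RD can in particular rotate coordinates to the eigenbasis of a covariance matrix (and no rotation makes any d diagonal entries sum to more than the top d eigenvalues), so a natural per-ball check of this condition is whether the top d eigenvalues of the local covariance matrix capture a (1 − ǫ) fraction of its trace. The following is a small illustrative sketch of that check (our own code, not from the paper; the function name and toy example are ours):

    import numpy as np

    def has_local_cov_dim(S, d, eps, r, x):
        # Restrict S to the ball B(x, r) and form the covariance matrix.
        pts = S[np.linalg.norm(S - x, axis=1) <= r]
        if len(pts) < 2:
            return True                   # too few points: vacuously flat
        cov = np.cov(pts, rowvar=False)
        eig = np.sort(np.linalg.eigvalsh(cov))[::-1]   # eigenvalues, largest first
        # Top-d eigenvalues must carry a (1 - eps) fraction of the trace.
        return eig[:d].sum() >= (1.0 - eps) * eig.sum()

    # Example: a noisy circle (a 1-dimensional manifold) in R^3 looks locally
    # 1-dimensional when restricted to balls of small radius.
    rng = np.random.default_rng(2)
    t = rng.uniform(0.0, 2.0 * np.pi, 5000)
    S = np.c_[np.cos(t), np.sin(t), np.zeros_like(t)] + 1e-3 * rng.standard_normal((5000, 3))
    print(has_local_cov_dim(S, d=1, eps=0.1, r=0.2, x=S[0]))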
88 Given a data set S ⊂ M where M is a d-dimensional Riemannian manifold with condition number τ, for any ǫ ≤ 1, S has local covariance dimension (d, ǫ, √ǫ τ/3). [sent-202, score-0.225]
89 Informally, if we restrict ourselves to regions of the manifold of radius τ or less, then we get the requisite flatness properties. [sent-206, score-0.378]
90 For the RPT REE M AX data structure, we provided an improved bound (Theorem 5) on the number of levels required to decrease the size of the tree cells by any factor s ≥ 2. [sent-223, score-0.246]
91 It would be nice if this can be brought down to logarithmic since it would directly improve the packing lemma (Theorem 10) as well. [sent-225, score-0.183]
92 We have shown that the smallest cell in the RPT REE -M AX that completely contains a fixed ball B of radius R has an aspect ratio no more than O(d√d log d), since it has a ball of radius R inscribed in it and can be circumscribed by a ball of radius no more than O(Rd√d log d). [sent-228, score-1.564]
93 Any improvement in the aspect ratio of the smallest cell that contains a given ball will also directly improve the packing lemma. [sent-229, score-0.509]
94 Moving on to our results for the RPT REE -M EAN, we demonstrated that it adapts to manifold dimension as well. [sent-230, score-0.226]
95 For instance, the radius parameter in the local covariance dimension is given as √ǫ τ/3 – this can be improved to √ǫ τ/2 if one can show that there always exists a point q ∈ B(x0, r) ∩ M at which the function g : x ∈ M −→ ‖x − µ‖ attains a local extremum. [sent-232, score-0.432]
96 As we already mentioned, packing lemmas and size reduction guarantees for arbitrary factors are typically used in applications for nearest neighbor searching and clustering. [sent-234, score-0.205]
97 Existing data structures such as BBD Trees remedy this by alternating space partitioning splits with data partitioning splits. [sent-237, score-0.171]
98 Thus every alternate split is forced to send at most a constant fraction of the points into any of the children, thereby ensuring a depth that is logarithmic in the number of data points. [sent-238, score-0.186]
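The depth claim in the previous sentence is the usual geometric-decay calculation (schematic): if every alternate split sends at most a c-fraction of a cell's points into either child, for some constant c < 1, then a cell at depth 2t contains at most c^t · n of the n data points, so the depth is at most

    2 · log_{1/c}(n)  =  O(log n).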
99 However it remains to be seen if the same trick can be used to bound the depth of RPT REE -M AX while maintaining the packing guarantees because although such “space partitioning” splits do not seem to hinder Theorem 5, they do hinder Theorem 10 (more specifically they hinder Theorem 11). [sent-240, score-0.356]
100 Space Partitioning Guarantee : assured size reduction by factor s in (d log s)^O(1) levels. Acknowledgments: The authors thank James Lee for pointing out an incorrect usage of the term Assouad dimension in a previous version of the paper. [sent-244, score-0.254]
wordName wordTfidf (topN-words)
[('rpt', 0.599), ('ree', 0.483), ('radius', 0.281), ('ax', 0.24), ('balls', 0.201), ('cell', 0.195), ('packing', 0.118), ('ean', 0.11), ('ball', 0.098), ('manifold', 0.097), ('descendants', 0.096), ('doubling', 0.093), ('split', 0.089), ('levels', 0.081), ('cells', 0.071), ('dimension', 0.07), ('sd', 0.07), ('descendant', 0.059), ('adapts', 0.059), ('go', 0.053), ('partitioning', 0.051), ('rd', 0.05), ('inscribed', 0.049), ('aspect', 0.049), ('pl', 0.046), ('bad', 0.045), ('lemma', 0.045), ('manifolds', 0.044), ('centers', 0.04), ('depth', 0.04), ('splits', 0.04), ('trees', 0.039), ('intersect', 0.039), ('bound', 0.039), ('purushottam', 0.037), ('intrinsic', 0.035), ('prove', 0.035), ('covariance', 0.035), ('log', 0.033), ('hinder', 0.032), ('tree', 0.032), ('randomization', 0.031), ('theorem', 0.029), ('reduction', 0.029), ('structures', 0.029), ('ratio', 0.028), ('dimensionality', 0.027), ('tangent', 0.026), ('supplementary', 0.026), ('useless', 0.025), ('contained', 0.025), ('inside', 0.024), ('atness', 0.024), ('bbd', 0.024), ('imbedded', 0.024), ('kanpur', 0.024), ('kar', 0.024), ('lands', 0.024), ('ruth', 0.024), ('material', 0.024), ('bounded', 0.023), ('guarantees', 0.023), ('local', 0.023), ('factor', 0.023), ('riemannian', 0.022), ('piotr', 0.021), ('angela', 0.021), ('pair', 0.021), ('smallest', 0.021), ('variously', 0.02), ('netanyahu', 0.02), ('silverman', 0.02), ('expects', 0.02), ('dasgupta', 0.02), ('logarithmic', 0.02), ('disjoint', 0.02), ('completely', 0.019), ('notion', 0.019), ('curse', 0.019), ('guarantee', 0.019), ('fraction', 0.019), ('lie', 0.019), ('iit', 0.018), ('mount', 0.018), ('indyk', 0.018), ('children', 0.018), ('neighbor', 0.018), ('assured', 0.018), ('nearest', 0.017), ('radii', 0.017), ('yoav', 0.017), ('sanjoy', 0.017), ('child', 0.017), ('dence', 0.017), ('bi', 0.016), ('probability', 0.016), ('neutral', 0.016), ('sends', 0.016), ('tp', 0.015), ('projection', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 220 nips-2010-Random Projection Trees Revisited
Author: Aman Dhesi, Purushottam Kar
Abstract: The Random Projection Tree (RPT REE) structures proposed in [1] are space partitioning data structures that automatically adapt to various notions of intrinsic dimensionality of data. We prove new results for both the RPT REE -M AX and the RPT REE -M EAN data structures. Our result for RPT REE -M AX gives a nearoptimal bound on the number of levels required by this data structure to reduce the size of its cells by a factor s ≥ 2. We also prove a packing lemma for this data structure. Our final result shows that low-dimensional manifolds have bounded Local Covariance Dimension. As a consequence we show that RPT REE -M EAN adapts to manifold dimension as well.
2 0.17667313 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
Author: Hariharan Narayanan, Sanjoy Mitter
Abstract: The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume log 1 and curvature is unavoidable. Whether the known lower bound of O( k + 2 δ ) 2 for the sample complexity of Empirical Risk minimization on k−means applied to data in a unit ball of arbitrary dimension is tight, has been an open question since 1997 [3]. Here is the desired bound on the error and δ is a bound on the probability of failure. We improve the best currently known upper bound [14] of 2 log 1 log4 k log 1 O( k2 + 2 δ ) to O k min k, 2 + 2 δ . Based on these results, we 2 devise a simple algorithm for k−means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension. 1
3 0.073688582 223 nips-2010-Rates of convergence for the cluster tree
Author: Kamalika Chaudhuri, Sanjoy Dasgupta
Abstract: For a density f on Rd , a high-density cluster is any connected component of {x : f (x) ≥ λ}, for some λ > 0. The set of all high-density clusters form a hierarchy called the cluster tree of f . We present a procedure for estimating the cluster tree given samples from f . We give finite-sample convergence rates for our algorithm, as well as lower bounds on the sample complexity of this estimation problem. 1
4 0.062041838 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
Author: Arvind Agarwal, Samuel Gerber, Hal Daume
Abstract: We present a novel method for multitask learning (MTL) based on manifold regularization: assume that all task parameters lie on a manifold. This is the generalization of a common assumption made in the existing literature: task parameters share a common linear subspace. One proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes across tasks which can be learnt independently. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets. 1
5 0.057035044 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
Author: Sivan Sabato, Nathan Srebro, Naftali Tishby
Abstract: We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L2 regularization: We introduce the γ-adapted-dimension, which is a simple function of the spectrum of a distribution’s covariance matrix, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the γ-adapted-dimension of the source distribution. We conclude that this new quantity tightly characterizes the true sample complexity of large-margin classification. The bounds hold for a rich family of sub-Gaussian distributions. 1
6 0.056771401 145 nips-2010-Learning Kernels with Radiuses of Minimum Enclosing Balls
7 0.052995265 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models
8 0.049339816 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
9 0.04532316 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
10 0.044127185 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
11 0.04281918 149 nips-2010-Learning To Count Objects in Images
12 0.040468752 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
13 0.04044006 257 nips-2010-Structured Determinantal Point Processes
14 0.039534267 178 nips-2010-Multivariate Dyadic Regression Trees for Sparse Learning Problems
15 0.036500368 279 nips-2010-Universal Kernels on Non-Standard Input Spaces
16 0.036462776 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
17 0.035601556 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification
18 0.034231085 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
19 0.0341179 261 nips-2010-Supervised Clustering
20 0.033949599 265 nips-2010-The LASSO risk: asymptotic results and real world examples
topicId topicWeight
[(0, 0.101), (1, 0.02), (2, 0.051), (3, 0.023), (4, 0.017), (5, 0.022), (6, 0.012), (7, -0.015), (8, 0.005), (9, 0.055), (10, 0.027), (11, -0.064), (12, -0.077), (13, -0.044), (14, 0.034), (15, -0.013), (16, 0.08), (17, -0.098), (18, -0.017), (19, -0.01), (20, -0.025), (21, -0.088), (22, 0.048), (23, -0.036), (24, -0.021), (25, -0.013), (26, -0.134), (27, -0.007), (28, -0.032), (29, -0.047), (30, -0.0), (31, -0.02), (32, 0.091), (33, -0.034), (34, -0.012), (35, -0.104), (36, 0.098), (37, -0.029), (38, -0.025), (39, -0.004), (40, 0.077), (41, -0.003), (42, -0.05), (43, -0.003), (44, -0.059), (45, -0.116), (46, 0.03), (47, -0.02), (48, -0.029), (49, -0.04)]
simIndex simValue paperId paperTitle
same-paper 1 0.9321354 220 nips-2010-Random Projection Trees Revisited
Author: Aman Dhesi, Purushottam Kar
Abstract: The Random Projection Tree (RPT REE) structures proposed in [1] are space partitioning data structures that automatically adapt to various notions of intrinsic dimensionality of data. We prove new results for both the RPT REE -M AX and the RPT REE -M EAN data structures. Our result for RPT REE -M AX gives a nearoptimal bound on the number of levels required by this data structure to reduce the size of its cells by a factor s ≥ 2. We also prove a packing lemma for this data structure. Our final result shows that low-dimensional manifolds have bounded Local Covariance Dimension. As a consequence we show that RPT REE -M EAN adapts to manifold dimension as well.
2 0.74408627 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
Author: Hariharan Narayanan, Sanjoy Mitter
Abstract: The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume log 1 and curvature is unavoidable. Whether the known lower bound of O( k + 2 δ ) 2 for the sample complexity of Empirical Risk minimization on k−means applied to data in a unit ball of arbitrary dimension is tight, has been an open question since 1997 [3]. Here is the desired bound on the error and δ is a bound on the probability of failure. We improve the best currently known upper bound [14] of 2 log 1 log4 k log 1 O( k2 + 2 δ ) to O k min k, 2 + 2 δ . Based on these results, we 2 devise a simple algorithm for k−means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension. 1
3 0.45000401 178 nips-2010-Multivariate Dyadic Regression Trees for Sparse Learning Problems
Author: Han Liu, Xi Chen
Abstract: We propose a new nonparametric learning method based on multivariate dyadic regression trees (MDRTs). Unlike traditional dyadic decision trees (DDTs) or classification and regression trees (CARTs), MDRTs are constructed using penalized empirical risk minimization with a novel sparsity-inducing penalty. Theoretically, we show that MDRTs can simultaneously adapt to the unknown sparsity and smoothness of the true regression functions, and achieve the nearly optimal rates of convergence (in a minimax sense) for the class of (α, C)-smooth functions. Empirically, MDRTs can simultaneously conduct function estimation and variable selection in high dimensions. To make MDRTs applicable for large-scale learning problems, we propose a greedy heuristics. The superior performance of MDRTs are demonstrated on both synthetic and real datasets. 1
4 0.44838142 205 nips-2010-Permutation Complexity Bound on Out-Sample Error
Author: Malik Magdon-Ismail
Abstract: We define a data dependent permutation complexity for a hypothesis set H, which is similar to a Rademacher complexity or maximum discrepancy. The permutation complexity is based (like the maximum discrepancy) on dependent sampling. We prove a uniform bound on the generalization error, as well as a concentration result which means that the permutation estimate can be efficiently estimated.
5 0.44459024 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
Author: Sivan Sabato, Nathan Srebro, Naftali Tishby
Abstract: We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L2 regularization: We introduce the γ-adapted-dimension, which is a simple function of the spectrum of a distribution’s covariance matrix, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the γ-adapted-dimension of the source distribution. We conclude that this new quantity tightly characterizes the true sample complexity of large-margin classification. The bounds hold for a rich family of sub-Gaussian distributions. 1
6 0.43047953 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
7 0.42742163 223 nips-2010-Rates of convergence for the cluster tree
8 0.42360508 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
9 0.41320634 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
10 0.37790382 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
11 0.36362872 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
12 0.34506345 222 nips-2010-Random Walk Approach to Regret Minimization
13 0.33486834 233 nips-2010-Scrambled Objects for Least-Squares Regression
14 0.33447143 9 nips-2010-A New Probabilistic Model for Rank Aggregation
15 0.33235097 243 nips-2010-Smoothness, Low Noise and Fast Rates
16 0.32992053 221 nips-2010-Random Projections for $k$-means Clustering
17 0.31821832 134 nips-2010-LSTD with Random Projections
18 0.31784922 81 nips-2010-Evaluating neuronal codes for inference using Fisher information
19 0.3145498 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning
20 0.30885628 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems
topicId topicWeight
[(10, 0.266), (13, 0.034), (17, 0.01), (27, 0.05), (30, 0.171), (35, 0.012), (45, 0.126), (50, 0.037), (52, 0.023), (60, 0.056), (77, 0.039), (78, 0.011), (90, 0.055)]
simIndex simValue paperId paperTitle
same-paper 1 0.77660555 220 nips-2010-Random Projection Trees Revisited
Author: Aman Dhesi, Purushottam Kar
Abstract: The Random Projection Tree (RPT REE) structures proposed in [1] are space partitioning data structures that automatically adapt to various notions of intrinsic dimensionality of data. We prove new results for both the RPT REE -M AX and the RPT REE -M EAN data structures. Our result for RPT REE -M AX gives a nearoptimal bound on the number of levels required by this data structure to reduce the size of its cells by a factor s ≥ 2. We also prove a packing lemma for this data structure. Our final result shows that low-dimensional manifolds have bounded Local Covariance Dimension. As a consequence we show that RPT REE -M EAN adapts to manifold dimension as well.
2 0.63211352 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
Author: Sivan Sabato, Nathan Srebro, Naftali Tishby
Abstract: We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L2 regularization: We introduce the γ-adapted-dimension, which is a simple function of the spectrum of a distribution’s covariance matrix, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the γ-adapted-dimension of the source distribution. We conclude that this new quantity tightly characterizes the true sample complexity of large-margin classification. The bounds hold for a rich family of sub-Gaussian distributions. 1
3 0.63006312 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
Author: Hariharan Narayanan, Sanjoy Mitter
Abstract: The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume log 1 and curvature is unavoidable. Whether the known lower bound of O( k + 2 δ ) 2 for the sample complexity of Empirical Risk minimization on k−means applied to data in a unit ball of arbitrary dimension is tight, has been an open question since 1997 [3]. Here is the desired bound on the error and δ is a bound on the probability of failure. We improve the best currently known upper bound [14] of 2 log 1 log4 k log 1 O( k2 + 2 δ ) to O k min k, 2 + 2 δ . Based on these results, we 2 devise a simple algorithm for k−means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension. 1
4 0.62958628 264 nips-2010-Synergies in learning words and their referents
Author: Mark Johnson, Katherine Demuth, Bevan Jones, Michael J. Black
Abstract: This paper presents Bayesian non-parametric models that simultaneously learn to segment words from phoneme strings and learn the referents of some of those words, and shows that there is a synergistic interaction in the acquisition of these two kinds of linguistic information. The models themselves are novel kinds of Adaptor Grammars that are an extension of an embedding of topic models into PCFGs. These models simultaneously segment phoneme sequences into words and learn the relationship between non-linguistic objects to the words that refer to them. We show (i) that modelling inter-word dependencies not only improves the accuracy of the word segmentation but also of word-object relationships, and (ii) that a model that simultaneously learns word-object relationships and word segmentation segments more accurately than one that just learns word segmentation on its own. We argue that these results support an interactive view of language acquisition that can take advantage of synergies such as these. 1
5 0.62954885 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems
Author: Ronny Luss, Saharon Rosset, Moni Shahar
Abstract: A new algorithm for isotonic regression is presented based on recursively partitioning the solution space. We develop efficient methods for each partitioning subproblem through an equivalent representation as a network flow problem, and prove that this sequence of partitions converges to the global solution. These network flow problems can further be decomposed in order to solve very large problems. Success of isotonic regression in prediction and our algorithm’s favorable computational properties are demonstrated through simulated examples as large as 2 × 105 variables and 107 constraints.
6 0.62322652 283 nips-2010-Variational Inference over Combinatorial Spaces
7 0.61488467 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities
8 0.59529215 222 nips-2010-Random Walk Approach to Regret Minimization
9 0.58630097 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
10 0.58169824 75 nips-2010-Empirical Risk Minimization with Approximations of Probabilistic Grammars
11 0.57853639 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
12 0.56938165 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
13 0.56266415 155 nips-2010-Learning the context of a category
14 0.56000978 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case
15 0.55980629 117 nips-2010-Identifying graph-structured activation patterns in networks
16 0.55750221 285 nips-2010-Why are some word orders more common than others? A uniform information density account
17 0.55379504 280 nips-2010-Unsupervised Kernel Dimension Reduction
18 0.55347699 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
19 0.55307478 243 nips-2010-Smoothness, Low Noise and Fast Rates
20 0.55138272 80 nips-2010-Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs