nips nips2007 nips2007-116 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
Reference: text
sentIndex sentText sentNum sentScore
1 In such cases data of low intrinsic dimension is embedded in a space of high extrinsic dimension. [sent-4, score-0.413]
2 Some of this work (for instance, [2]) exploits the low intrinsic dimension to improve the convergence rate of supervised learning algorithms. [sent-13, score-0.307]
3 In this paper, we describe a new way of modeling data that resides in RD but has lower intrinsic dimension d < D. [sent-15, score-0.299]
4 We call this spatial data structure a random projection tree (RP tree). [sent-18, score-0.535]
5 It can be thought of as a variant of the k-d tree that is provably manifold-adaptive. [sent-19, score-0.346]
6 k-d trees, RP trees, and vector quantization: Recall that a k-d tree [3] partitions RD into hyperrectangular cells. [sent-20, score-0.419]
7 It is built in a recursive manner, splitting along one coordinate direction at a time. [sent-21, score-0.277]
8 The succession of splits corresponds to a binary tree whose leaves contain the individual cells in RD . [sent-22, score-0.563]
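To make the recursive construction concrete, here is a minimal sketch in Python/numpy; the coordinate-cycling, the median split, the min_size stopping rule, and all names are our illustrative choices rather than details taken from [3].

    import numpy as np

    def build_kdtree(points, depth=0, min_size=10):
        """Recursively partition points into hyperrectangular cells (one split coordinate per level)."""
        n, D = points.shape
        if n <= min_size:
            return {"leaf": True, "mean": points.mean(axis=0)}   # cell representative: the mean
        coord = depth % D                                        # split along one coordinate direction
        threshold = np.median(points[:, coord])                  # split at the median value
        left = points[:, coord] <= threshold
        if left.all() or not left.any():                         # degenerate split: stop here
            return {"leaf": True, "mean": points.mean(axis=0)}
        return {"leaf": False, "coord": coord, "threshold": threshold,
                "left": build_kdtree(points[left], depth + 1, min_size),
                "right": build_kdtree(points[~left], depth + 1, min_size)}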
9 These trees are among the most widely-used methods for spatial partitioning in machine learning and computer vision. [sent-23, score-0.336]
10 Figure 1: Left: A spatial partitioning of R2 induced by a k-d tree with three levels. [sent-28, score-0.425]
11 On the left part of Figure 1 we illustrate a k-d tree for a set of vectors in R2 . [sent-31, score-0.367]
12 The leaves of the tree partition RD into cells; given a query point q, the cell containing q is identified by traversing down the k-d tree. [sent-32, score-0.546]
13 Each cell can be thought of as having a representative vector: its mean, depicted in the figure by a circle. [sent-33, score-0.222]
14 The partitioning together with these mean vectors defines a vector quantization (VQ) of R2: a mapping from R2 to a finite set of representative vectors (called a “codebook” in the context of lossy compression methods). [sent-34, score-0.238]
15 This error is closely related (in fact, proportional) to the average diameter of cells, that is, the average squared distance between pairs of points in a cell. [sent-38, score-0.286]
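The proportionality is easy to see numerically; the two helpers below are our own (with `cells` standing for any list of per-cell point arrays): they compute the VQ error and the average squared diameter, and the second quantity is exactly twice the first.

    import numpy as np

    def vq_error(cells):
        """Mean squared distance from each point to its cell's representative (the cell mean)."""
        errs = [((C - C.mean(axis=0)) ** 2).sum(axis=1) for C in cells]
        return np.concatenate(errs).mean()

    def avg_sq_diameter(cells):
        """Average squared distance between pairs of points in the same cell (size-weighted over cells)."""
        total, count = 0.0, 0
        for C in cells:
            pairwise = ((C[:, None, :] - C[None, :, :]) ** 2).sum(-1)   # |C| x |C| squared distances
            total += pairwise.mean() * len(C)
            count += len(C)
        return total / count     # equals 2 * vq_error(cells)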
16 As the depth of the k-d tree increases, the diameter of the cells decreases and so does the VQ error. [sent-39, score-0.676]
17 However, in high dimension, the rate of decrease of the average diameter can be very slow. [sent-40, score-0.264]
18 In fact, as we show in the supplementary material, there are data sets in RD for which a k-d tree requires D levels in order to halve the diameter. [sent-41, score-0.468]
19 This slow rate of decrease of cell diameter is fine if D = 2 as in Figure 1, but it is disastrous if D = 1000. [sent-42, score-0.454]
20 Constructing 1000 levels of the tree requires 2^1000 data points! [sent-43, score-0.426]
21 This problem is a real one that has been observed empirically: k-d trees are prone to a curse of dimensionality. [sent-44, score-0.261]
22 In general, k-d trees will not be able to benefit from this; in fact the bad example mentioned above has intrinsic dimension d = 1. [sent-46, score-0.5]
23 But we show that a simple variant of the k-d tree does indeed decrease cell diameters much more quickly. [sent-47, score-0.589]
24 Instead of splitting along coordinate directions, we use randomly chosen unit vectors, and instead of splitting data exactly at the median, we use a more carefully chosen split point. [sent-48, score-0.489]
25 We call the resulting data structure a random projection tree (Figure 1, right) and we show that it admits the following theoretical guarantee (formal statement is in the next section). [sent-49, score-0.477]
26 Pick any cell C in the RP tree, and suppose the data in C have intrinsic dimension d. [sent-50, score-0.489]
27 Pick a descendant cell ≥ d levels below; then with constant probability, this descendant has average diameter at most half that of C. [sent-51, score-0.565]
28 We thus have a vector quantization construction method for which the diameter of the cells depends on the intrinsic dimension, rather than the extrinsic dimension of the data. [sent-53, score-0.729]
29 A large part of the benefit of RP trees comes from the use of random unit directions, which is rather like running k-d trees with a preprocessing step in which the data are projected into a random subspace. (Footnote: this is in contrast to the max diameter, the maximum distance between two vectors in a cell.) [sent-54, score-0.7]
30 In fact, a recent experimental study of nearest neighbor algorithms [8] observes that a similar pre-processing step improves the performance of nearest neighbor schemes based on spatial data structures. [sent-57, score-0.418]
31 The explanation we provide is based on the assumption that the data has low intrinsic dimension. [sent-59, score-0.24]
32 Another spatial data structure based on random projections is the locality sensitive hashing scheme [6]. [sent-60, score-0.292]
33 Manifold learning and near neighbor search: The fast rate of diameter decrease in random projection trees has many consequences beyond the quality of vector quantization. [sent-61, score-0.744]
34 In particular, the statistical theory of tree-based statistical estimators — whether used for classification or regression — is centered around the rate of diameter decrease; for details, see for instance Chapter 20 of [7]. [sent-62, score-0.251]
35 Thus RP trees generically exhibit faster convergence in all these contexts. [sent-63, score-0.231]
36 If the diameter of cells is small, then it is reasonable to classify a query point according to the majority label in its cell. [sent-65, score-0.351]
37 The classical work of Cover and Hart [5] on the Bayes risk of nearest neighbor methods applies equally to the majority vote in a small enough cell. [sent-67, score-0.197]
38 The left figure depicts data concentrated near a one-dimensional manifold. [sent-70, score-0.155]
39 Our goal is to partition data into small diameter regions so that the data in each region is well-approximated by its mean+PCA. [sent-72, score-0.307]
40 Some of the data lies close to a one-dimensional manifold, some of the data spans two dimensions, and some of the data (represented by the red dot) is concentrated around a single point (a zero-dimensional manifold). [sent-74, score-0.159]
41 In the literature, the most common way to capture this manifold structure is to create a graph in which nodes represent data points and edges connect pairs of nearby points. [sent-76, score-0.155]
42 Once these individual cells are small enough, the data in them can be well-approximated by an affine subspace, for instance that given by principal component analysis. [sent-80, score-0.271]
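As a small illustration of what "mean + PCA" means for a single cell, the sketch below is ours: `S` is the array of points in a cell and `d` the target dimension; the points are projected onto the affine subspace through the mean spanned by the top d principal directions, and the residual error is reported.

    import numpy as np

    def mean_plus_pca(S, d):
        """Approximate the points of one cell by their projection onto the mean + top-d PCA subspace."""
        mu = S.mean(axis=0)
        _, _, Vt = np.linalg.svd(S - mu, full_matrices=False)   # rows of Vt: principal directions
        B = Vt[:d]                                              # d x D orthonormal basis
        approx = mu + (S - mu) @ B.T @ B                        # project each centered point, shift back
        residual = ((S - approx) ** 2).sum(axis=1).mean()       # mean squared approximation error
        return approx, residual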
43 The RP tree algorithm. Spatial data structures: In what follows, we assume the data lie in RD, and we consider spatial data structures built by recursive binary splits. [sent-83, score-0.54]
44 procedure CHOOSERULE(S)
    comment: PCA tree version
    let u be the principal eigenvector of the covariance of S
    Rule(x) := x · u ≤ median({z · u : z ∈ S})
    return (Rule)
This method will do a good job of adapting to low intrinsic dimension (details omitted). [sent-87, score-0.901]
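A direct transcription of this rule into Python (a sketch; numpy, with function and variable names of our own choosing):

    import numpy as np

    def choose_rule_pca(S):
        """PCA tree version: threshold the projection onto the principal eigenvector at its median."""
        cov = np.cov(S, rowvar=False)
        _, eigvecs = np.linalg.eigh(cov)
        u = eigvecs[:, -1]                       # principal eigenvector (largest eigenvalue is last)
        theta = np.median(S @ u)
        return lambda x: x @ u <= theta          # Rule(x) := x . u <= median({z . u : z in S})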
45 First, estimating the principal eigenvector requires a significant amount of data; recall that only about a 1/2^k fraction of the data winds up at a cell at level k of the tree. [sent-89, score-0.439]
46 Second, when the extrinsic dimension is high, the amount of memory and computation required to compute the dot product between the data vectors and the eigenvectors becomes the dominant part of the computation. [sent-90, score-0.282]
47 As each node in the tree is likely to have a different eigenvector, this severely limits the feasible tree depth. [sent-91, score-0.749]
48 We now show that using random projections overcomes these problems while maintaining the adaptivity to low intrinsic dimension. [sent-92, score-0.316]
49 Random projection trees: We shall see that the key benefits of PCA-based splits can be realized much more simply, by picking random directions. [sent-94, score-0.493]
50 PCA will of course correctly identify this subspace, and a split along the principal eigenvector u will do a good job of reducing the diameter of the data. [sent-96, score-0.622]
51 But a random direction v will also have some component in the direction of u, and splitting along the median of v will not be all that different from splitting along u. [sent-97, score-0.479]
52 Figure 3: Intuition: a random direction is almost as good as the principal eigenvector. [sent-98, score-0.174]
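A quick synthetic check of this intuition (entirely ours, not from the paper): when the variance of the data is dominated by one direction u, the median split induced by a random unit direction v agrees with the median split induced by u on most points, even in D = 1000.

    import numpy as np

    rng = np.random.default_rng(0)
    D, n = 1000, 5000
    u = np.zeros(D); u[0] = 1.0                                  # the dominant direction
    X = rng.standard_normal(n)[:, None] * u + 0.01 * rng.standard_normal((n, D))
    v = rng.standard_normal(D); v /= np.linalg.norm(v)           # a random unit direction
    split_u = (X @ u) <= np.median(X @ u)
    split_v = (X @ v) <= np.median(X @ v)
    agreement = max((split_u == split_v).mean(), (split_u != split_v).mean())
    print(agreement)     # typically well above 0.5: the two splits separate mostly the same points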
53 Also, we can use the same random projection in different places in the tree; all we need is to choose a large enough set of projections that, with high probability, there is a good projection direction for each node in the tree. [sent-99, score-0.395]
54 In our experience setting the number of projections equal to the depth of the tree is sufficient. [sent-101, score-0.434]
55 Thus, for a tree of depth k, we use only k projection vectors v, as opposed to 2^k with a PCA tree. [sent-102, score-0.502]
56 When preparing data to train a tree we can compute the k projection values before building the tree. [sent-103, score-0.448]
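A sketch of that precomputation (ours): draw the k random unit directions once and project every data point onto all of them before any splitting begins.

    import numpy as np

    def precompute_projections(X, k, seed=0):
        """Draw k random unit directions (one per tree level) and project the whole data set once."""
        rng = np.random.default_rng(seed)
        D = X.shape[1]
        V = rng.standard_normal((D, k))
        V /= np.linalg.norm(V, axis=0)          # columns are unit vectors
        return V, X @ V                         # projections: n x k, reused throughout tree construction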
57 For a cell containing points S, let ∆(S) be the diameter of S (the distance between the two furthest points in the set), and ∆_A(S) the average diameter, that is, the average distance between points of S: ∆_A^2(S) = (1/|S|^2) Σ_{x,y ∈ S} ||x − y||^2 = (2/|S|) Σ_{x ∈ S} ||x − mean(S)||^2. [sent-106, score-0.58]
58 We use two different types of splits: if ∆^2(S) is less than c · ∆_A^2(S) (for some constant c), then we use the hyperplane split discussed above. [sent-107, score-0.183]
59 Otherwise, we split S into two groups based on distance from the mean. [sent-108, score-0.229]
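For completeness, the second equality in the displayed formula for ∆_A^2(S) above is a standard expansion (our derivation, with μ = mean(S)):

\[
\frac{1}{|S|^2}\sum_{x,y\in S}\|x-y\|^2
  = \frac{1}{|S|^2}\sum_{x,y\in S}\Big(\|x-\mu\|^2+\|y-\mu\|^2-2\langle x-\mu,\,y-\mu\rangle\Big)
  = \frac{2}{|S|}\sum_{x\in S}\|x-\mu\|^2 ,
\]

since the cross term vanishes: \(\sum_{x\in S}(x-\mu)=0\).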
60 procedure CHOOSERULE(S)
    comment: RP tree version
    if ∆^2(S) ≤ c · ∆_A^2(S)
        choose a random unit direction v
        sort projection values: a(x) = v · x ∀x ∈ S, generating the list a1 ≤ a2 ≤ · · · ≤ an
        for i = 1, . . . [sent-109, score-0.499]
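A simplified Python sketch of the whole rule (ours). Caveats: the projection branch below splits at the median of the projected values, whereas the truncated loop above searches for a more carefully chosen split point; the distance branch thresholds at the median distance from the mean, which is one natural reading of the rule; and the constant c is left as a parameter.

    import numpy as np

    def sq_diameter(S):
        """Max squared distance between two points (O(n^2) memory; fine for a sketch)."""
        return ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1).max()

    def avg_sq_diameter_cell(S):
        """Average squared interpoint distance, via the identity Delta_A^2 = 2 * mean ||x - mean||^2."""
        mu = S.mean(axis=0)
        return 2.0 * ((S - mu) ** 2).sum(axis=1).mean()

    def choose_rule_rp(S, c=10.0, rng=np.random.default_rng(0)):
        """RP tree version (simplified): split by projection, or by distance from the mean."""
        if sq_diameter(S) <= c * avg_sq_diameter_cell(S):
            v = rng.standard_normal(S.shape[1])
            v /= np.linalg.norm(v)                  # random unit direction
            theta = np.median(S @ v)                # simplified split point
            return lambda x: x @ v <= theta
        mu = S.mean(axis=0)
        r = np.median(np.linalg.norm(S - mu, axis=1))
        return lambda x: np.linalg.norm(x - mu) <= r   # split by distance from the mean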
61 In Figure 4, for instance, splitting the bottom cell at the median would lead to a messy partition, whereas the RP tree split produces two clean, connected clusters. [sent-115, score-0.877]
62 3: The two PCA ellipses corresponding to the two cells after the first split. [sent-119, score-0.197]
63 5: The four PCA ellipses for the cells at the third level. [sent-121, score-0.197]
64 As the cells get smaller, their individual PCAs reveal 1D manifold structure. [sent-123, score-0.204]
65 Note: the ellipses are for comparison only; the RP tree algorithm does not look at them. [sent-124, score-0.409]
66 The second type of split, based on distance from the mean of the cell, is needed to deal with cases in which the cell contains data at very different scales. [sent-125, score-0.266]
67 If only splits by projection were allowed, then a large number of splits would be devoted to uselessly subdividing this point mass. [sent-127, score-0.368]
68 The second type of split separates it from the rest of the data in one go. [sent-128, score-0.213]
69 A large fraction of them might be “empty” background patches, in which case they’d fall near the center of the cell in a very tight cluster. [sent-130, score-0.251]
70 The effect of the split is then to separate out these two clusters. [sent-132, score-0.183]
71 Theoretical foundations: In analyzing RP trees, we consider a statistical notion of dimension: we say a set S has local covariance dimension (d, ε) if a (1 − ε) fraction of the variance is concentrated in a d-dimensional subspace. [sent-134, score-0.237]
72 Definition 1. S ⊂ RD has local covariance dimension (d, ε) if the largest d eigenvalues of its covariance matrix satisfy σ_1^2 + · · · + σ_d^2 ≥ (1 − ε) · (σ_1^2 + · · · + σ_D^2). [sent-136, score-0.206]
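Definition 1 is straightforward to test empirically for the points falling in a given cell (a sketch; ours):

    import numpy as np

    def has_local_covariance_dim(S, d, eps):
        """True if the top d eigenvalues of cov(S) carry at least a (1 - eps) fraction of the total variance."""
        eigvals = np.linalg.eigvalsh(np.cov(S, rowvar=False))   # ascending order
        return eigvals[-d:].sum() >= (1.0 - eps) * eigvals.sum()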
73 Now, suppose an RP tree is built from a data set X ⊂ RD, not necessarily finite. [sent-138, score-0.395]
74 Recall that there are two different types of splits; let’s call them splits by distance and splits by projection. [sent-139, score-0.316]
75 Suppose an RP tree is built using data set X ⊂ RD . [sent-141, score-0.395]
76 Consider any cell C for which X ∩ C has local covariance dimension (d, ε), where ε < c1. [sent-142, score-0.327]
77 Pick a point x ∈ S ∩ C at random, and let C′ be the cell that contains it at the next level down. [sent-143, score-0.19]
78 • If C is split by distance, then E[∆(S ∩ C′)] ≤ c2 ∆(S ∩ C). [sent-144, score-0.229]
79 • If C is split by projection, then E[∆_A^2(S ∩ C′)] ≤ (1 − c3/d) · ∆_A^2(S ∩ C). [sent-145, score-0.183]
80 As a consequence, the expected average diameter of cells is halved every O(d) levels. [sent-147, score-0.319]
81 First of all, both splits operate on the projected data; for the second type of split (split by distance), data that fall in an interval around the median are separated from data outside that interval. [sent-151, score-0.523]
82 Second, the tree is built in a streaming manner: that is, the data arrive one at a time, and are processed (to update the tree) and immediately discarded. [sent-152, score-0.439]
83 This is managed by maintaining simple statistics at each internal node of the tree and updating them appropriately as the data streams by (more details in the supplementary material). [sent-153, score-0.433]
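The exact statistics are in the supplementary material; as an illustration of the flavor (our guess, not the paper's actual bookkeeping), keeping a count, a running mean, and a running sum of squared norms per node is already enough to recover the cell mean and the average squared diameter ∆_A^2 in a streaming fashion.

    import numpy as np

    class NodeStats:
        """Per-node running statistics, updated one point at a time without storing the data."""
        def __init__(self, D):
            self.n = 0
            self.mean = np.zeros(D)
            self.sum_sq_norm = 0.0

        def update(self, x):
            self.n += 1
            self.mean += (x - self.mean) / self.n        # incremental mean
            self.sum_sq_norm += float(x @ x)

        def avg_sq_diameter(self):
            # Delta_A^2 = 2 * E||x - mean||^2 = 2 * (E||x||^2 - ||mean||^2)
            return 2.0 * (self.sum_sq_norm / self.n - float(self.mean @ self.mean))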
84 Finally, instead of choosing a new random projection in each cell, a dictionary of a few random projections is chosen at the outset. [sent-155, score-0.233]
85 We will see that RP trees adapt well to such cases. [sent-160, score-0.231]
86 We compare the performance of different trees according to the average VQ error they incur at various levels. [sent-172, score-0.231]
87 We consider four types of trees: (1) k-d trees in which the coordinate for a split is chosen at random; (2) k-d trees in which at each split, the best coordinate is chosen (the one that most improves VQ error); (3) RP trees; and (4) for reference, PCA trees. [sent-173, score-0.747]
88 In both cases, RP trees outperform both k-d tree variants and are close to the performance of PCA trees without having to explicitly compute any principal components. [sent-175, score-0.875]
89 MNIST dataset: We next demonstrate RP trees on the all-familiar MNIST dataset of handwritten digits. [sent-177, score-0.321]
90 This dataset consists of 28 × 28 grayscale images of the digits zero through nine, and is believed to have low intrinsic dimension (for instance, see [10]). [sent-178, score-0.381]
91 Figure 6 (top) shows the first few levels of the RP tree for the images of digit 1. [sent-180, score-0.464]
92 The bar underneath each node shows the fraction of points going to the left and to the right, to give a sense of how balanced each split is. [sent-183, score-0.284]
93 Alongside each mean, we also show a histogram of the 20 largest eigenvalues of the covariance matrix, which reveal how closely the data in the cell is concentrated near a low-dimensional subspace. [sent-184, score-0.388]
94 Hence, very quickly, the cell means become good representatives of the dataset: an experimental corroboration that RP trees adapt to the low intrinsic dimension of the data. [sent-188, score-0.728]
95 This is also brought out in Figure 6 (bottom), where the images are shown projected onto the plane defined by their top two principal components. [sent-189, score-0.179]
96 The left image shows how the data was split at the topmost level (dark versus light). [sent-191, score-0.257]
97 Observe that this random cut is actually quite close to what the PCA split would have been, corroborating our earlier intuition (recall Figure 3). [sent-192, score-0.212]
98 Figure 6: Top: Three levels of the RP tree for MNIST digit 1. [sent-194, score-0.435]
99 Colors represent different cells in the RP tree, after just one split (left) or after two levels of the tree (right). [sent-196, score-0.687]
100 A fast nearest neighbor algorithm based on a principal axis search tree. [sent-245, score-0.258]
wordName wordTfidf (topN-words)
[('rp', 0.526), ('tree', 0.32), ('trees', 0.231), ('diameter', 0.211), ('cell', 0.19), ('split', 0.183), ('intrinsic', 0.172), ('vq', 0.162), ('splits', 0.135), ('ree', 0.13), ('rd', 0.112), ('pca', 0.11), ('cells', 0.108), ('ake', 0.102), ('coord', 0.102), ('projection', 0.098), ('dimension', 0.097), ('manifold', 0.096), ('splitting', 0.096), ('principal', 0.093), ('neighbor', 0.092), ('hoose', 0.089), ('ellipses', 0.089), ('median', 0.088), ('projections', 0.077), ('levels', 0.076), ('extrinsic', 0.076), ('nearest', 0.073), ('rule', 0.073), ('concentrated', 0.069), ('eigenvector', 0.068), ('diego', 0.068), ('quantization', 0.065), ('spatial', 0.058), ('projected', 0.057), ('decrease', 0.053), ('aj', 0.052), ('direction', 0.052), ('dimensionality', 0.051), ('coordinate', 0.051), ('rightt', 0.051), ('partitioning', 0.047), ('vectors', 0.047), ('uc', 0.046), ('distance', 0.046), ('built', 0.045), ('dataset', 0.045), ('verma', 0.044), ('streaming', 0.044), ('topmost', 0.044), ('descendant', 0.044), ('san', 0.043), ('supplementary', 0.042), ('node', 0.041), ('comment', 0.04), ('hashing', 0.04), ('lef', 0.04), ('covariance', 0.04), ('instance', 0.04), ('mnist', 0.04), ('return', 0.039), ('digit', 0.039), ('low', 0.038), ('depth', 0.037), ('partition', 0.036), ('manner', 0.036), ('job', 0.034), ('avg', 0.034), ('thing', 0.034), ('tt', 0.034), ('partitions', 0.034), ('synthetic', 0.033), ('along', 0.033), ('dasgupta', 0.032), ('eigenvectors', 0.032), ('representative', 0.032), ('subspace', 0.032), ('majority', 0.032), ('locality', 0.031), ('pick', 0.031), ('fraction', 0.031), ('data', 0.03), ('curse', 0.03), ('colors', 0.03), ('belkin', 0.03), ('manifolds', 0.03), ('near', 0.03), ('eigenvalues', 0.029), ('images', 0.029), ('random', 0.029), ('points', 0.029), ('directions', 0.028), ('datasets', 0.028), ('sensitive', 0.027), ('recall', 0.027), ('lie', 0.027), ('variant', 0.026), ('depicts', 0.026), ('areas', 0.026), ('var', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 116 nips-2007-Learning the structure of manifolds using random projections
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
2 0.17306174 16 nips-2007-A learning framework for nearest neighbor search
Author: Lawrence Cayton, Sanjoy Dasgupta
Abstract: Can we leverage learning techniques to build a fast nearest-neighbor (ANN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures. 1
3 0.17092717 161 nips-2007-Random Projections for Manifold Learning
Author: Chinmay Hegde, Michael Wakin, Richard Baraniuk
Abstract: We propose a novel method for linear dimensionality reduction of manifold modeled data. First, we show that with a small number M of random projections of sample points in RN belonging to an unknown K-dimensional Euclidean manifold, the intrinsic dimension (ID) of the sample set can be estimated to high accuracy. Second, we rigorously prove that using only this set of random projections, we can estimate the structure of the underlying manifold. In both cases, the number of random projections required is linear in K and logarithmic in N , meaning that K < M ≪ N . To handle practical situations, we develop a greedy algorithm to estimate the smallest size of the projection space required to perform manifold learning. Our method is particularly relevant in distributed sensing systems and leads to significant potential savings in data acquisition, storage and transmission costs.
4 0.14403895 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
5 0.11555367 107 nips-2007-Iterative Non-linear Dimensionality Reduction with Manifold Sculpting
Author: Michael Gashler, Dan Ventura, Tony Martinez
Abstract: Many algorithms have been recently developed for reducing dimensionality by projecting data onto an intrinsic non-linear manifold. Unfortunately, existing algorithms often lose significant precision in this transformation. Manifold Sculpting is a new algorithm that iteratively reduces dimensionality by simulating surface tension in local neighborhoods. We present several experiments that show Manifold Sculpting yields more accurate results than existing algorithms with both generated and natural data-sets. Manifold Sculpting is also able to benefit from both prior dimensionality reduction efforts. 1
6 0.11343576 78 nips-2007-Efficient Principled Learning of Thin Junction Trees
7 0.10604264 197 nips-2007-The Infinite Markov Model
8 0.099784374 27 nips-2007-Anytime Induction of Cost-sensitive Trees
9 0.098741151 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
10 0.086619459 188 nips-2007-Subspace-Based Face Recognition in Analog VLSI
11 0.084642 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
12 0.083449259 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
13 0.083308786 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning
14 0.08089038 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
15 0.073264785 115 nips-2007-Learning the 2-D Topology of Images
16 0.069763243 160 nips-2007-Random Features for Large-Scale Kernel Machines
17 0.067083403 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons
18 0.066432022 17 nips-2007-A neural network implementing optimal state estimation based on dynamic spike train decoding
19 0.064055651 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
20 0.061895791 13 nips-2007-A Unified Near-Optimal Estimator For Dimension Reduction in $l \alpha$ ($0<\alpha\leq 2$) Using Stable Random Projections
topicId topicWeight
[(0, -0.205), (1, 0.042), (2, -0.041), (3, 0.005), (4, -0.038), (5, -0.034), (6, 0.018), (7, 0.127), (8, 0.03), (9, 0.073), (10, -0.138), (11, 0.328), (12, 0.108), (13, -0.046), (14, 0.068), (15, -0.101), (16, 0.002), (17, -0.045), (18, 0.018), (19, 0.001), (20, -0.002), (21, 0.067), (22, 0.028), (23, 0.072), (24, 0.229), (25, -0.128), (26, -0.101), (27, 0.059), (28, -0.042), (29, -0.032), (30, -0.135), (31, 0.003), (32, -0.083), (33, -0.117), (34, -0.14), (35, -0.028), (36, -0.1), (37, 0.018), (38, -0.085), (39, -0.081), (40, 0.069), (41, -0.044), (42, 0.079), (43, 0.021), (44, 0.041), (45, 0.031), (46, 0.007), (47, 0.065), (48, 0.071), (49, 0.001)]
simIndex simValue paperId paperTitle
same-paper 1 0.97777396 116 nips-2007-Learning the structure of manifolds using random projections
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
2 0.77130222 16 nips-2007-A learning framework for nearest neighbor search
Author: Lawrence Cayton, Sanjoy Dasgupta
Abstract: Can we leverage learning techniques to build a fast nearest-neighbor (ANN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures. 1
3 0.6936267 78 nips-2007-Efficient Principled Learning of Thin Junction Trees
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present the first truly polynomial algorithm for PAC-learning the structure of bounded-treewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speed ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixed-size subsets of variables, if the underlying distribution can be approximated by a bounded-treewidth junction tree. 1
4 0.67511874 27 nips-2007-Anytime Induction of Cost-sensitive Trees
Author: Saher Esmeir, Shaul Markovitch
Abstract: Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. Due to its stochastic nature ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare the performance of ACT to that of the state of the art cost-sensitive tree learners. The results show that for most domains ACT produces trees of significantly lower costs. ACT is also shown to exhibit good anytime behavior with diminishing returns.
5 0.5510686 161 nips-2007-Random Projections for Manifold Learning
Author: Chinmay Hegde, Michael Wakin, Richard Baraniuk
Abstract: We propose a novel method for linear dimensionality reduction of manifold modeled data. First, we show that with a small number M of random projections of sample points in RN belonging to an unknown K-dimensional Euclidean manifold, the intrinsic dimension (ID) of the sample set can be estimated to high accuracy. Second, we rigorously prove that using only this set of random projections, we can estimate the structure of the underlying manifold. In both cases, the number of random projections required is linear in K and logarithmic in N , meaning that K < M ≪ N . To handle practical situations, we develop a greedy algorithm to estimate the smallest size of the projection space required to perform manifold learning. Our method is particularly relevant in distributed sensing systems and leads to significant potential savings in data acquisition, storage and transmission costs.
6 0.50167167 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
7 0.48553792 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
8 0.48518732 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
9 0.46674162 197 nips-2007-The Infinite Markov Model
10 0.45474631 107 nips-2007-Iterative Non-linear Dimensionality Reduction with Manifold Sculpting
11 0.41276777 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta
12 0.3996979 72 nips-2007-Discriminative Log-Linear Grammars with Latent Variables
13 0.38816839 188 nips-2007-Subspace-Based Face Recognition in Analog VLSI
14 0.3260304 115 nips-2007-Learning the 2-D Topology of Images
15 0.31786516 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
16 0.3153401 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
17 0.30605641 119 nips-2007-Learning with Tree-Averaged Densities and Distributions
18 0.30335239 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning
19 0.30291066 207 nips-2007-Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations
topicId topicWeight
[(5, 0.054), (13, 0.035), (16, 0.037), (19, 0.033), (21, 0.058), (31, 0.025), (34, 0.036), (35, 0.067), (47, 0.105), (49, 0.013), (51, 0.184), (83, 0.176), (85, 0.031), (87, 0.018), (90, 0.05)]
simIndex simValue paperId paperTitle
same-paper 1 0.85032439 116 nips-2007-Learning the structure of manifolds using random projections
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
2 0.77686751 63 nips-2007-Convex Relaxations of Latent Variable Training
Author: Yuhong Guo, Dale Schuurmans
Abstract: We investigate a new, convex relaxation of an expectation-maximization (EM) variant that approximates a standard objective while eliminating local minima. First, a cautionary result is presented, showing that any convex relaxation of EM over hidden variables must give trivial results if any dependence on the missing values is retained. Although this appears to be a strong negative outcome, we then demonstrate how the problem can be bypassed by using equivalence relations instead of value assignments over hidden variables. In particular, we develop new algorithms for estimating exponential conditional models that only require equivalence relation information over the variable values. This reformulation leads to an exact expression for EM variants in a wide range of problems. We then develop a semidefinite relaxation that yields global training by eliminating local minima. 1
3 0.77178526 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
Author: Marc'aurelio Ranzato, Y-lan Boureau, Yann L. Cun
Abstract: Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have certain desirable properties (e.g. low dimension, sparsity, etc). Others are based on approximating density by stochastically reconstructing the input from the representation. We describe a novel and efficient algorithm to learn sparse representations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals, and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and by training sequentially, high-order dependencies between the input observed variables can be captured. 1
4 0.77071589 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
Author: Matthias Bethge, Philipp Berens
Abstract: Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model, that makes this type of analysis feasible for very high-dimensional data—the model parameters can be derived in closed form and sampling is easy. Therefore, our NearMaxEnt approach can serve as a tool for testing predictions from a pairwise maximum entropy model not only for low-dimensional marginals, but also for high dimensional measurements of more than thousand units. We demonstrate its usefulness by studying natural images with dichotomized pixel intensities. Our results indicate that the statistics of such higher-dimensional measurements exhibit additional structure that are not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics surprisingly well up to the limit of dimensionality where estimation of the full joint distribution is feasible. 1
5 0.77009523 115 nips-2007-Learning the 2-D Topology of Images
Author: Nicolas L. Roux, Yoshua Bengio, Pascal Lamblin, Marc Joliveau, Balázs Kégl
Abstract: We study the following question: is the two-dimensional structure of images a very strong prior or is it something that can be learned with a few examples of natural images? If someone gave us a learning task involving images for which the two-dimensional topology of pixels was not known, could we discover it automatically and exploit it? For example suppose that the pixels had been permuted in a fixed but unknown way, could we recover the relative two-dimensional location of pixels on images? The surprising result presented here is that not only the answer is yes, but that about as few as a thousand images are enough to approximately recover the relative locations of about a thousand pixels. This is achieved using a manifold learning algorithm applied to pixels associated with a measure of distributional similarity between pixel intensities. We compare different topologyextraction approaches and show how having the two-dimensional topology can be exploited.
6 0.76827282 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
7 0.76766664 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
8 0.76718426 134 nips-2007-Multi-Task Learning via Conic Programming
9 0.76617324 16 nips-2007-A learning framework for nearest neighbor search
10 0.76547873 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons
11 0.76272762 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
12 0.76243079 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
13 0.76093531 24 nips-2007-An Analysis of Inference with the Universum
14 0.7598291 158 nips-2007-Probabilistic Matrix Factorization
15 0.75871122 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
16 0.75705451 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations
17 0.75587398 187 nips-2007-Structured Learning with Approximate Inference
18 0.75557047 49 nips-2007-Colored Maximum Variance Unfolding
19 0.75556463 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
20 0.75521523 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning