nips nips2007 nips2007-116 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
Reference: text
sentIndex sentText sentNum sentScore
1 In such cases data of low intrinsic dimension is embedded in a space of high extrinsic dimension. [sent-4, score-0.413]
2 Some of this work (for instance, [2]) exploits the low intrinsic dimension to improve the convergence rate of supervised learning algorithms. [sent-13, score-0.307]
3 In this paper, we describe a new way of modeling data that resides in RD but has lower intrinsic dimension d < D. [sent-15, score-0.299]
4 We call this spatial data structure a random projection tree (RP tree). [sent-18, score-0.535]
5 It can be thought of as a variant of the k-d tree that is provably manifold-adaptive. [sent-19, score-0.346]
6 k-d trees, RP trees, and vector quantization: Recall that a k-d tree [3] partitions RD into hyperrectangular cells. [sent-20, score-0.419]
7 It is built in a recursive manner, splitting along one coordinate direction at a time. [sent-21, score-0.277]
8 The succession of splits corresponds to a binary tree whose leaves contain the individual cells in RD . [sent-22, score-0.563]
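To make the recursive construction concrete, here is a minimal sketch in Python/numpy; the coordinate-cycling, the median split, the min_size stopping rule, and all names are our illustrative choices rather than details taken from [3].

    import numpy as np

    def build_kdtree(points, depth=0, min_size=10):
        """Recursively partition points into hyperrectangular cells (one split coordinate per level)."""
        n, D = points.shape
        if n <= min_size:
            return {"leaf": True, "mean": points.mean(axis=0)}   # cell representative: the mean
        coord = depth % D                                        # split along one coordinate direction
        threshold = np.median(points[:, coord])                  # split at the median value
        left = points[:, coord] <= threshold
        if left.all() or not left.any():                         # degenerate split: stop here
            return {"leaf": True, "mean": points.mean(axis=0)}
        return {"leaf": False, "coord": coord, "threshold": threshold,
                "left": build_kdtree(points[left], depth + 1, min_size),
                "right": build_kdtree(points[~left], depth + 1, min_size)}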
9 These trees are among the most widely-used methods for spatial partitioning in machine learning and computer vision. [sent-23, score-0.336]
10 Figure 1: Left: A spatial partitioning of R2 induced by a k-d tree with three levels. [sent-28, score-0.425]
11 On the left part of Figure 1 we illustrate a k-d tree for a set of vectors in R2 . [sent-31, score-0.367]
12 The leaves of the tree partition RD into cells; given a query point q, the cell containing q is identified by traversing down the k-d tree. [sent-32, score-0.546]
13 Each cell can be thought of as having a representative vector: its mean, depicted in the figure by a circle. [sent-33, score-0.222]
14 The partitioning together with these mean vectors defines a vector quantization (VQ) of R2: a mapping from R2 to a finite set of representative vectors (called a “codebook” in the context of lossy compression methods). [sent-34, score-0.238]
15 This error is closely related (in fact, proportional) to the average diameter of cells, that is, the average squared distance between pairs of points in a cell. [sent-38, score-0.286]
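The proportionality is easy to see numerically; the two helpers below are our own (with `cells` standing for any list of per-cell point arrays): they compute the VQ error and the average squared diameter, and the second quantity is exactly twice the first.

    import numpy as np

    def vq_error(cells):
        """Mean squared distance from each point to its cell's representative (the cell mean)."""
        errs = [((C - C.mean(axis=0)) ** 2).sum(axis=1) for C in cells]
        return np.concatenate(errs).mean()

    def avg_sq_diameter(cells):
        """Average squared distance between pairs of points in the same cell (size-weighted over cells)."""
        total, count = 0.0, 0
        for C in cells:
            pairwise = ((C[:, None, :] - C[None, :, :]) ** 2).sum(-1)   # |C| x |C| squared distances
            total += pairwise.mean() * len(C)
            count += len(C)
        return total / count     # equals 2 * vq_error(cells)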
16 As the depth of the k-d tree increases, the diameter of the cells decreases and so does the VQ error. [sent-39, score-0.676]
17 However, in high dimension, the rate of decrease of the average diameter can be very slow. [sent-40, score-0.264]
18 In fact, as we show in the supplementary material, there are data sets in RD for which a k-d tree requires D levels in order to halve the diameter. [sent-41, score-0.468]
19 This slow rate of decrease of cell diameter is fine if D = 2 as in Figure 1, but it is disastrous if D = 1000. [sent-42, score-0.454]
20 Constructing 1000 levels of the tree requires 2^1000 data points! [sent-43, score-0.426]
21 This problem is a real one that has been observed empirically: k-d trees are prone to a curse of dimensionality. [sent-44, score-0.261]
22 In general, k-d trees will not be able to benefit from this; in fact the bad example mentioned above has intrinsic dimension d = 1. [sent-46, score-0.5]
23 But we show that a simple variant of the k-d tree does indeed decrease cell diameters much more quickly. [sent-47, score-0.589]
24 Instead of splitting along coordinate directions, we use randomly chosen unit vectors, and instead of splitting data exactly at the median, we use a more carefully chosen split point. [sent-48, score-0.489]
25 We call the resulting data structure a random projection tree (Figure 1, right) and we show that it admits the following theoretical guarantee (formal statement is in the next section). [sent-49, score-0.477]
26 Pick any cell C in the RP tree, and suppose the data in C have intrinsic dimension d. [sent-50, score-0.489]
27 Pick a descendant cell ≥ d levels below; then with constant probability, this descendant has average diameter at most half that of C. [sent-51, score-0.565]
28 We thus have a vector quantization construction method for which the diameter of the cells depends on the intrinsic dimension, rather than the extrinsic dimension of the data. [sent-53, score-0.729]
29 A large part of the benefit of RP trees comes from the use of random unit directions, which is rather like running k-d trees with a preprocessing step in which the data are projected into a random subspace. (Footnote: this is in contrast to the max diameter, the maximum distance between two vectors in a cell.) [sent-54, score-0.7]
30 In fact, a recent experimental study of nearest neighbor algorithms [8] observes that a similar pre-processing step improves the performance of nearest neighbor schemes based on spatial data structures. [sent-57, score-0.418]
31 The explanation we provide is based on the assumption that the data has low intrinsic dimension. [sent-59, score-0.24]
32 Another spatial data structure based on random projections is the locality sensitive hashing scheme [6]. [sent-60, score-0.292]
33 Manifold learning and near neighbor search: The fast rate of diameter decrease in random projection trees has many consequences beyond the quality of vector quantization. [sent-61, score-0.744]
34 In particular, the statistical theory of tree-based statistical estimators — whether used for classification or regression — is centered around the rate of diameter decrease; for details, see for instance Chapter 20 of [7]. [sent-62, score-0.251]
35 Thus RP trees generically exhibit faster convergence in all these contexts. [sent-63, score-0.231]
36 If the diameter of cells is small, then it is reasonable to classify a query point according to the majority label in its cell. [sent-65, score-0.351]
37 The classical work of Cover and Hart [5] on the Bayes risk of nearest neighbor methods applies equally to the majority vote in a small enough cell. [sent-67, score-0.197]
38 The left figure depicts data concentrated near a one-dimensional manifold. [sent-70, score-0.155]
39 Our goal is to partition data into small diameter regions so that the data in each region is well-approximated by its mean+PCA. [sent-72, score-0.307]
40 Some of the data lies close to a one-dimensional manifold, some of the data spans two dimensions, and some of the data (represented by the red dot) is concentrated around a single point (a zero-dimensional manifold). [sent-74, score-0.159]
41 In the literature, the most common way to capture this manifold structure is to create a graph in which nodes represent data points and edges connect pairs of nearby points. [sent-76, score-0.155]
42 Once these individual cells are small enough, the data in them can be well-approximated by an affine subspace, for instance that given by principal component analysis. [sent-80, score-0.271]
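As a small illustration of what "mean + PCA" means for a single cell, the sketch below is ours: `S` is the array of points in a cell and `d` the target dimension; the points are projected onto the affine subspace through the mean spanned by the top d principal directions, and the residual error is reported.

    import numpy as np

    def mean_plus_pca(S, d):
        """Approximate the points of one cell by their projection onto the mean + top-d PCA subspace."""
        mu = S.mean(axis=0)
        _, _, Vt = np.linalg.svd(S - mu, full_matrices=False)   # rows of Vt: principal directions
        B = Vt[:d]                                              # d x D orthonormal basis
        approx = mu + (S - mu) @ B.T @ B                        # project each centered point, shift back
        residual = ((S - approx) ** 2).sum(axis=1).mean()       # mean squared approximation error
        return approx, residual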
43 The RP tree algorithm. Spatial data structures: In what follows, we assume the data lie in RD, and we consider spatial data structures built by recursive binary splits. [sent-83, score-0.54]
44 procedure CHOOSERULE(S)
    comment: PCA tree version
    let u be the principal eigenvector of the covariance of S
    Rule(x) := x · u ≤ median({z · u : z ∈ S})
    return (Rule)
This method will do a good job of adapting to low intrinsic dimension (details omitted). [sent-87, score-0.901]
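A direct transcription of this rule into Python (a sketch; numpy, with function and variable names of our own choosing):

    import numpy as np

    def choose_rule_pca(S):
        """PCA tree version: threshold the projection onto the principal eigenvector at its median."""
        cov = np.cov(S, rowvar=False)
        _, eigvecs = np.linalg.eigh(cov)
        u = eigvecs[:, -1]                       # principal eigenvector (largest eigenvalue is last)
        theta = np.median(S @ u)
        return lambda x: x @ u <= theta          # Rule(x) := x . u <= median({z . u : z in S})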
45 First, estimating the principal eigenvector requires a significant amount of data; recall that only about a 1/2^k fraction of the data winds up at a cell at level k of the tree. [sent-89, score-0.439]
46 Second, when the extrinsic dimension is high, the amount of memory and computation required to compute the dot product between the data vectors and the eigenvectors becomes the dominant part of the computation. [sent-90, score-0.282]
47 As each node in the tree is likely to have a different eigenvector, this severely limits the feasible tree depth. [sent-91, score-0.749]
48 We now show that using random projections overcomes these problems while maintaining the adaptivity to low intrinsic dimension. [sent-92, score-0.316]
49 Random projection trees: We shall see that the key benefits of PCA-based splits can be realized much more simply, by picking random directions. [sent-94, score-0.493]
50 PCA will of course correctly identify this subspace, and a split along the principal eigenvector u will do a good job of reducing the diameter of the data. [sent-96, score-0.622]
51 But a random direction v will also have some component in the direction of u, and splitting along the median of v will not be all that different from splitting along u. [sent-97, score-0.479]
52 Figure 3: Intuition: a random direction is almost as good as the principal eigenvector. [sent-98, score-0.174]
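A quick synthetic check of this intuition (entirely ours, not from the paper): when the variance of the data is dominated by one direction u, the median split induced by a random unit direction v agrees with the median split induced by u on most points, even in D = 1000.

    import numpy as np

    rng = np.random.default_rng(0)
    D, n = 1000, 5000
    u = np.zeros(D); u[0] = 1.0                                  # the dominant direction
    X = rng.standard_normal(n)[:, None] * u + 0.01 * rng.standard_normal((n, D))
    v = rng.standard_normal(D); v /= np.linalg.norm(v)           # a random unit direction
    split_u = (X @ u) <= np.median(X @ u)
    split_v = (X @ v) <= np.median(X @ v)
    agreement = max((split_u == split_v).mean(), (split_u != split_v).mean())
    print(agreement)     # typically well above 0.5: the two splits separate mostly the same points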
53 Also, we can use the same random projection in different places in the tree; all we need is to choose a large enough set of projections that, with high probability, there is a good projection direction for each node in the tree. [sent-99, score-0.395]
54 In our experience setting the number of projections equal to the depth of the tree is sufficient. [sent-101, score-0.434]
55 Thus, for a tree of depth k, we use only k projection vectors v, as opposed to 2^k with a PCA tree. [sent-102, score-0.502]
56 When preparing data to train a tree we can compute the k projection values before building the tree. [sent-103, score-0.448]
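A sketch of that precomputation (ours): draw the k random unit directions once and project every data point onto all of them before any splitting begins.

    import numpy as np

    def precompute_projections(X, k, seed=0):
        """Draw k random unit directions (one per tree level) and project the whole data set once."""
        rng = np.random.default_rng(seed)
        D = X.shape[1]
        V = rng.standard_normal((D, k))
        V /= np.linalg.norm(V, axis=0)          # columns are unit vectors
        return V, X @ V                         # projections: n x k, reused throughout tree construction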
57 For a cell containing points S, let ∆(S) be the diameter of S (the distance between the two furthest points in the set), and ∆_A(S) the average diameter, that is, the average distance between points of S: ∆_A^2(S) = (1/|S|^2) Σ_{x,y ∈ S} ||x − y||^2 = (2/|S|) Σ_{x ∈ S} ||x − mean(S)||^2. [sent-106, score-0.58]
58 We use two different types of splits: if ∆^2(S) is less than c · ∆_A^2(S) (for some constant c), then we use the hyperplane split discussed above. [sent-107, score-0.183]
59 Otherwise, we split S into two groups based on distance from the mean. [sent-108, score-0.229]
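For completeness, the second equality in the displayed formula for ∆_A^2(S) above is a standard expansion (our derivation, with μ = mean(S)):

\[
\frac{1}{|S|^2}\sum_{x,y\in S}\|x-y\|^2
  = \frac{1}{|S|^2}\sum_{x,y\in S}\Big(\|x-\mu\|^2+\|y-\mu\|^2-2\langle x-\mu,\,y-\mu\rangle\Big)
  = \frac{2}{|S|}\sum_{x\in S}\|x-\mu\|^2 ,
\]

since the cross term vanishes: \(\sum_{x\in S}(x-\mu)=0\).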
60 procedure CHOOSERULE(S)
    comment: RP tree version
    if ∆^2(S) ≤ c · ∆_A^2(S)
        choose a random unit direction v
        sort projection values: a(x) = v · x ∀x ∈ S, generating the list a1 ≤ a2 ≤ · · · ≤ an
        for i = 1, . . . [sent-109, score-0.499]
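A simplified Python sketch of the whole rule (ours). Caveats: the projection branch below splits at the median of the projected values, whereas the truncated loop above searches for a more carefully chosen split point; the distance branch thresholds at the median distance from the mean, which is one natural reading of the rule; and the constant c is left as a parameter.

    import numpy as np

    def sq_diameter(S):
        """Max squared distance between two points (O(n^2) memory; fine for a sketch)."""
        return ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1).max()

    def avg_sq_diameter_cell(S):
        """Average squared interpoint distance, via the identity Delta_A^2 = 2 * mean ||x - mean||^2."""
        mu = S.mean(axis=0)
        return 2.0 * ((S - mu) ** 2).sum(axis=1).mean()

    def choose_rule_rp(S, c=10.0, rng=np.random.default_rng(0)):
        """RP tree version (simplified): split by projection, or by distance from the mean."""
        if sq_diameter(S) <= c * avg_sq_diameter_cell(S):
            v = rng.standard_normal(S.shape[1])
            v /= np.linalg.norm(v)                  # random unit direction
            theta = np.median(S @ v)                # simplified split point
            return lambda x: x @ v <= theta
        mu = S.mean(axis=0)
        r = np.median(np.linalg.norm(S - mu, axis=1))
        return lambda x: np.linalg.norm(x - mu) <= r   # split by distance from the mean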
61 In Figure 4, for instance, splitting the bottom cell at the median would lead to a messy partition, whereas the RP tree split produces two clean, connected clusters. [sent-115, score-0.877]
62 3: The two PCA ellipses corresponding to the two cells after the first split. [sent-119, score-0.197]
63 5: The four PCA ellipses for the cells at the third level. [sent-121, score-0.197]
64 As the cells get smaller, their individual PCAs reveal 1D manifold structure. [sent-123, score-0.204]
65 Note: the ellipses are for comparison only; the RP tree algorithm does not look at them. [sent-124, score-0.409]
66 The second type of split, based on distance from the mean of the cell, is needed to deal with cases in which the cell contains data at very different scales. [sent-125, score-0.266]
67 If only splits by projection were allowed, then a large number of splits would be devoted to uselessly subdividing this point mass. [sent-127, score-0.368]
68 The second type of split separates it from the rest of the data in one go. [sent-128, score-0.213]
69 A large fraction of them might be “empty” background patches, in which case they’d fall near the center of the cell in a very tight cluster. [sent-130, score-0.251]
70 The effect of the split is then to separate out these two clusters. [sent-132, score-0.183]
71 Theoretical foundations: In analyzing RP trees, we consider a statistical notion of dimension: we say a set S has local covariance dimension (d, ε) if a (1 − ε) fraction of the variance is concentrated in a d-dimensional subspace. [sent-134, score-0.237]
72 Definition 1. S ⊂ RD has local covariance dimension (d, ε) if the largest d eigenvalues of its covariance matrix satisfy σ_1^2 + · · · + σ_d^2 ≥ (1 − ε) · (σ_1^2 + · · · + σ_D^2). [sent-136, score-0.206]
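Definition 1 is straightforward to test empirically for the points falling in a given cell (a sketch; ours):

    import numpy as np

    def has_local_covariance_dim(S, d, eps):
        """True if the top d eigenvalues of cov(S) carry at least a (1 - eps) fraction of the total variance."""
        eigvals = np.linalg.eigvalsh(np.cov(S, rowvar=False))   # ascending order
        return eigvals[-d:].sum() >= (1.0 - eps) * eigvals.sum()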
73 Now, suppose an RP tree is built from a data set X ⊂ RD, not necessarily finite. [sent-138, score-0.395]
74 Recall that there are two different types of splits; let’s call them splits by distance and splits by projection. [sent-139, score-0.316]
75 Suppose an RP tree is built using data set X ⊂ RD . [sent-141, score-0.395]
76 Consider any cell C for which X ∩ C has local covariance dimension (d, ε), where ε < c1. [sent-142, score-0.327]
77 Pick a point x ∈ S ∩ C at random, and let C′ be the cell that contains it at the next level down. [sent-143, score-0.19]
78 • If C is split by distance, then E[∆(S ∩ C′)] ≤ c2 ∆(S ∩ C). [sent-144, score-0.229]
79 • If C is split by projection, then E[∆_A^2(S ∩ C′)] ≤ (1 − c3/d) · ∆_A^2(S ∩ C). [sent-145, score-0.183]
80 As a consequence, the expected average diameter of cells is halved every O(d) levels. [sent-147, score-0.319]
81 First of all, both splits operate on the projected data; for the second type of split (split by distance), data that fall in an interval around the median are separated from data outside that interval. [sent-151, score-0.523]
82 Second, the tree is built in a streaming manner: that is, the data arrive one at a time, and are processed (to update the tree) and immediately discarded. [sent-152, score-0.439]
83 This is managed by maintaining simple statistics at each internal node of the tree and updating them appropriately as the data streams by (more details in the supplementary material). [sent-153, score-0.433]
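The exact statistics are in the supplementary material; as an illustration of the flavor (our guess, not the paper's actual bookkeeping), keeping a count, a running mean, and a running sum of squared norms per node is already enough to recover the cell mean and the average squared diameter ∆_A^2 in a streaming fashion.

    import numpy as np

    class NodeStats:
        """Per-node running statistics, updated one point at a time without storing the data."""
        def __init__(self, D):
            self.n = 0
            self.mean = np.zeros(D)
            self.sum_sq_norm = 0.0

        def update(self, x):
            self.n += 1
            self.mean += (x - self.mean) / self.n        # incremental mean
            self.sum_sq_norm += float(x @ x)

        def avg_sq_diameter(self):
            # Delta_A^2 = 2 * E||x - mean||^2 = 2 * (E||x||^2 - ||mean||^2)
            return 2.0 * (self.sum_sq_norm / self.n - float(self.mean @ self.mean))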
84 Finally, instead of choosing a new random projection in each cell, a dictionary of a few random projections is chosen at the outset. [sent-155, score-0.233]
85 We will see that RP trees adapt well to such cases. [sent-160, score-0.231]
86 We compare the performance of different trees according to the average VQ error they incur at various levels. [sent-172, score-0.231]
87 We consider four types of trees: (1) k-d trees in which the coordinate for a split is chosen at random; (2) k-d trees in which at each split, the best coordinate is chosen (the one that most improves VQ error); (3) RP trees; and (4) for reference, PCA trees. [sent-173, score-0.747]
88 In both cases, RP trees outperform both k-d tree variants and are close to the performance of PCA trees without having to explicitly compute any principal components. [sent-175, score-0.875]
89 MNIST dataset: We next demonstrate RP trees on the all-familiar MNIST dataset of handwritten digits. [sent-177, score-0.321]
90 This dataset consists of 28 × 28 grayscale images of the digits zero through nine, and is believed to have low intrinsic dimension (for instance, see [10]). [sent-178, score-0.381]
91 Figure 6 (top) shows the first few levels of the RP tree for the images of digit 1. [sent-180, score-0.464]
92 The bar underneath each node shows the fraction of points going to the left and to the right, to give a sense of how balanced each split is. [sent-183, score-0.284]
93 Alongside each mean, we also show a histogram of the 20 largest eigenvalues of the covariance matrix, which reveal how closely the data in the cell is concentrated near a low-dimensional subspace. [sent-184, score-0.388]
94 Hence, very quickly, the cell means become good representatives of the dataset: an experimental corroboration that RP trees adapt to the low intrinsic dimension of the data. [sent-188, score-0.728]
95 This is also brought out in Figure 6 (bottom), where the images are shown projected onto the plane defined by their top two principal components. [sent-189, score-0.179]
96 The left image shows how the data was split at the topmost level (dark versus light). [sent-191, score-0.257]
97 Observe that this random cut is actually quite close to what the PCA split would have been, corroborating our earlier intuition (recall Figure 3). [sent-192, score-0.212]
98 Figure 6: Top: Three levels of the RP tree for MNIST digit 1. [sent-194, score-0.435]
99 Colors represent different cells in the RP tree, after just one split (left) or after two levels of the tree (right). [sent-196, score-0.687]
100 A fast nearest neighbor algorithm based on a principal axis search tree. [sent-245, score-0.258]
wordName wordTfidf (topN-words)
[('rp', 0.526), ('tree', 0.32), ('trees', 0.231), ('diameter', 0.211), ('cell', 0.19), ('split', 0.183), ('intrinsic', 0.172), ('vq', 0.162), ('splits', 0.135), ('ree', 0.13), ('rd', 0.112), ('pca', 0.11), ('cells', 0.108), ('ake', 0.102), ('coord', 0.102), ('projection', 0.098), ('dimension', 0.097), ('manifold', 0.096), ('splitting', 0.096), ('principal', 0.093), ('neighbor', 0.092), ('hoose', 0.089), ('ellipses', 0.089), ('median', 0.088), ('projections', 0.077), ('levels', 0.076), ('extrinsic', 0.076), ('nearest', 0.073), ('rule', 0.073), ('concentrated', 0.069), ('eigenvector', 0.068), ('diego', 0.068), ('quantization', 0.065), ('spatial', 0.058), ('projected', 0.057), ('decrease', 0.053), ('aj', 0.052), ('direction', 0.052), ('dimensionality', 0.051), ('coordinate', 0.051), ('rightt', 0.051), ('partitioning', 0.047), ('vectors', 0.047), ('uc', 0.046), ('distance', 0.046), ('built', 0.045), ('dataset', 0.045), ('verma', 0.044), ('streaming', 0.044), ('topmost', 0.044), ('descendant', 0.044), ('san', 0.043), ('supplementary', 0.042), ('node', 0.041), ('comment', 0.04), ('hashing', 0.04), ('lef', 0.04), ('covariance', 0.04), ('instance', 0.04), ('mnist', 0.04), ('return', 0.039), ('digit', 0.039), ('low', 0.038), ('depth', 0.037), ('partition', 0.036), ('manner', 0.036), ('job', 0.034), ('avg', 0.034), ('thing', 0.034), ('tt', 0.034), ('partitions', 0.034), ('synthetic', 0.033), ('along', 0.033), ('dasgupta', 0.032), ('eigenvectors', 0.032), ('representative', 0.032), ('subspace', 0.032), ('majority', 0.032), ('locality', 0.031), ('pick', 0.031), ('fraction', 0.031), ('data', 0.03), ('curse', 0.03), ('colors', 0.03), ('belkin', 0.03), ('manifolds', 0.03), ('near', 0.03), ('eigenvalues', 0.029), ('images', 0.029), ('random', 0.029), ('points', 0.029), ('directions', 0.028), ('datasets', 0.028), ('sensitive', 0.027), ('recall', 0.027), ('lie', 0.027), ('variant', 0.026), ('depicts', 0.026), ('areas', 0.026), ('var', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 116 nips-2007-Learning the structure of manifolds using random projections
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
2 0.17306174 16 nips-2007-A learning framework for nearest neighbor search
Author: Lawrence Cayton, Sanjoy Dasgupta
Abstract: Can we leverage learning techniques to build a fast nearest-neighbor (ANN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures. 1
3 0.17092717 161 nips-2007-Random Projections for Manifold Learning
Author: Chinmay Hegde, Michael Wakin, Richard Baraniuk
Abstract: We propose a novel method for linear dimensionality reduction of manifold modeled data. First, we show that with a small number M of random projections of sample points in RN belonging to an unknown K-dimensional Euclidean manifold, the intrinsic dimension (ID) of the sample set can be estimated to high accuracy. Second, we rigorously prove that using only this set of random projections, we can estimate the structure of the underlying manifold. In both cases, the number of random projections required is linear in K and logarithmic in N , meaning that K < M ≪ N . To handle practical situations, we develop a greedy algorithm to estimate the smallest size of the projection space required to perform manifold learning. Our method is particularly relevant in distributed sensing systems and leads to significant potential savings in data acquisition, storage and transmission costs.
4 0.14403895 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
5 0.11555367 107 nips-2007-Iterative Non-linear Dimensionality Reduction with Manifold Sculpting
Author: Michael Gashler, Dan Ventura, Tony Martinez
Abstract: Many algorithms have been recently developed for reducing dimensionality by projecting data onto an intrinsic non-linear manifold. Unfortunately, existing algorithms often lose significant precision in this transformation. Manifold Sculpting is a new algorithm that iteratively reduces dimensionality by simulating surface tension in local neighborhoods. We present several experiments that show Manifold Sculpting yields more accurate results than existing algorithms with both generated and natural data-sets. Manifold Sculpting is also able to benefit from both prior dimensionality reduction efforts. 1
6 0.11343576 78 nips-2007-Efficient Principled Learning of Thin Junction Trees
7 0.10604264 197 nips-2007-The Infinite Markov Model
8 0.099784374 27 nips-2007-Anytime Induction of Cost-sensitive Trees
9 0.098741151 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
10 0.086619459 188 nips-2007-Subspace-Based Face Recognition in Analog VLSI
11 0.084642 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
12 0.083449259 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
13 0.083308786 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning
14 0.08089038 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
15 0.073264785 115 nips-2007-Learning the 2-D Topology of Images
16 0.069763243 160 nips-2007-Random Features for Large-Scale Kernel Machines
17 0.067083403 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons
18 0.066432022 17 nips-2007-A neural network implementing optimal state estimation based on dynamic spike train decoding
19 0.064055651 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
20 0.061895791 13 nips-2007-A Unified Near-Optimal Estimator For Dimension Reduction in $l \alpha$ ($0<\alpha\leq 2$) Using Stable Random Projections
topicId topicWeight
[(0, -0.205), (1, 0.042), (2, -0.041), (3, 0.005), (4, -0.038), (5, -0.034), (6, 0.018), (7, 0.127), (8, 0.03), (9, 0.073), (10, -0.138), (11, 0.328), (12, 0.108), (13, -0.046), (14, 0.068), (15, -0.101), (16, 0.002), (17, -0.045), (18, 0.018), (19, 0.001), (20, -0.002), (21, 0.067), (22, 0.028), (23, 0.072), (24, 0.229), (25, -0.128), (26, -0.101), (27, 0.059), (28, -0.042), (29, -0.032), (30, -0.135), (31, 0.003), (32, -0.083), (33, -0.117), (34, -0.14), (35, -0.028), (36, -0.1), (37, 0.018), (38, -0.085), (39, -0.081), (40, 0.069), (41, -0.044), (42, 0.079), (43, 0.021), (44, 0.041), (45, 0.031), (46, 0.007), (47, 0.065), (48, 0.071), (49, 0.001)]
simIndex simValue paperId paperTitle
same-paper 1 0.97777396 116 nips-2007-Learning the structure of manifolds using random projections
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
2 0.77130222 16 nips-2007-A learning framework for nearest neighbor search
Author: Lawrence Cayton, Sanjoy Dasgupta
Abstract: Can we leverage learning techniques to build a fast nearest-neighbor (ANN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures. 1
3 0.6936267 78 nips-2007-Efficient Principled Learning of Thin Junction Trees
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present the first truly polynomial algorithm for PAC-learning the structure of bounded-treewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speed ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixed-size subsets of variables, if the underlying distribution can be approximated by a bounded-treewidth junction tree. 1
4 0.67511874 27 nips-2007-Anytime Induction of Cost-sensitive Trees
Author: Saher Esmeir, Shaul Markovitch
Abstract: Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. Due to its stochastic nature ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare the performance of ACT to that of the state of the art cost-sensitive tree learners. The results show that for most domains ACT produces trees of significantly lower costs. ACT is also shown to exhibit good anytime behavior with diminishing returns.
5 0.5510686 161 nips-2007-Random Projections for Manifold Learning
Author: Chinmay Hegde, Michael Wakin, Richard Baraniuk
Abstract: We propose a novel method for linear dimensionality reduction of manifold modeled data. First, we show that with a small number M of random projections of sample points in RN belonging to an unknown K-dimensional Euclidean manifold, the intrinsic dimension (ID) of the sample set can be estimated to high accuracy. Second, we rigorously prove that using only this set of random projections, we can estimate the structure of the underlying manifold. In both cases, the number of random projections required is linear in K and logarithmic in N , meaning that K < M ≪ N . To handle practical situations, we develop a greedy algorithm to estimate the smallest size of the projection space required to perform manifold learning. Our method is particularly relevant in distributed sensing systems and leads to significant potential savings in data acquisition, storage and transmission costs.
6 0.50167167 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
7 0.48553792 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
8 0.48518732 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
9 0.46674162 197 nips-2007-The Infinite Markov Model
10 0.45474631 107 nips-2007-Iterative Non-linear Dimensionality Reduction with Manifold Sculpting
11 0.41276777 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta
12 0.3996979 72 nips-2007-Discriminative Log-Linear Grammars with Latent Variables
13 0.38816839 188 nips-2007-Subspace-Based Face Recognition in Analog VLSI
14 0.3260304 115 nips-2007-Learning the 2-D Topology of Images
15 0.31786516 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
16 0.3153401 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
17 0.30605641 119 nips-2007-Learning with Tree-Averaged Densities and Distributions
18 0.30335239 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning
19 0.30291066 207 nips-2007-Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations
topicId topicWeight
[(5, 0.054), (13, 0.035), (16, 0.037), (19, 0.033), (21, 0.058), (31, 0.025), (34, 0.036), (35, 0.067), (47, 0.105), (49, 0.013), (51, 0.184), (83, 0.176), (85, 0.031), (87, 0.018), (90, 0.05)]
simIndex simValue paperId paperTitle
same-paper 1 0.85032439 116 nips-2007-Learning the structure of manifolds using random projections
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
2 0.77686751 63 nips-2007-Convex Relaxations of Latent Variable Training
Author: Yuhong Guo, Dale Schuurmans
Abstract: We investigate a new, convex relaxation of an expectation-maximization (EM) variant that approximates a standard objective while eliminating local minima. First, a cautionary result is presented, showing that any convex relaxation of EM over hidden variables must give trivial results if any dependence on the missing values is retained. Although this appears to be a strong negative outcome, we then demonstrate how the problem can be bypassed by using equivalence relations instead of value assignments over hidden variables. In particular, we develop new algorithms for estimating exponential conditional models that only require equivalence relation information over the variable values. This reformulation leads to an exact expression for EM variants in a wide range of problems. We then develop a semidefinite relaxation that yields global training by eliminating local minima. 1
3 0.77178526 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
Author: Marc'aurelio Ranzato, Y-lan Boureau, Yann L. Cun
Abstract: Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have certain desirable properties (e.g. low dimension, sparsity, etc). Others are based on approximating density by stochastically reconstructing the input from the representation. We describe a novel and efficient algorithm to learn sparse representations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals, and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and by training sequentially, high-order dependencies between the input observed variables can be captured. 1
4 0.77071589 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
Author: Matthias Bethge, Philipp Berens
Abstract: Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model, that makes this type of analysis feasible for very high-dimensional data—the model parameters can be derived in closed form and sampling is easy. Therefore, our NearMaxEnt approach can serve as a tool for testing predictions from a pairwise maximum entropy model not only for low-dimensional marginals, but also for high dimensional measurements of more than thousand units. We demonstrate its usefulness by studying natural images with dichotomized pixel intensities. Our results indicate that the statistics of such higher-dimensional measurements exhibit additional structure that are not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics surprisingly well up to the limit of dimensionality where estimation of the full joint distribution is feasible. 1
5 0.77009523 115 nips-2007-Learning the 2-D Topology of Images
Author: Nicolas L. Roux, Yoshua Bengio, Pascal Lamblin, Marc Joliveau, Balázs Kégl
Abstract: We study the following question: is the two-dimensional structure of images a very strong prior or is it something that can be learned with a few examples of natural images? If someone gave us a learning task involving images for which the two-dimensional topology of pixels was not known, could we discover it automatically and exploit it? For example suppose that the pixels had been permuted in a fixed but unknown way, could we recover the relative two-dimensional location of pixels on images? The surprising result presented here is that not only the answer is yes, but that about as few as a thousand images are enough to approximately recover the relative locations of about a thousand pixels. This is achieved using a manifold learning algorithm applied to pixels associated with a measure of distributional similarity between pixel intensities. We compare different topologyextraction approaches and show how having the two-dimensional topology can be exploited.
6 0.76827282 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
7 0.76766664 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
8 0.76718426 134 nips-2007-Multi-Task Learning via Conic Programming
9 0.76617324 16 nips-2007-A learning framework for nearest neighbor search
10 0.76547873 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons
11 0.76272762 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
12 0.76243079 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
13 0.76093531 24 nips-2007-An Analysis of Inference with the Universum
14 0.7598291 158 nips-2007-Probabilistic Matrix Factorization
15 0.75871122 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
16 0.75705451 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations
17 0.75587398 187 nips-2007-Structured Learning with Approximate Inference
18 0.75557047 49 nips-2007-Colored Maximum Variance Unfolding
19 0.75556463 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
20 0.75521523 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning