nips nips2005 nips2005-172 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jorge Silva, Jorge Marques, João Lemos
Abstract: There has been a surge of interest in learning non-linear manifold models to approximate high-dimensional data. Both for computational complexity reasons and for generalization capability, sparsity is a desired feature in such models. This usually means dimensionality reduction, which naturally implies estimating the intrinsic dimension, but it can also mean selecting a subset of the data to use as landmarks, which is especially important because many existing algorithms have quadratic complexity in the number of observations. This paper presents an algorithm for selecting landmarks, based on LASSO regression, which is well known to favor sparse approximations because it uses regularization with an l1 norm. As an added benefit, a continuous manifold parameterization, based on the landmarks, is also found. Experimental results with synthetic and real data illustrate the algorithm. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract There has been a surge of interest in learning non-linear manifold models to approximate high-dimensional data. [sent-17, score-0.192]
2 Both for computational complexity reasons and for generalization capability, sparsity is a desired feature in such models. [sent-18, score-0.06]
3 This usually means dimensionality reduction, which naturally implies estimating the intrinsic dimension, but it can also mean selecting a subset of the data to use as landmarks, which is especially important because many existing algorithms have quadratic complexity in the number of observations. [sent-19, score-0.27]
4 This paper presents an algorithm for selecting landmarks, based on LASSO regression, which is well known to favor sparse approximations because it uses regularization with an l1 norm. [sent-20, score-0.143]
5 As an added benefit, a continuous manifold parameterization, based on the landmarks, is also found. [sent-21, score-0.226]
6 Experimental results with synthetic and real data illustrate the algorithm. [sent-22, score-0.051]
7 1 Introduction The recent interest in manifold learning algorithms is due, in part, to the proliferation of very large datasets of high-dimensional data in numerous disciplines of science, from signal processing to bioinformatics [6]. [sent-23, score-0.222]
8 As an example, consider a video sequence such as the one in Figure 1. [sent-24, score-0.045]
9 In the absence of features like contour points or wavelet coefficients, each image of size 71 × 71 pixels is a point in a space of dimension equal to the number of pixels, 71 × 71 = 5041. [sent-25, score-0.118]
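As a minimal sketch of this representation (with random values standing in for the actual frames, which are not reproduced here), a stack of 71 × 71 images can be flattened into vectors in R5041:

```python
import numpy as np

# Hypothetical stand-in for the video: 194 random "frames" of size 71 x 71
# (the real data are the face images of Figure 1; random values used here).
rng = np.random.default_rng(0)
frames = rng.random((194, 71, 71))

# Each frame becomes one observation y in R^5041 (m = 71 * 71 = 5041).
Y = frames.reshape(len(frames), -1).T   # shape (m, N) = (5041, 194), observations as columns
print(Y.shape)
```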
10 More generally, each observation is a vector y ∈ Rm where m may be very large. [sent-27, score-0.042]
11 A reasonable assumption, when facing an observation space of possibly tens of thousands of dimensions, is that the data are not dense in such a space, because several of the measured variables are redundant. Figure 1: Example of a high-dimensional dataset: each image of size 71 × 71 pixels is a point in R5041. [sent-28, score-0.073]
12 In fact, in many problems of interest, there are only a few free parameters, which are embedded in the observed variables, frequently in a nonlinear way. [sent-30, score-0.057]
13 Assuming that the number of free parameters remains the same throughout the observations, and also assuming smooth variation of the parameters, one is in fact dealing with geometric restrictions which can be well modelled as a manifold. [sent-31, score-0.097]
14 Therefore, the data must lie on, or near (accounting for noise), a manifold embedded in the observation, or ambient, space. [sent-32, score-0.249]
15 Learning this manifold is a natural approach to the problem of modelling the data, since, besides computational issues, sparse models tend to have better generalization capability. [sent-33, score-0.247]
16 In order to achieve sparsity, considerable effort has been devoted to reducing the dimensionality of the data by some form of non-linear projection. [sent-34, score-0.084]
17 In contrast, the problem of finding a relevant subset of the observations has received less attention. [sent-36, score-0.029]
18 It should be noted that the complexity of most existing algorithms is, in general, dependent not only on the dimensionality but also on the number of observations. [sent-37, score-0.055]
19 An important example is ISOMAP [10], whose computational cost is quadratic in the number of points; this has motivated the L-ISOMAP variant [3], which uses a randomly chosen subset of the points as landmarks (L stands for Landmark). [sent-38, score-0.78]
20 The proposed algorithm uses, instead, a principled approach to select the landmarks, based on the solutions of a regression problem minimizing a regularized cost functional. [sent-39, score-0.157]
21 When the regularization term is based on the l1 norm, the solution tends to be sparse. [sent-40, score-0.038]
22 This means that the correct amount of regularization can be automatically found. [sent-43, score-0.038]
23 In the specific context of selecting landmarks for manifold learning, with some care in the LASSO problem formulation, one is able to avoid a difficult problem of sparse regression with Multiple Measurement Vectors (MMV), which has received considerable interest in its own right [2]. [sent-44, score-1.053]
24 The idea is to use local information, found by local PCA as usual, and preserve the smooth variation of the tangent subspace over a larger scale, taking advantage of any known embedding. [sent-45, score-0.3]
25 This is a natural extension of the Tangent Bundle Approximation (TBA) algorithm, proposed in [9], since the principal angles, which TBA computes anyway, are readily available. (Footnote 1: The S in LARS stands for Stagewise and LASSO, an allusion to the relationship between the three algorithms.) [sent-46, score-0.165]
26 Nevertheless, the method proposed here is independent of TBA and could, for instance, be plugged into a global procedure like L-ISOMAP. [sent-48, score-0.097]
27 The algorithm avoids costly global computations, that is, it doesn’t attempt to preserve geodesic distances between faraway points, and yet, unlike most local algorithms, it is explicitly designed to be sparse while retaining generalization ability. [sent-49, score-0.366]
28 The selection procedure itself is covered in section 2, which also provides a quick overview of the LASSO and LARS methods. [sent-51, score-0.03]
29 1.1 Problem formulation The problem can be formulated as follows: given N vectors y ∈ Rm, suppose that the y can be approximated by a differentiable n-manifold M embedded in Rm. [sent-54, score-0.134]
30 This means that M can be charted through one or more invertible and differentiable mappings of the type gi (y) = x (1) to vectors x ∈ Rn so that open sets Pi ⊂ M, called patches, whose union covers M, are diffeomorphically mapped onto other open sets Ui ⊂ Rn , called parametric domains. [sent-55, score-0.276]
31 Rn is the lower dimensional parameter space and n is the intrinsic dimension of M. [sent-56, score-0.13]
32 The gi are called charts, and manifolds with complex topology may require several gi . [sent-57, score-0.185]
33 Equivalently, since the charts are invertible, inverse mappings hi : Rn → Rm, called parameterizations, can also be found. [sent-58, score-0.152]
34 Arranging the original data in a matrix Y ∈ Rm×N, with the y as column vectors and assuming, for now, only one mapping g, the charting process produces a matrix X ∈ Rn×N, i.e. Y = [yij], i = 1…m, j = 1…N is mapped to X = [xij], i = 1…n, j = 1…N. (2) [sent-59, score-0.213]
35 The n rows of X are sometimes called features or latent variables. [sent-83, score-0.029]
36 It is often intended in manifold learning to estimate the correct intrinsic dimension, n, as well as the chart g or at least a column-to-column mapping from Y to X. [sent-84, score-0.451]
37 In the present case, this mapping will be assumed known, and so will n. [sent-85, score-0.044]
38 What is intended is to select a subset of the columns of X (or of Y, since the mapping between them is known) to use as landmarks, while retaining enough information about g, resulting in a reduced n × N ′ matrix with N ′ < N . [sent-86, score-0.251]
39 Preserving g is equivalent to preserving its inverse mapping, the parameterization h, which is more practical because it allows the following generative model: y = h(x) + η (3) in which η is zero mean Gaussian observation noise. [sent-88, score-0.202]
40 How to find the fewest possible landmarks so that h can still be well approximated? [sent-89, score-0.655]
41 2.1 Landmark selection: linear regression model To solve the problem, it is proposed to start by converting the non-linear regression in (3) into a linear regression by offloading the non-linearity onto a kernel, as described in numerous works, such as [7]. [sent-91, score-0.355]
42 Since there are N columns in X to start with, let K be a square, N × N, symmetric positive semidefinite matrix such that K = {kij}, with kij = K(xi, xj) and K(x, xj) = exp(−‖x − xj‖² / (2σK²)). (4) [sent-92, score-0.248]
43 The function K can be readily recognized as a Gaussian kernel. [sent-93, score-0.031]
44 This allows the reformulation, in matrix form, of (3) as YT = KB + E, (5) where B, E ∈ RN×m and each row of E is a realization of η above. [sent-94, score-0.03]
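A rough sketch of (4) and (5) in NumPy, assuming the features X and observations Y are already available; the toy sizes, the bandwidth sigma and the function names are illustrative and not taken from the paper:

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), with the x_i as columns of X (n x N)."""
    sq_dists = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)   # N x N squared distances
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
n, m, N = 2, 10, 200                      # intrinsic dim, ambient dim, number of points (toy)
X = rng.normal(size=(n, N))               # latent features, assumed known (e.g. from ISOMAP/TBA)
Y = rng.normal(size=(m, N))               # observations

K = gaussian_kernel_matrix(X, sigma=1.0)  # the N x N matrix of eq. (4)
# Unregularized fit of eq. (5): each column of Y^T is regressed on K separately.
B, *_ = np.linalg.lstsq(K, Y.T, rcond=None)   # B has shape (N, m)
E = Y.T - K @ B                               # residual matrix
```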
45 Still, it is difficult to proceed directly from (5), because neither the response, YT, nor the regression parameter matrix, B, is a column vector. [sent-95, score-0.16]
46 This leads to a Multiple Measurement Vectors (MMV) problem, and while there is nothing to prevent solving it separately for each column, this makes it harder to impose sparsity in all columns simultaneously. [sent-96, score-0.138]
47 Two alternative approaches present themselves at this point: • Solve a sparse regression problem for each column of YT (and the corresponding column of B), and find a way to force several rows of B to zero. [sent-97, score-0.276]
48 • Re-formulate (5) in a way that turns it into a single measurement vector problem. [sent-98, score-0.05]
49 Since the parameterization h is known and must be, at the very least, bijective and continuous, it must preserve the smoothness of quantities like the geodesic distance and the principal angles. [sent-100, score-0.363]
50 Therefore, it is proposed to re-formulate (5) as θ = Kβ + ǫ (6) where the new response, θ ∈ RN , as well as β ∈ RN and ǫ ∈ RN are now column vectors, allowing the use of known subset selection procedures. [sent-101, score-0.118]
51 The elements of θ can be, for example, the geodesic distances to yµ = h(xµ), the observation corresponding to the mean, xµ, of the columns of X. [sent-102, score-0.292]
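If such an embedding were available, this geodesic-distance response could be approximated with a k-nearest-neighbour graph, roughly as sketched below; the toy data, the choice k = 8, and the use of the point closest to the data mean as a stand-in for yµ are assumptions made only for illustration:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(0)
Y = rng.normal(size=(1000, 10))            # toy observations, one per row (placeholder data)

# k-nearest-neighbour graph weighted by Euclidean distances, then graph shortest paths
# as an approximation of geodesic distances (the ISOMAP construction).
G = kneighbors_graph(Y, n_neighbors=8, mode="distance")
D = shortest_path(G, method="D", directed=False)

# Stand-in for y_mu = h(x_mu): here simply the observation closest to the data mean.
mu_idx = int(np.argmin(np.linalg.norm(Y - Y.mean(axis=0), axis=1)))
theta = D[mu_idx]                          # one candidate response vector for eq. (6)
```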
52 This would be a possibility if an algorithm like ISOMAP were used to find the chart from Y to X. [sent-103, score-0.063]
53 However, since the whole point of using landmarks is to know them beforehand, so as to avoid having to compute N × N geodesic distances, this is not the most interesting alternative. [sent-104, score-0.728]
54 A better way is to use a computationally lighter quantity like the maximum principal angle between the tangent subspace at yµ , Tyµ (M), and the tangent subspaces at all other y. [sent-105, score-0.609]
55 Given a point y0 and its k nearest neighbors, finding the tangent subspace can be done by local PCA. [sent-106, score-0.201]
56 The sample covariance matrix, S = (1/k) Σ_{i=0}^{k} (yi − y0)(yi − y0)T, (7) can be decomposed as S = VDVT, (8) where the columns of V are the eigenvectors vi and D is a diagonal matrix containing the eigenvalues λi, in descending order. [sent-107, score-0.199]
57 The eigenvectors form an orthonormal basis aligned with the principal directions of the data. [sent-108, score-0.167]
58 They can be divided into two groups: tangent and normal vectors, spanning the tangent and normal subspaces, with dimensions n and m − n, respectively. [sent-109, score-0.354]
59 The tangent subspaces are spanned by the n most important eigenvectors. [sent-111, score-0.232]
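A minimal sketch of this local PCA step, following (7)-(8); the function name, the toy data and the neighbourhood size k = 12 are illustrative:

```python
import numpy as np

def tangent_basis(Y, idx, k, n):
    """Orthonormal basis (m x n) of the estimated tangent subspace at the point Y[:, idx].

    Y holds the observations as columns (m x N); k is the neighbourhood size and
    n the assumed intrinsic dimension.
    """
    y0 = Y[:, idx]
    nbrs = np.argsort(np.linalg.norm(Y - y0[:, None], axis=0))[: k + 1]  # includes the point itself
    diffs = Y[:, nbrs] - y0[:, None]                  # the (y_i - y_0) of eq. (7), as columns
    S = diffs @ diffs.T / k                           # sample covariance, eq. (7)
    eigvals, eigvecs = np.linalg.eigh(S)              # eq. (8); eigh returns ascending eigenvalues
    return eigvecs[:, ::-1][:, :n]                    # n leading eigenvectors span the tangent subspace

rng = np.random.default_rng(0)
Y = rng.normal(size=(10, 500))                        # toy data: m = 10, N = 500
V0 = tangent_basis(Y, idx=0, k=12, n=2)
print(V0.shape)                                       # (10, 2)
```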
60 The principal angles between the tangent subspaces at two different points y0 can be determined from the column spaces of the corresponding matrices V. [sent-112, score-0.513]
61 An in-depth description of the principal angles, as well as efficient algorithms to compute them, can be found, for instance, in [1]. [sent-113, score-0.106]
62 Note that, should the Ty(M) already be available from the eigenvectors found during some local PCA analysis, e.g. [sent-114, score-0.061]
63 during estimation of the intrinsic dimension, there would be little extra computational burden. [sent-116, score-0.084]
64 An example is [9], where the principal angles are already an integral part of the procedure, namely for partitioning the manifold into patches. [sent-117, score-0.401]
65 Thus, it is proposed to set θj equal to the maximum principal angle between Tyµ(M) and Tyj(M), where yj is the j-th column of Y. [sent-118, score-0.265]
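The proposed response can then be sketched as follows: tangent bases are estimated by local PCA at every point, and the maximum principal angle to the tangent subspace at the mean point is read off the singular values of the product of the two bases (see [1]); all data and parameter values below are placeholders:

```python
import numpy as np

def tangent_basis(Y, idx, k, n):
    """Top-n eigenvectors of the local covariance around Y[:, idx] (observations as columns)."""
    y0 = Y[:, idx]
    nbrs = np.argsort(np.linalg.norm(Y - y0[:, None], axis=0))[: k + 1]
    diffs = Y[:, nbrs] - y0[:, None]
    _, eigvecs = np.linalg.eigh(diffs @ diffs.T / k)
    return eigvecs[:, ::-1][:, :n]

def max_principal_angle(V1, V2):
    """Largest principal angle between the column spaces of two orthonormal bases."""
    s = np.linalg.svd(V1.T @ V2, compute_uv=False)    # singular values = cosines of the angles
    return float(np.arccos(np.clip(s.min(), -1.0, 1.0)))

rng = np.random.default_rng(0)
Y = rng.normal(size=(10, 300))                        # toy data
k, n = 12, 2
mu_idx = int(np.argmin(np.linalg.norm(Y - Y.mean(axis=1, keepdims=True), axis=0)))
V_mu = tangent_basis(Y, mu_idx, k, n)

# theta_j = maximum principal angle between T_{y_mu}(M) and T_{y_j}(M), for every column j.
theta = np.array([max_principal_angle(V_mu, tangent_basis(Y, j, k, n))
                  for j in range(Y.shape[1])])
```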
66 It remains to be explained how to achieve a sparse solution to (6). [sent-119, score-0.055]
67 The sparse solution to (6) is obtained by minimizing the regularized cost β̂ = arg min_β { ‖θ − Kβ‖² + γ‖β‖_q }. (9) [sent-122, score-0.027]
68 Here, ‖β̂‖_q denotes the lq norm of β̂, i.e. Σ_{i=1}^{m} |β̂_i|^q, and γ is a tuning parameter that controls the amount of regularization. [sent-124, score-0.037]
69 This is the usual formulation of a LASSO regression problem. [sent-128, score-0.099]
70 While minimization of (9) can be done using quadratic programming, the recent development of the LARS method has made this unnecessary. [sent-129, score-0.052]
71 LARS then proceeds in a direction equiangular to all the active β̂j, and the process is repeated until all covariates have been added. [sent-132, score-0.055]
72 There are a total of m steps, each of which adds a new βj , making it non-zero. [sent-133, score-0.037]
73 With slight modifications, these steps correspond to a sampling of the tuning parameter γ in (9) under LASSO. [sent-134, score-0.037]
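These path steps can be reproduced with scikit-learn's lars_path, which, with method='lasso', returns the whole sequence of LASSO solutions; the synthetic K and θ below merely stand in for the quantities in (6):

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
N = 100
K = rng.normal(size=(N, N))                # random stand-in for the kernel matrix in (6)
beta_true = np.zeros(N)
beta_true[rng.choice(N, size=5, replace=False)] = rng.normal(size=5)
theta = K @ beta_true + 0.01 * rng.normal(size=N)

# Each column of `coefs` is the LASSO solution at one breakpoint of the LARS path,
# i.e. at one sampled value of the tuning parameter gamma in (9).
alphas, active, coefs = lars_path(K, theta, method="lasso")
print(coefs.shape)                          # (N, number of path steps)
print([int(np.count_nonzero(c)) for c in coefs.T[:6]])   # sparsity typically grows by one per step
```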
74 Moreover, [4] shows that the risk, as a function of the number, p, of non-zero β̂j, can be estimated (under mild assumptions) as R(β̂_p) = ‖θ − Kβ̂_p‖² / σ̄² − m + 2p, (10) where σ̄² can be found from the unconstrained least squares solution of (6). [sent-135, score-0.104]
75 2.3 Landmarks and parameterization of the manifold The landmarks are the columns xj of X (or of Y) with the same indexes j as the non-zero elements of β̂_p, where p = arg min_p R(β̂_p). (11) [sent-138, score-1.098]
76 There are N′ = p landmarks, because there are p non-zero elements in β̂_p. [sent-139, score-0.033]
77 This criterion ensures that the landmarks are the kernel centers that minimize the risk of the regression in (6). [sent-140, score-0.787]
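A sketch of the resulting selection rule, under the assumption that the noise level σ̄² is supplied directly (the paper obtains it from the unconstrained least-squares fit of (6)); the data are synthetic and the function name is illustrative:

```python
import numpy as np
from sklearn.linear_model import lars_path

def select_landmarks(K, theta, sigma2):
    """Indices of the kernel centres selected by the risk criterion of eqs. (10)-(11).

    sigma2 is the noise-level estimate; here it is simply passed in.
    """
    N = K.shape[0]
    _, _, coefs = lars_path(K, theta, method="lasso")   # one sparse solution per column
    risks = []
    for beta_p in coefs.T:
        p = int(np.count_nonzero(beta_p))
        rss = float(np.sum((theta - K @ beta_p) ** 2))
        risks.append(rss / sigma2 - N + 2 * p)          # eq. (10), with N covariates
    best = int(np.argmin(risks))
    return np.flatnonzero(coefs[:, best])               # landmark indexes, eq. (11)

rng = np.random.default_rng(0)
N, noise_std = 100, 0.05
K = rng.normal(size=(N, N))                             # stand-in for the kernel matrix
beta_true = np.zeros(N)
beta_true[rng.choice(N, size=5, replace=False)] = rng.normal(size=5)
theta = K @ beta_true + noise_std * rng.normal(size=N)

landmark_idx = select_landmarks(K, theta, sigma2=noise_std ** 2)
print(len(landmark_idx), landmark_idx)
```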
78 The new, smaller regression (12) can be solved separately for each column of YT and B by unconstrained least squares. [sent-144, score-0.229]
79 For a new feature vector, x, in the parametric domain, a new vector y ∈ M in observation space can be synthesized by y = hB,X′(x) = [y1(x) … ym(x)]T, with yj(x) = Σ_{xi ∈ X′} bij K(xi, x), (13) [sent-145, score-0.042]
80 where the {bij} are the elements of B. [sent-148, score-0.079]
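Putting the last two steps together, a rough sketch of the reduced regression referred to as (12) and of the synthesis rule (13); the landmark set is drawn at random here purely to make the snippet self-contained, whereas in the algorithm it would come from the LASSO/LARS selection:

```python
import numpy as np

def gaussian_kernel(x, xi, sigma):
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
n, m, N = 2, 10, 200
sigma_K = 1.0
X = rng.normal(size=(n, N))                             # known features
Y = rng.normal(size=(m, N))                             # observations
landmark_idx = rng.choice(N, size=20, replace=False)    # would come from the LASSO/LARS step
X_land = X[:, landmark_idx]                             # the landmark set X'

# Reduced design matrix: kernel between every feature vector and each landmark (N x N').
K_red = np.array([[gaussian_kernel(X[:, j], X_land[:, i], sigma_K)
                   for i in range(X_land.shape[1])] for j in range(N)])

# Eq. (12): solve Y^T ~ K_red B column by column with unconstrained least squares.
B, *_ = np.linalg.lstsq(K_red, Y.T, rcond=None)         # B is N' x m

def h(x):
    """Eq. (13): synthesize a new observation y in R^m from a feature vector x in R^n."""
    k_x = np.array([gaussian_kernel(X_land[:, i], x, sigma_K) for i in range(X_land.shape[1])])
    return B.T @ k_x

y_new = h(rng.normal(size=n))
print(y_new.shape)                                      # (10,)
```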
81 3 Results The algorithm has been tested on two synthetic datasets: the traditional synthetic “swiss roll” and a sphere, both with 1000 points embedded in R10, with a small amount of isotropic Gaussian noise (σy = 0. [sent-149, score-0.2]
82 A global embedding for the swiss roll was found by ISOMAP, using k = 8. [sent-152, score-0.191]
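The synthetic setup can be reproduced approximately as follows; the random rotation into R10 and the noise level 0.05 are assumptions, since the exact construction and the value of σy are not fully specified in this extract:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
Y3, t = make_swiss_roll(n_samples=1000, noise=0.0, random_state=0)   # 1000 x 3 swiss roll

# Embed the 3-D roll in R^10 with a random orthogonal map and add isotropic Gaussian noise.
Q, _ = np.linalg.qr(rng.normal(size=(10, 3)))
Y = Y3 @ Q.T + 0.05 * rng.normal(size=(1000, 10))

# Global embedding with ISOMAP, k = 8 neighbours and 2 latent features.
X = Isomap(n_neighbors=8, n_components=2).fit_transform(Y)           # 1000 x 2
print(X.shape)
```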
83 On the other hand, TBA was used for the sphere, resulting in multiple patches and charts - a necessity, because otherwise the sphere’s topology would make ISOMAP fail. [sent-153, score-0.156]
84 Therefore, in the sphere, each patch has its own landmark points, and covering the manifold requires the union of all such points. [sent-154, score-0.347]
85 Additionally, a real dataset was used: images from the video sequence shown above in Figure 1. [sent-156, score-0.045]
86 This example is known [9] to be reasonably well modelled by as few as 2 free parameters. [sent-157, score-0.031]
87 A first step was to perform global PCA in order to discard irrelevant dimensions. [sent-159, score-0.039]
88 Figure 2: Above: landmarks; Middle: interpolated points using hB,X′; Below: risk estimates. [sent-194, score-0.101]
89 For the sphere, the risk plot is for the largest patch. [sent-195, score-0.06]
90 Total landmarks, N ′ = 27 for the swiss roll, 42 for the sphere. [sent-196, score-0.057]
91 Since it would be impractical to compute a covariance matrix of size 5000 × 5000 from 194 samples, the problem was transposed, leading to the computation of the eigenvectors of an N × N covariance, from which the first N − 1 eigenvectors of the non-transposed problem can easily be found [11]. [sent-197, score-0.152]
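A sketch of this transposition trick (the "snapshot" method) on random placeholder data of the same size; the eigenvectors of the small N × N matrix are mapped back through the centred data matrix to obtain the leading eigenvectors of the large covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
m, N = 5041, 194                           # ambient dimension and number of frames
Y = rng.normal(size=(m, N))                # random placeholder for the frame matrix

Yc = Y - Y.mean(axis=1, keepdims=True)     # centre the data
G = Yc.T @ Yc / N                          # small N x N matrix instead of the m x m covariance
eigvals, W = np.linalg.eigh(G)             # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1][: N - 1] # keep the N - 1 non-trivial directions

# Map the small-problem eigenvectors back to eigenvectors of the m x m covariance.
U = Yc @ W[:, order]
U /= np.linalg.norm(U, axis=0, keepdims=True)
Z = U[:, :15].T @ Yc                       # project onto the first 15 principal directions
print(U.shape, Z.shape)                    # (5041, 193) and (15, 194)
```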
92 This resulted in an estimated 15 globally significant principal directions, on which the data were projected. [sent-198, score-0.106]
93 An embedding was found using TBA with 2 features (ISOMAP would have worked as well). [sent-200, score-0.029]
94 Only 4 landmarks were needed, and they correspond to very distinct facial expressions. [sent-202, score-0.628]
95 4 Discussion A new approach for selecting landmarks in manifold learning, based on LASSO and LARS regression, has been presented. [sent-203, score-0.87]
96 Also, a continuous manifold parameterization is given with very little additional computational cost. Figure 3: Landmarks for the video sequence: N′ = 4, marked over a scatter plot of the first 3 eigen-coordinates. [sent-205, score-0.366]
97 The entire procedure avoids expensive quadratic programming computations; its complexity is dominated by the LARS step, which has the same cost as a least squares fit [4]. [sent-208, score-0.247]
98 The proposed approach has been validated with experiments on synthetic and real datasets. [sent-209, score-0.079]
99 Sparse representation for multiple measurement vectors (mmv) in an over-complete dictionary. [sent-220, score-0.098]
wordName wordTfidf (topN-words)
[('landmarks', 0.628), ('lars', 0.249), ('lasso', 0.229), ('manifold', 0.192), ('tangent', 0.163), ('tba', 0.157), ('landmark', 0.126), ('mmv', 0.126), ('isomap', 0.125), ('rn', 0.115), ('principal', 0.106), ('geodesic', 0.1), ('regression', 0.099), ('sphere', 0.096), ('parameterization', 0.095), ('charts', 0.094), ('lisbon', 0.094), ('intrinsic', 0.084), ('portugal', 0.082), ('ty', 0.082), ('columns', 0.078), ('angles', 0.073), ('silva', 0.071), ('angle', 0.07), ('subspaces', 0.069), ('roll', 0.066), ('preserving', 0.065), ('chart', 0.063), ('marques', 0.063), ('preserve', 0.062), ('column', 0.061), ('eigenvectors', 0.061), ('sparsity', 0.06), ('risk', 0.06), ('swiss', 0.057), ('embedded', 0.057), ('dimensionality', 0.055), ('sparse', 0.055), ('covariates', 0.055), ('rm', 0.054), ('quadratic', 0.052), ('synthetic', 0.051), ('selecting', 0.05), ('bundle', 0.05), ('measurement', 0.05), ('yt', 0.049), ('vectors', 0.048), ('bij', 0.046), ('dimension', 0.046), ('video', 0.045), ('gi', 0.045), ('mapping', 0.044), ('observation', 0.042), ('points', 0.041), ('icassp', 0.04), ('distances', 0.039), ('global', 0.039), ('invertible', 0.038), ('indexes', 0.038), ('kij', 0.038), ('retaining', 0.038), ('regularization', 0.038), ('subspace', 0.038), ('tuning', 0.037), ('variation', 0.037), ('adds', 0.037), ('pca', 0.037), ('least', 0.036), ('manifolds', 0.036), ('squares', 0.035), ('hastie', 0.035), ('xj', 0.034), ('continuous', 0.034), ('avoids', 0.033), ('unconstrained', 0.033), ('elements', 0.033), ('intended', 0.032), ('patches', 0.032), ('readily', 0.031), ('modelled', 0.031), ('pixels', 0.031), ('programming', 0.031), ('matrix', 0.03), ('cost', 0.03), ('numerous', 0.03), ('topology', 0.03), ('procedure', 0.03), ('differentiable', 0.029), ('embedding', 0.029), ('mappings', 0.029), ('dealing', 0.029), ('called', 0.029), ('subset', 0.029), ('considerable', 0.029), ('union', 0.029), ('proposed', 0.028), ('dimensions', 0.028), ('lq', 0.027), ('fewest', 0.027), ('beforehand', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 172 nips-2005-Selecting Landmark Points for Sparse Manifold Learning
Author: Jorge Silva, Jorge Marques, João Lemos
Abstract: There has been a surge of interest in learning non-linear manifold models to approximate high-dimensional data. Both for computational complexity reasons and for generalization capability, sparsity is a desired feature in such models. This usually means dimensionality reduction, which naturally implies estimating the intrinsic dimension, but it can also mean selecting a subset of the data to use as landmarks, which is especially important because many existing algorithms have quadratic complexity in the number of observations. This paper presents an algorithm for selecting landmarks, based on LASSO regression, which is well known to favor sparse approximations because it uses regularization with an l1 norm. As an added benefit, a continuous manifold parameterization, based on the landmarks, is also found. Experimental results with synthetic and real data illustrate the algorithm. 1
2 0.16752616 77 nips-2005-From Lasso regression to Feature vector machine
Author: Fan Li, Yiming Yang, Eric P. Xing
Abstract: Lasso regression tends to assign zero weights to most irrelevant or redundant features, and hence is a promising technique for feature selection. Its limitation, however, is that it only offers solutions to linear models. Kernel machines with feature scaling techniques have been studied for feature selection with non-linear models. However, such approaches require to solve hard non-convex optimization problems. This paper proposes a new approach named the Feature Vector Machine (FVM). It reformulates the standard Lasso regression into a form isomorphic to SVM, and this form can be easily extended for feature selection with non-linear models by introducing kernels defined on feature vectors. FVM generates sparse solutions in the nonlinear feature space and it is much more tractable compared to feature scaling kernel machines. Our experiments with FVM on simulated data show encouraging results in identifying the small number of dominating features that are non-linearly correlated to the response, a task the standard Lasso fails to complete.
3 0.15702115 138 nips-2005-Non-Local Manifold Parzen Windows
Author: Yoshua Bengio, Hugo Larochelle, Pascal Vincent
Abstract: To escape from the curse of dimensionality, we claim that one can learn non-local functions, in the sense that the value and shape of the learned function at x must be inferred using examples that may be far from x. With this objective, we present a non-local non-parametric density estimator. It builds upon previously proposed Gaussian mixture models with regularized covariance matrices to take into account the local shape of the manifold. It also builds upon recent work on non-local estimators of the tangent plane of a manifold, which are able to generalize in places with little training data, unlike traditional, local, non-parametric models.
4 0.08738067 155 nips-2005-Predicting EMG Data from M1 Neurons with Variational Bayesian Least Squares
Author: Jo-anne Ting, Aaron D'souza, Kenji Yamamoto, Toshinori Yoshioka, Donna Hoffman, Shinji Kakei, Lauren Sergio, John Kalaska, Mitsuo Kawato
Abstract: An increasing number of projects in neuroscience requires the statistical analysis of high dimensional data sets, as, for instance, in predicting behavior from neural firing or in operating artificial devices from brain recordings in brain-machine interfaces. Linear analysis techniques remain prevalent in such cases, but classical linear regression approaches are often numerically too fragile in high dimensions. In this paper, we address the question of whether EMG data collected from arm movements of monkeys can be faithfully reconstructed with linear approaches from neural activity in primary motor cortex (M1). To achieve robust data analysis, we develop a full Bayesian approach to linear regression that automatically detects and excludes irrelevant features in the data, regularizing against overfitting. In comparison with ordinary least squares, stepwise regression, partial least squares, LASSO regression and a brute force combinatorial search for the most predictive input features in the data, we demonstrate that the new Bayesian method offers a superior mixture of characteristics in terms of regularization against overfitting, computational efficiency and ease of use, demonstrating its potential as a drop-in replacement for other linear regression techniques. As neuroscientific results, our analyses demonstrate that EMG data can be well predicted from M1 neurons, further opening the path for possible real-time interfaces between brains and machines. 1
5 0.085001305 66 nips-2005-Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization
Author: Maxim Raginsky, Svetlana Lazebnik
Abstract: We introduce a technique for dimensionality estimation based on the notion of quantization dimension, which connects the asymptotic optimal quantization error for a probability distribution on a manifold to its intrinsic dimension. The definition of quantization dimension yields a family of estimation algorithms, whose limiting case is equivalent to a recent method based on packing numbers. Using the formalism of high-rate vector quantization, we address issues of statistical consistency and analyze the behavior of our scheme in the presence of noise.
6 0.08435934 168 nips-2005-Rodeo: Sparse Nonparametric Regression in High Dimensions
7 0.083333708 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines
8 0.081689574 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
9 0.077996857 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning
10 0.068994261 180 nips-2005-Spectral Bounds for Sparse PCA: Exact and Greedy Algorithms
11 0.066973604 45 nips-2005-Conditional Visual Tracking in Kernel Space
12 0.064425446 189 nips-2005-Tensor Subspace Analysis
13 0.059242308 50 nips-2005-Convex Neural Networks
14 0.058969978 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
15 0.058548849 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
16 0.057012103 116 nips-2005-Learning Topology with the Generative Gaussian Graph and the EM Algorithm
17 0.053576417 126 nips-2005-Metric Learning by Collapsing Classes
18 0.0535453 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm
19 0.051712602 15 nips-2005-A Theoretical Analysis of Robust Coding over Noisy Overcomplete Channels
20 0.050945874 147 nips-2005-On the Convergence of Eigenspaces in Kernel Principal Component Analysis
topicId topicWeight
[(0, 0.188), (1, 0.067), (2, -0.052), (3, -0.01), (4, 0.005), (5, 0.023), (6, 0.042), (7, -0.113), (8, 0.135), (9, -0.021), (10, 0.055), (11, 0.054), (12, -0.006), (13, 0.042), (14, -0.025), (15, 0.188), (16, -0.155), (17, 0.168), (18, -0.175), (19, -0.009), (20, 0.054), (21, -0.005), (22, -0.03), (23, -0.104), (24, 0.042), (25, -0.065), (26, -0.142), (27, 0.006), (28, -0.005), (29, 0.029), (30, -0.001), (31, 0.011), (32, -0.019), (33, -0.063), (34, -0.042), (35, -0.108), (36, 0.02), (37, 0.088), (38, 0.083), (39, -0.021), (40, -0.039), (41, 0.038), (42, 0.026), (43, 0.243), (44, 0.08), (45, -0.061), (46, -0.01), (47, 0.024), (48, -0.011), (49, -0.134)]
simIndex simValue paperId paperTitle
same-paper 1 0.93803275 172 nips-2005-Selecting Landmark Points for Sparse Manifold Learning
Author: Jorge Silva, Jorge Marques, João Lemos
Abstract: There has been a surge of interest in learning non-linear manifold models to approximate high-dimensional data. Both for computational complexity reasons and for generalization capability, sparsity is a desired feature in such models. This usually means dimensionality reduction, which naturally implies estimating the intrinsic dimension, but it can also mean selecting a subset of the data to use as landmarks, which is especially important because many existing algorithms have quadratic complexity in the number of observations. This paper presents an algorithm for selecting landmarks, based on LASSO regression, which is well known to favor sparse approximations because it uses regularization with an l1 norm. As an added benefit, a continuous manifold parameterization, based on the landmarks, is also found. Experimental results with synthetic and real data illustrate the algorithm. 1
2 0.7019372 77 nips-2005-From Lasso regression to Feature vector machine
Author: Fan Li, Yiming Yang, Eric P. Xing
Abstract: Lasso regression tends to assign zero weights to most irrelevant or redundant features, and hence is a promising technique for feature selection. Its limitation, however, is that it only offers solutions to linear models. Kernel machines with feature scaling techniques have been studied for feature selection with non-linear models. However, such approaches require to solve hard non-convex optimization problems. This paper proposes a new approach named the Feature Vector Machine (FVM). It reformulates the standard Lasso regression into a form isomorphic to SVM, and this form can be easily extended for feature selection with non-linear models by introducing kernels defined on feature vectors. FVM generates sparse solutions in the nonlinear feature space and it is much more tractable compared to feature scaling kernel machines. Our experiments with FVM on simulated data show encouraging results in identifying the small number of dominating features that are non-linearly correlated to the response, a task the standard Lasso fails to complete.
3 0.5683099 138 nips-2005-Non-Local Manifold Parzen Windows
Author: Yoshua Bengio, Hugo Larochelle, Pascal Vincent
Abstract: To escape from the curse of dimensionality, we claim that one can learn non-local functions, in the sense that the value and shape of the learned function at x must be inferred using examples that may be far from x. With this objective, we present a non-local non-parametric density estimator. It builds upon previously proposed Gaussian mixture models with regularized covariance matrices to take into account the local shape of the manifold. It also builds upon recent work on non-local estimators of the tangent plane of a manifold, which are able to generalize in places with little training data, unlike traditional, local, non-parametric models.
4 0.51032668 168 nips-2005-Rodeo: Sparse Nonparametric Regression in High Dimensions
Author: Larry Wasserman, John D. Lafferty
Abstract: We present a method for nonparametric regression that performs bandwidth selection and variable selection simultaneously. The approach is based on the technique of incrementally decreasing the bandwidth in directions where the gradient of the estimator with respect to bandwidth is large. When the unknown function satisfies a sparsity condition, our approach avoids the curse of dimensionality, achieving the optimal minimax rate of convergence, up to logarithmic factors, as if the relevant variables were known in advance. The method—called rodeo (regularization of derivative expectation operator)—conducts a sequence of hypothesis tests, and is easy to implement. A modified version that replaces hard with soft thresholding effectively solves a sequence of lasso problems. 1
5 0.4840773 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
Author: Zhenyue Zhang, Hongyuan Zha
Abstract: We propose a fast manifold learning algorithm based on the methodology of domain decomposition. Starting with the set of sample points partitioned into two subdomains, we develop the solution of the interface problem that can glue the embeddings on the two subdomains into an embedding on the whole domain. We provide a detailed analysis to assess the errors produced by the gluing process using matrix perturbation theory. Numerical examples are given to illustrate the efficiency and effectiveness of the proposed methods. 1
6 0.47377387 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
7 0.46885926 155 nips-2005-Predicting EMG Data from M1 Neurons with Variational Bayesian Least Squares
8 0.43843749 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines
9 0.42759508 180 nips-2005-Spectral Bounds for Sparse PCA: Exact and Greedy Algorithms
10 0.39327765 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
11 0.39013055 44 nips-2005-Computing the Solution Path for the Regularized Support Vector Regression
12 0.38450098 66 nips-2005-Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization
13 0.34168777 16 nips-2005-A matching pursuit approach to sparse Gaussian process regression
14 0.33210629 45 nips-2005-Conditional Visual Tracking in Kernel Space
15 0.32658744 116 nips-2005-Learning Topology with the Generative Gaussian Graph and the EM Algorithm
16 0.32118049 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm
17 0.30817476 189 nips-2005-Tensor Subspace Analysis
18 0.29042661 98 nips-2005-Infinite latent feature models and the Indian buffet process
19 0.28504893 126 nips-2005-Metric Learning by Collapsing Classes
20 0.28481835 35 nips-2005-Bayesian model learning in human visual perception
topicId topicWeight
[(3, 0.053), (10, 0.046), (27, 0.03), (31, 0.037), (34, 0.122), (41, 0.014), (55, 0.03), (65, 0.356), (69, 0.069), (73, 0.039), (88, 0.092), (91, 0.03)]
simIndex simValue paperId paperTitle
1 0.89005041 141 nips-2005-Norepinephrine and Neural Interrupts
Author: Peter Dayan, Angela J. Yu
Abstract: Experimental data indicate that norepinephrine is critically involved in aspects of vigilance and attention. Previously, we considered the function of this neuromodulatory system on a time scale of minutes and longer, and suggested that it signals global uncertainty arising from gross changes in environmental contingencies. However, norepinephrine is also known to be activated phasically by familiar stimuli in welllearned tasks. Here, we extend our uncertainty-based treatment of norepinephrine to this phasic mode, proposing that it is involved in the detection and reaction to state uncertainty within a task. This role of norepinephrine can be understood through the metaphor of neural interrupts. 1
2 0.83726335 199 nips-2005-Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions
Author: Sridhar Mahadevan, Mauro Maggioni
Abstract: We investigate the problem of automatically constructing efficient representations or basis functions for approximating value functions based on analyzing the structure and topology of the state space. In particular, two novel approaches to value function approximation are explored based on automatically constructing basis functions on state spaces that can be represented as graphs or manifolds: one approach uses the eigenfunctions of the Laplacian, in effect performing a global Fourier analysis on the graph; the second approach is based on diffusion wavelets, which generalize classical wavelets to graphs using multiscale dilations induced by powers of a diffusion operator or random walk on the graph. Together, these approaches form the foundation of a new generation of methods for solving large Markov decision processes, in which the underlying representation and policies are simultaneously learned.
same-paper 3 0.81916887 172 nips-2005-Selecting Landmark Points for Sparse Manifold Learning
Author: Jorge Silva, Jorge Marques, João Lemos
Abstract: There has been a surge of interest in learning non-linear manifold models to approximate high-dimensional data. Both for computational complexity reasons and for generalization capability, sparsity is a desired feature in such models. This usually means dimensionality reduction, which naturally implies estimating the intrinsic dimension, but it can also mean selecting a subset of the data to use as landmarks, which is especially important because many existing algorithms have quadratic complexity in the number of observations. This paper presents an algorithm for selecting landmarks, based on LASSO regression, which is well known to favor sparse approximations because it uses regularization with an l1 norm. As an added benefit, a continuous manifold parameterization, based on the landmarks, is also found. Experimental results with synthetic and real data illustrate the algorithm. 1
4 0.79883367 152 nips-2005-Phase Synchrony Rate for the Recognition of Motor Imagery in Brain-Computer Interface
Author: Le Song, Evian Gordon, Elly Gysels
Abstract: Motor imagery attenuates EEG µ and β rhythms over sensorimotor cortices. These amplitude changes are most successfully captured by the method of Common Spatial Patterns (CSP) and widely used in braincomputer interfaces (BCI). BCI methods based on amplitude information, however, have not incoporated the rich phase dynamics in the EEG rhythm. This study reports on a BCI method based on phase synchrony rate (SR). SR, computed from binarized phase locking value, describes the number of discrete synchronization events within a window. Statistical nonparametric tests show that SRs contain significant differences between 2 types of motor imageries. Classifiers trained on SRs consistently demonstrate satisfactory results for all 5 subjects. It is further observed that, for 3 subjects, phase is more discriminative than amplitude in the first 1.5-2.0 s, which suggests that phase has the potential to boost the information transfer rate in BCIs. 1
5 0.51100153 66 nips-2005-Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization
Author: Maxim Raginsky, Svetlana Lazebnik
Abstract: We introduce a technique for dimensionality estimation based on the notion of quantization dimension, which connects the asymptotic optimal quantization error for a probability distribution on a manifold to its intrinsic dimension. The definition of quantization dimension yields a family of estimation algorithms, whose limiting case is equivalent to a recent method based on packing numbers. Using the formalism of high-rate vector quantization, we address issues of statistical consistency and analyze the behavior of our scheme in the presence of noise.
6 0.51080626 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines
7 0.50730127 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
8 0.48745281 105 nips-2005-Large-Scale Multiclass Transduction
9 0.47865564 26 nips-2005-An exploration-exploitation model based on norepinepherine and dopamine activity
10 0.4740895 168 nips-2005-Rodeo: Sparse Nonparametric Regression in High Dimensions
11 0.47367591 116 nips-2005-Learning Topology with the Generative Gaussian Graph and the EM Algorithm
12 0.47155869 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
13 0.46423796 114 nips-2005-Learning Rankings via Convex Hull Separation
14 0.46206698 184 nips-2005-Structured Prediction via the Extragradient Method
15 0.46099448 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification
16 0.46012866 94 nips-2005-Identifying Distributed Object Representations in Human Extrastriate Visual Cortex
17 0.46006849 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
18 0.45968536 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
19 0.45962682 177 nips-2005-Size Regularized Cut for Data Clustering
20 0.45876777 50 nips-2005-Convex Neural Networks