nips nips2008 nips2008-61 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Christian Walder, Bernhard Schölkopf
Abstract: This paper introduces a new approach to constructing meaningful lower dimensional representations of sets of data points. We argue that constraining the mapping between the high and low dimensional spaces to be a diffeomorphism is a natural way of ensuring that pairwise distances are approximately preserved. Accordingly we develop an algorithm which diffeomorphically maps the data near to a lower dimensional subspace and then projects onto that subspace. The problem of solving for the mapping is transformed into one of solving for an Eulerian flow field which we compute using ideas from kernel methods. We demonstrate the efficacy of our approach on various real world data sets. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper introduces a new approach to constructing meaningful lower dimensional representations of sets of data points. [sent-4, score-0.24]
2 We argue that constraining the mapping between the high and low dimensional spaces to be a diffeomorphism is a natural way of ensuring that pairwise distances are approximately preserved. [sent-5, score-0.548]
3 Accordingly we develop an algorithm which diffeomorphically maps the data near to a lower dimensional subspace and then projects onto that subspace. [sent-6, score-0.254]
4 The problem of solving for the mapping is transformed into one of solving for an Eulerian flow field which we compute using ideas from kernel methods. [sent-7, score-0.193]
5 1 Introduction The problem of visualizing high dimensional data often arises in the context of exploratory data analysis. [sent-9, score-0.226]
6 For many real world data sets this is a challenging task, as the spaces in which the data lie are often too high dimensional to be visualized directly. [sent-10, score-0.226]
7 If the data themselves lie on a lower dimensional subspace however, dimensionality reduction techniques may be employed, which aim to meaningfully represent the data as elements of this lower dimensional subspace. [sent-11, score-0.745]
8 The earliest approaches to dimensionality reduction are the linear methods known as principal components analysis (PCA) and factor analysis (Duda et al.). [sent-12, score-0.293]
9 In the present case, it is pertinent to make the distinction between methods which focus on properties of the mapping to the lower dimensional space, and methods which focus on properties of the mapped data, in that space. [sent-19, score-0.529]
10 Multidimensional scaling (MDS), for example, minimizes with respect to the mapped points y1, . . . , ym the stress (Cox & Cox, 1994) $\sum_{i,j=1}^{m} \left( \|x_i - x_j\| - \|y_i - y_j\| \right)^2$, (1) where here, as throughout the paper, the xi ∈ Ra are input or high dimensional points, and the yi ∈ Rb are output or low dimensional points, so that b < a. [sent-23, score-0.496]
11 Note that the above term is a function only of the input points and the corresponding mapped points, and is designed to preserve the pairwise distances of the data set. [sent-24, score-0.269]
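As a point of reference, the stress (1) is straightforward to compute for a candidate embedding. The following is a minimal sketch; the function name and the use of NumPy are our own illustrative choices, not part of the paper.

```python
import numpy as np

def mds_stress(X, Y):
    """Stress of equation (1): squared differences between the pairwise
    distances of the inputs x_i in R^a and of the mapped points y_i in R^b."""
    DX = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # m x m input distances
    DY = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)  # m x m embedded distances
    return float(np.sum((DX - DY) ** 2))
```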
12 The methods which focus on the mapping itself (from the higher to the lower dimensional space, which we refer to as the downward mapping, or the upward mapping which is the converse) are less common, and form a category into which the present work falls. [sent-25, score-0.759]
13 The GP-LVM places a Gaussian process (GP) prior over each high dimensional component of the upward mapping, and optimizes with respect to the set of low dimensional points—which can be thought of as hyper-parameters of the model—the likelihood of the high dimensional points. [sent-27, score-0.84]
14 Hence the GP-LVM constructs a regular (in the sense of regularization, i.e. smooth) mapping from the low dimensional space to the high dimensional space. [sent-28, score-0.073]
15 By doing so, the model guarantees that nearby points in the low dimensional space should be mapped to nearby points in the high dimensional space—an intuitive idea for dimensionality reduction which is also present in the MDS objective (1), above. [sent-31, score-1.101]
16 The converse is not guaranteed in the original GP-LVM however, and this has led to the more recent development of the so-called back-constrained GP-LVM (Lawrence & Candela, 2006), which essentially places an additional GP prior over the downward mapping. [sent-32, score-0.224]
17 By guaranteeing in this way that (the modes of the posterior distributions over) both the upward and downward mappings are regular, the back constrained GP-LVM induces something reminiscent of a diffeomorphic mapping between the two spaces. [sent-33, score-0.933]
18 This leads us to the present work, in which we derive our new algorithm, Diffeomap, by explicitly casting the dimensionality reduction problem as one of constructing a diffeomorphic mapping between the low dimensional space and the subspace of the high dimensional space on which the data lie. [sent-34, score-1.239]
19 The mapping F : U → V is said to be a diffeomorphism if it is bijective (i.e. invertible) and both F and its inverse F −1 are smooth. [sent-38, score-0.294]
20 We note in passing the connection between this definition, our discussion of the GP-LVM, and dimensionality reduction. [sent-43, score-0.202]
21 The GP-LVM constructs a regular upward mapping (analogous to F −1 ) which ensures that points nearby in Rb will be mapped to points nearby in Ra , a property referred to as similarity preservation in (Lawrence & Candela, 2006). [sent-44, score-0.826]
22 The back constrained GP-LVM simultaneously ensures that the downward mapping (analogous to F ) is regular, thereby additionally implementing what its authors refer to as dissimilarity preservation. [sent-45, score-0.469]
23 Pairwise distance preservation (as in the MDS objective (1)) and regularity (imposed on the downward and upward mappings by the GP prior in the back constrained GP-LVM) complete the analogy. [sent-47, score-0.509]
24 There is also an alternative, more direct motivation for diffeomorphic mappings in the context of dimensionality reduction, however. [sent-48, score-0.515]
25 In particular, a diffeomorphic mapping has the property that it does not lose any information. [sent-49, score-0.424]
26 That is, given the mapping itself and the lower dimensional representation of the data set, it is always possible to reconstruct the original data. [sent-50, score-0.361]
27 There has been significant interest from within the image processing community in the construction of diffeomorphic mappings for the purpose of image warping (Dupuis & Grenander, 1998; Joshi & Miller, 2000; Karaçali & Davatzikos, 2003). [sent-51, score-0.547]
28 Let I : U → R3 represent the RGB values of an image, where U ⊂ R2 is the image plane. [sent-53, score-0.067]
29 A warp W should deform the image in a regular manner, i.e. such that it does not “tear” the image, which is ensured by requiring that W be a diffeomorphism U → U. [sent-56, score-0.131]
30 The following two main approaches to constructing such diffeomorphisms have been taken by the image processing community, the first of which we mention for reference, while the second forms the basis of Diffeomap. [sent-57, score-0.187]
31 It is a notable aside that there seem to be no image warping algorithms analogous to the back constrained GP-LVM, in which regular forward and inverse mappings are simultaneously constructed. [sent-58, score-0.438]
32 This approach has been successfully applied to the problem of warping 3D magnetic resonance images (Karaçali & Davatzikos, 2003), for example, but a key ingredient of that success was the fact that the authors defined the mapping W numerically on a regular grid. [sent-61, score-0.361]
33 For the high dimensional cases relevant to dimensionality reduction however, such a numerical grid is highly computationally unattractive. [sent-62, score-0.519]
34 Figure 1: The relationship between v(·, ·), φ(·, ·) and ψ(·) for the one dimensional case ψ : R → R; the trajectory of a point x has position φ(x, s) and tangent (1, v(φ(x, s), s)) at time s, and ends at φ(x, 1) = ψ(x). [sent-66, score-0.198]
35 1 Diffeomorphisms via Flow Fields The idea here is to indirectly define the mapping of interest, call it ψ : Ra → Ra , by way of a “time” indexed velocity field v : Ra × R → Ra . [sent-68, score-0.163]
36 That is, φ(x, 0) = x and $\partial_s \phi(x, s) = v(\phi(x, s), s)$, (2) and the mapping is taken to be ψ(x) = φ(x, 1). (3) The role of v is to transport a given point x from its original location at time 0 to its mapped location φ(x, 1) by way of a trajectory whose position and tangent vector at time s are given by φ(x, s) and v(φ(x, s), s), respectively (see Figure 1). [sent-71, score-0.214]
37 The point of this construction is that if v satisfies certain regularity properties, then the mapping ψ will be a diffeomorphism. [sent-72, score-0.202]
38 This fact has been proven in a number of places—one particularly accessible example is (Dupuis & Grenander, 1998), where the necessary conditions are provided for the three dimensional case along with a proof that the induced mapping is a diffeomorphism. [sent-73, score-0.361]
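To make the construction concrete, here is a minimal sketch that approximates ψ(x) = φ(x, 1) by forward Euler integration of a given velocity field v; the integrator, step count, and names are our own assumptions rather than the authors' implementation.

```python
import numpy as np

def flow_endpoint(x, velocity, steps=100):
    """Approximate psi(x) = phi(x, 1) by forward Euler integration of the
    trajectory with position phi(x, s) and tangent v(phi(x, s), s)."""
    dt = 1.0 / steps
    phi = np.asarray(x, dtype=float).copy()
    for k in range(steps):
        s = k * dt
        phi = phi + dt * np.asarray(velocity(phi, s))  # follow the tangent at time s
    return phi
```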
39 Generalizing the result to higher dimensions is straightforward—this fact is stated in (Dupuis & Grenander, 1998) along with the basic idea of how to do so. [sent-74, score-0.075]
40 It is clear that for the mapping ψ to be a diffeomorphism, the trajectories associated with any pair of distinct points x and x′ must not collide. [sent-77, score-0.265]
41 This is because the two trajectories would be identical after the collision, so that x and x′ would map to the same point, and hence the mapping would not be invertible. [sent-78, score-0.222]
42 But if v is sufficiently regular then such collisions cannot occur. [sent-79, score-0.073]
43 3 Diffeomorphic Dimensionality Reduction The framework of Eulerian flow fields which we have just introduced provides an elegant means of constructing diffeomorphic mappings Ra → Ra , but for dimensionality reduction we require additional ingredients, which we now introduce. [sent-80, score-0.685]
44 The basic idea is to construct a diffeomorphic mapping in such a way that it maps our data set near to a subspace of Ra , and then to project onto this subspace. [sent-81, score-0.48]
45 We can now write the mapping ϕ : Ra → Rb which we propose for dimensionality reduction as ϕ(x) = P(a→b) φ(x, 1), (5) where φ is given by (2) and P(a→b) denotes the projection onto the first b coordinates. [sent-86, score-0.456]
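Assuming (as made explicit below, around equation (8)) that Sb is the span of the first b coordinate directions, the projection in (5) simply keeps the first b coordinates of the flowed point. A sketch reusing flow_endpoint from the earlier snippet; the names are our own.

```python
def diffeomap_embed(x, velocity, b, steps=100):
    """Equation (5): flow x to time 1, then project onto the first b
    coordinates (the subspace S_b after PCA preprocessing, see (8))."""
    return flow_endpoint(x, velocity, steps=steps)[:b]
```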
46 We choose each component of v at each time to belong to a reproducing kernel Hilbert Space (RKHS) H, so that v(·, t) ∈ Ha , t ∈ [0, 1]. [sent-87, score-0.074]
47 In particular v need not be regular in its second argument. [sent-89, score-0.073]
48 For dimensionality reduction we propose to construct v as the minimizer of $O = \lambda \int_{0}^{1} \|v(\cdot, t)\|_{H^a}^2 \, dt + \sum_{j=1}^{m} L(\psi(x_j))$, (7) where λ ∈ R+ is a regularization parameter. [sent-90, score-0.332]
49 Here, L measures the squared distance to our b dimensional linear subspace of interest Sb, i.e. $L(x) = \sum_{d=b+1}^{a} [x]_d^2$. (8) [sent-91, score-0.254]
50 Note that this places special importance on the first b dimensions of the input space; accordingly we make the natural and important preprocessing step of applying PCA such that as much as possible of the variance of the data is captured in these first b dimensions. [sent-94, score-0.164]
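A minimal sketch of this PCA preprocessing and of the loss (8); the SVD-based implementation and the function names are our own assumptions.

```python
import numpy as np

def pca_rotate(X):
    """Center the m x a data matrix and rotate it onto its principal axes,
    so that the leading coordinates capture the most variance."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T  # columns ordered by decreasing variance

def subspace_loss(z, b):
    """L of equation (8): squared distance of a point z in R^a to the span
    of the first b coordinate directions."""
    return float(np.sum(z[b:] ** 2))
```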
51 It turns out that the optimal v admits a finite expansion in terms of the kernel evaluated at the trajectories, $v_d(\cdot, t) = \sum_{j=1}^{m} [\alpha_d(t)]_j \, k(\phi(x_j, t), \cdot)$, d = 1, 2, . . . , a, (9) where k is the reproducing kernel of H and αd is a function [0, 1] → Rm. [sent-99, score-0.074]
52 This was proven directly for a similar specific case (Joshi & Miller, 2000), but we note in passing that it follows immediately from the celebrated representer theorem of RKHS's (Schölkopf et al.). [sent-100, score-0.103]
53 Hence, we have simplified the problem of determining v to one of determining m trajectories φ(xj , ·). [sent-102, score-0.059]
54 This is because not only does (9) hold, but we can use standard manipulations (in the context of kernel ridge regression, for example) to determine that for a given set of such trajectories, $\alpha_d(t) = K(t)^{-1} u_d(t)$, d = 1, 2, . . . , a, (10) [sent-103, score-0.176]
55 where t ∈ [0, 1], K(t) ∈ R^{m×m}, ud(t) ∈ R^m, and we have let [K(t)]_{j,k} = k(φ(xj, t), φ(xk, t)) along with [ud(t)]_j = ∂t φ(xj, t). [sent-106, score-0.146]
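For fixed trajectories, (10) is a kernel-ridge-like linear solve at each time. A sketch with a Gaussian kernel; the bandwidth parameter and the direct solve are illustrative assumptions.

```python
import numpy as np

def gaussian_gram(P, sigma):
    """Gram matrix [K]_{jk} = k(p_j, p_k) for the Gaussian kernel."""
    sq = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def solve_alpha(P_t, U_t, sigma):
    """Equation (10): alpha_d(t) = K(t)^{-1} u_d(t) for all d at once.
    P_t holds the trajectory positions phi(x_j, t) (m x a), U_t the
    corresponding velocities (m x a, with columns indexed by d)."""
    K = gaussian_gram(P_t, sigma)
    return np.linalg.solve(K, U_t)  # column d is alpha_d(t)
```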
56 Note that the invertibility of K(t) is guaranteed for certain kernel functions (including the Gaussian kernel which we employ in all our experiments, see Section 4), provided that the points φ(xj, t) are distinct. [sent-107, score-0.087]
57 It follows from the reproducing property of H (i.e. the fact that $\langle f, k(x, \cdot) \rangle_H = f(x)$ for all f ∈ H) that for the optimal v, $\|v(\cdot, t)\|_{H^a}^2 = \sum_{d=1}^{a} u_d(t)^\top K(t)^{-1} u_d(t)$. (11) [sent-110, score-0.292]
58 This allows us to write our objective (7) in terms of the m trajectories mentioned above: $O = \lambda \int_{0}^{1} \sum_{d=1}^{a} u_d(t)^\top K(t)^{-1} u_d(t) \, dt + \sum_{j=1}^{m} \sum_{d=b+1}^{a} [\phi(x_j, 1)]_d^2$. (12) [sent-111, score-0.351]
59 So far no approximations have been made, and we have constructed an optimal finite dimensional basis for v(·, t). [sent-112, score-0.198]
60 To obtain a finite dimensional problem we discretize time at the points tk, k = 0, 1, . . . , p, where δ = 1/p, and make the approximation $\partial_t \phi(x_j, t)\big|_{t=t_k} \approx \left( \phi(x_j, t_k) - \phi(x_j, t_{k-1}) \right) / \delta$. [sent-117, score-0.163]
61 Figure 2: Dimensionality reduction of motion capture data, panels (a)-(d). [sent-137, score-0.162]
62 (a) The data mapped from 102 to 2 dimensions using Diffeomap (the line shows the temporal order in which the input data were recorded). [sent-138, score-0.243]
63 Using the approximation $\int_{t_{k-1}}^{t_k} K(t)^{-1} \, dt \approx \delta K(t_{k-1})^{-1}$ and substituting into (12), we obtain the first form of our problem, which is finite dimensional and hence readily optimized. [sent-140, score-0.198]
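Putting these approximations together, the finite dimensional objective can be sketched as follows, with the trajectories stored as an array Phi of shape (p + 1, m, a). The code reuses gaussian_gram from the earlier sketch; the variable names and exact discretization choices are our own assumptions.

```python
import numpy as np

def discretized_objective(Phi, lam, b, sigma):
    """Sketch of the time-discretized objective derived from (12).
    Phi[k] holds the points phi(x_j, t_k) as an m x a array."""
    p = Phi.shape[0] - 1
    delta = 1.0 / p
    reg = 0.0
    for k in range(1, p + 1):
        U = (Phi[k] - Phi[k - 1]) / delta            # finite-difference velocities u_d
        K = gaussian_gram(Phi[k - 1], sigma)          # K(t_{k-1}), see earlier sketch
        reg += delta * float(np.sum(U * np.linalg.solve(K, U)))  # sum_d u_d^T K^{-1} u_d
    fit = float(np.sum(Phi[-1][:, b:] ** 2))          # distance of phi(x_j, 1) to S_b
    return lam * reg + fit
```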
64 4 Experiments It is difficult to objectively compare dimensionality reduction algorithms, as there is no universally agreed upon measure of performance. [sent-159, score-0.293]
65 Figure 3: Vowel data mapped from 24 to 2 dimensions using (a) PCA and (b)-(c) Diffeomap. [sent-175, score-0.243]
66 Plots (b) and (c) differ only in the parameter settings of Diffeomap, with (b) corresponding to minimal one nearest neighbor errors in the low dimensional space—see Section 4. [sent-176, score-0.34]
67 Diffeomap also succeeded in this sense, and produced results which are competitive with those of the back constrained GP-LVM. [sent-184, score-0.146]
68 2 Vowel Data In this next example we consider a data set of a = 24 features (cepstral coefficients and delta cepstral coefficients) of a single speaker performing nine different vowels 300 times per vowel, acquired as training data for a vocal joystick system (Bilmes et al.). [sent-186, score-0.081]
69 The results in Figure 3 are directly comparable to those provided in (Lawrence & Candela, 2006) for the GP-LVM, back constrained GP-LVM, and Isomap (Tenenbaum et al.). [sent-195, score-0.146]
70 Visually, the Diffeomap result appears to be superior to those of the GP-LVM and Isomap, and comparable to the back constrained GP-LVM. [sent-197, score-0.146]
71 We also measured the performance of a one nearest neighbor classifier applied to the mapped data in R2 . [sent-198, score-0.31]
72 For the best choice of the parameters σ and λ, Diffeomap made 140 errors, which is favorable to the figures quoted for Isomap (458), the GP-LVM (226) and the back constrained GP-LVM (155) in (Lawrence & Candela, 2006). [sent-199, score-0.146]
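The kind of evaluation reported here can be sketched as a leave-one-out one nearest neighbor error count on the mapped points; this is our own minimal implementation, not the authors' code, and labels is assumed to be an integer array of class indices.

```python
import numpy as np

def one_nn_errors(Y, labels):
    """Leave-one-out 1-NN misclassification count on the mapped points Y."""
    D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)       # a point may not be its own neighbor
    nearest = np.argmin(D, axis=1)
    return int(np.sum(labels[nearest] != labels))
```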
73 3 USPS Handwritten Digits We now consider the USPS database of handwritten digits (Hull, 1994). [sent-202, score-0.104]
74 Following the methodology of the stochastic neighbor embedding (SNE) and GP-LVM papers (Hinton & Roweis, 2003; Lawrence, 2004), we take 600 images per class from the five classes corresponding to digits 0, 1, 2, 3, 4. [sent-203, score-0.204]
75 Since the images are gray scale with a resolution of 16 by 16 pixels, this results in a data set of m = 3000 examples in a = 256 dimensions, which we again mapped to b = 2 dimensions as depicted in Figure 4. [sent-204, score-0.243]
76 Figure 4: USPS handwritten digits 0-4 mapped to 2 dimensions using Diffeomap. [sent-205, score-0.5]
77 (a) The individual points color coded according to class. (b) A composite image of the mapped data; see Section 4. [sent-207, score-0.265]
78 The figure shows the individual points color coded according to class, along with a composite image formed by sequentially drawing each digit in random order at its mapped location, but only if it would not obscure a previously drawn digit. [sent-209, score-0.292]
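The composite image construction just described can be sketched as follows: visit the digits in random order, place each 16 by 16 thumbnail at its mapped location, and skip any digit whose bounding box would overlap one already drawn. The occupancy grid, canvas size, and coordinate scaling are illustrative assumptions.

```python
import numpy as np

def composite_image(Y, thumbs, canvas_hw=(800, 800), rng=None):
    """Draw each 16x16 digit thumbnail at its mapped 2-D location in random
    order, skipping any digit that would obscure a previously drawn one."""
    rng = np.random.default_rng() if rng is None else rng
    H, W = canvas_hw
    canvas = np.ones((H, W))
    occupied = np.zeros((H, W), dtype=bool)
    lo, hi = Y.min(axis=0), Y.max(axis=0)
    pix = ((Y - lo) / (hi - lo + 1e-12) * np.array([H - 16, W - 16])).astype(int)
    for i in rng.permutation(len(Y)):
        r, c = pix[i]
        if occupied[r:r + 16, c:c + 16].any():
            continue                   # would obscure a previously drawn digit
        canvas[r:r + 16, c:c + 16] = thumbs[i]
        occupied[r:r + 16, c:c + 16] = True
    return canvas
```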
79 Diffeomap manages to arrange the data in a manner which reveals such image properties as digit angle and stroke thickness. [sent-210, score-0.067]
80 The ones, however, are split into two clusters of left- and right-pointing examples. Although unfortunate, we believe that this splitting can be explained by the fact that (a) the left- and right-pointing ones are rather dissimilar in input space, and (b) the number of fairly vertical ones which could help to connect the left- and right-pointing ones is rather small. [sent-212, score-0.099]
81 We believe this is due to the fact that the nearest neighbor graph used by SNE is highly appropriate to the USPS data set. [sent-214, score-0.142]
82 This is indicated by the fact that a nearest neighbor classifier in the 256 dimensional input space is known to perform strongly, with numerous authors having reported error rates of less than 5% on the ten class classification problem. [sent-215, score-0.43]
83 4 NIPS Text Data Finally, we present results on the text data of papers from the NIPS conference proceedings volumes 0-12, which can be obtained from http://www. [sent-217, score-0.099]
84 This experiment is intended to address the natural concern that by working in the input space rather than on a nearest neighbor graph, for example, Diffeomap may have difficulty with very high dimensional data. [sent-222, score-0.368]
85 The result is a data set of m = 1740 papers represented in a = 13649 words + 2037 authors = 15686 dimensions. [sent-228, score-0.13]
86 Note however that the input dimensionality is effectively reduced by the PCA preprocessing step to m − 1 = 1739, that being the rank of the centered covariance matrix of the data. [sent-229, score-0.201]
87 In particular, we provide a first figure which shows the data mapped to b = 2 dimensions, with certain authors (or groups of authors) color coded; the choice of authors and their corresponding color codes follows precisely those of (Song et al.). [sent-231, score-0.386]
88 A second figure shows a plain marker drawn at the mapped locations corresponding to each of the papers. [sent-233, score-0.168]
89 This second figure also contains the paper titles and authors of the corresponding papers, which are revealed when the user moves the mouse over the marked locations. [sent-234, score-0.13]
90 Since the mapping may be hard to judge, we note in passing that the correct classification rate of a one nearest neighbor classifier applied to the result of Diffeomap was 48%, which compares favorably to the rate of 33% achieved by linear PCA (which we use for preprocessing). [sent-236, score-0.342]
91 To compute this score we treated authors as classes, and considered only those authors who were color coded both in our supplementary figure and in (Song et al.). [sent-237, score-0.234]
92 5 Conclusion We have presented an approach to dimensionality reduction which is based on the idea that the mapping between the lower and higher dimensional spaces should be diffeomorphic. [sent-239, score-0.654]
93 We provided a justification for this approach, by showing that the common intuition that dimensionality reduction algorithms should approximately preserve pairwise distances of a given data set is closely related to the idea that the mapping induced by the algorithm should be a diffeomorphism. [sent-240, score-0.514]
94 This realization allowed us to take advantage of established mathematical machinery in order to convert the dimensionality reduction problem into a so called Eulerian flow problem, the solution of which is guaranteed to generate a diffeomorphism. [sent-241, score-0.32]
95 Requiring that the mapping and its inverse both be smooth is reminiscent of the GP-LVM algorithm (Lawrence & Candela, 2006), but has the advantage in terms of statistical strength that we need not separately estimate a mapping in each direction. [sent-242, score-0.365]
96 We showed results of our algorithm, Diffeomap, on a relatively small motion capture data set, a larger vowel data set, the USPS image data set, and finally the rather high dimensional data set derived from the text corpus of NIPS papers, with successes in all cases. [sent-243, score-0.449]
97 Since our new approach performs well in practice while being significantly different to all previous approaches to dimensionality reduction, it has the potential to lead to a significant new direction in the field. [sent-244, score-0.165]
98 Dupuis & Grenander (1998). Variational problems on flows of diffeomorphisms for image matching. [sent-280, score-0.145]
99 Lawrence (2004). Gaussian process latent variable models for visualisation of high dimensional data. [sent-318, score-0.226]
100 Lawrence & Candela (2006). Local distance preservation in the GP-LVM through back constraints. [sent-330, score-0.15]
wordName wordTfidf (topN-words)
[('diffeomap', 0.366), ('ra', 0.293), ('diffeomorphic', 0.261), ('dimensional', 0.198), ('mapped', 0.168), ('dimensionality', 0.165), ('mapping', 0.163), ('tk', 0.163), ('dupuis', 0.157), ('grenander', 0.157), ('lawrence', 0.15), ('ud', 0.146), ('candela', 0.146), ('upward', 0.137), ('diffeomorphism', 0.131), ('eulerian', 0.131), ('reduction', 0.128), ('joshi', 0.114), ('downward', 0.098), ('rb', 0.098), ('vowel', 0.091), ('mappings', 0.089), ('usps', 0.088), ('neighbor', 0.084), ('cox', 0.084), ('back', 0.081), ('davatzikos', 0.078), ('diffeomorphisms', 0.078), ('karacali', 0.078), ('sne', 0.078), ('dimensions', 0.075), ('miller', 0.074), ('pca', 0.074), ('regular', 0.073), ('xj', 0.072), ('ha', 0.071), ('preservation', 0.069), ('papers', 0.068), ('image', 0.067), ('constrained', 0.065), ('nearby', 0.065), ('coded', 0.063), ('isomap', 0.063), ('warping', 0.063), ('roweis', 0.062), ('authors', 0.062), ('trajectories', 0.059), ('nearest', 0.058), ('subspace', 0.056), ('places', 0.053), ('bilmes', 0.052), ('demers', 0.052), ('venna', 0.052), ('hinton', 0.052), ('digits', 0.052), ('handwritten', 0.052), ('song', 0.051), ('gp', 0.05), ('color', 0.047), ('ow', 0.046), ('transport', 0.046), ('converse', 0.046), ('cottrell', 0.046), ('maaten', 0.046), ('reproducing', 0.044), ('points', 0.043), ('cepstral', 0.042), ('deformation', 0.042), ('duda', 0.042), ('sb', 0.042), ('constructing', 0.042), ('nips', 0.04), ('vocal', 0.039), ('reminiscent', 0.039), ('minimizer', 0.039), ('regularity', 0.039), ('gure', 0.038), ('passing', 0.037), ('mds', 0.037), ('preprocessing', 0.036), ('hull', 0.035), ('lkopf', 0.035), ('motion', 0.034), ('ones', 0.033), ('representer', 0.031), ('sch', 0.031), ('van', 0.031), ('text', 0.031), ('preserve', 0.03), ('freely', 0.03), ('composite', 0.03), ('kernel', 0.03), ('der', 0.029), ('numerous', 0.028), ('high', 0.028), ('distances', 0.028), ('guaranteed', 0.027), ('formed', 0.027), ('tenenbaum', 0.027), ('rkhs', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 61 nips-2008-Diffeomorphic Dimensionality Reduction
Author: Christian Walder, Bernhard Schölkopf
Abstract: This paper introduces a new approach to constructing meaningful lower dimensional representations of sets of data points. We argue that constraining the mapping between the high and low dimensional spaces to be a diffeomorphism is a natural way of ensuring that pairwise distances are approximately preserved. Accordingly we develop an algorithm which diffeomorphically maps the data near to a lower dimensional subspace and then projects onto that subspace. The problem of solving for the mapping is transformed into one of solving for an Eulerian flow field which we compute using ideas from kernel methods. We demonstrate the efficacy of our approach on various real world data sets. 1
2 0.12375181 248 nips-2008-Using matrices to model symbolic relationship
Author: Ilya Sutskever, Geoffrey E. Hinton
Abstract: We describe a way of learning matrix representations of objects and relationships. The goal of learning is to allow multiplication of matrices to represent symbolic relationships between objects and symbolic relationships between relationships, which is the main novelty of the method. We demonstrate that this leads to excellent generalization in two different domains: modular arithmetic and family relationships. We show that the same system can learn first-order propositions such as (2, 5) ∈ +3 or (Christopher, Penelope) ∈ has wife, and higher-order propositions such as (3, +3) ∈ plus and (+3, −3) ∈ inverse or (has husband, has wife) ∈ higher oppsex. We further demonstrate that the system understands how higher-order propositions are related to first-order ones by showing that it can correctly answer questions about first-order propositions involving the relations +3 or has wife even though it has not been trained on any first-order examples involving these relations. 1
3 0.11757673 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization
Author: Yuhong Guo
Abstract: Recently, supervised dimensionality reduction has been gaining attention, owing to the realization that data labels are often available and indicate important underlying structure in the data. In this paper, we present a novel convex supervised dimensionality reduction approach based on exponential family PCA, which is able to avoid the local optima of typical EM learning. Moreover, by introducing a sample-based approximation to exponential family models, it overcomes the limitation of the prevailing Gaussian assumptions of standard PCA, and produces a kernelized formulation for nonlinear supervised dimensionality reduction. A training algorithm is then devised based on a subgradient bundle method, whose scalability can be gained using a coordinate descent procedure. The advantage of our global optimization approach is demonstrated by empirical results over both synthetic and real data. 1
4 0.091399923 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
Author: Yen-yu Lin, Tyng-luh Liu, Chiou-shann Fuh
Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations.
5 0.072570994 113 nips-2008-Kernelized Sorting
Author: Novi Quadrianto, Le Song, Alex J. Smola
Abstract: Object matching is a fundamental operation in data analysis. It typically requires the definition of a similarity measure between the classes of objects to be matched. Instead, we develop an approach which is able to perform matching by requiring a similarity measure only within each of the classes. This is achieved by maximizing the dependency between matched pairs of observations by means of the Hilbert Schmidt Independence Criterion. This problem can be cast as one of maximizing a quadratic assignment problem with special structure and we present a simple algorithm for finding a locally optimal solution. 1
6 0.070761248 62 nips-2008-Differentiable Sparse Coding
7 0.070197314 200 nips-2008-Robust Kernel Principal Component Analysis
8 0.06835866 114 nips-2008-Large Margin Taxonomy Embedding for Document Categorization
9 0.066002175 194 nips-2008-Regularized Learning with Networks of Features
10 0.063856274 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
11 0.06167867 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
12 0.061110899 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
13 0.060493309 90 nips-2008-Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity
14 0.057795465 196 nips-2008-Relative Margin Machines
15 0.057659503 31 nips-2008-Bayesian Exponential Family PCA
16 0.054981958 157 nips-2008-Nonrigid Structure from Motion in Trajectory Space
17 0.054515947 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
18 0.053393479 238 nips-2008-Theory of matching pursuit
19 0.051867288 231 nips-2008-Temporal Dynamics of Cognitive Control
20 0.050546203 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
topicId topicWeight
[(0, -0.186), (1, -0.048), (2, 0.023), (3, -0.014), (4, 0.052), (5, -0.028), (6, 0.001), (7, 0.002), (8, 0.016), (9, -0.015), (10, 0.108), (11, -0.083), (12, 0.011), (13, 0.098), (14, 0.04), (15, -0.059), (16, 0.049), (17, 0.043), (18, 0.069), (19, -0.01), (20, -0.068), (21, -0.049), (22, 0.061), (23, -0.03), (24, -0.008), (25, -0.032), (26, -0.049), (27, 0.059), (28, -0.139), (29, 0.046), (30, -0.018), (31, 0.034), (32, -0.001), (33, 0.061), (34, 0.009), (35, 0.002), (36, 0.086), (37, 0.2), (38, -0.049), (39, -0.094), (40, 0.036), (41, -0.055), (42, 0.094), (43, -0.017), (44, 0.016), (45, -0.001), (46, 0.021), (47, -0.152), (48, 0.033), (49, 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.93591207 61 nips-2008-Diffeomorphic Dimensionality Reduction
Author: Christian Walder, Bernhard Schölkopf
Abstract: This paper introduces a new approach to constructing meaningful lower dimensional representations of sets of data points. We argue that constraining the mapping between the high and low dimensional spaces to be a diffeomorphism is a natural way of ensuring that pairwise distances are approximately preserved. Accordingly we develop an algorithm which diffeomorphically maps the data near to a lower dimensional subspace and then projects onto that subspace. The problem of solving for the mapping is transformed into one of solving for an Eulerian flow field which we compute using ideas from kernel methods. We demonstrate the efficacy of our approach on various real world data sets. 1
2 0.72974378 248 nips-2008-Using matrices to model symbolic relationship
Author: Ilya Sutskever, Geoffrey E. Hinton
Abstract: We describe a way of learning matrix representations of objects and relationships. The goal of learning is to allow multiplication of matrices to represent symbolic relationships between objects and symbolic relationships between relationships, which is the main novelty of the method. We demonstrate that this leads to excellent generalization in two different domains: modular arithmetic and family relationships. We show that the same system can learn first-order propositions such as (2, 5) ∈ +3 or (Christopher, Penelope) ∈ has wife, and higher-order propositions such as (3, +3) ∈ plus and (+3, −3) ∈ inverse or (has husband, has wife) ∈ higher oppsex. We further demonstrate that the system understands how higher-order propositions are related to first-order ones by showing that it can correctly answer questions about first-order propositions involving the relations +3 or has wife even though it has not been trained on any first-order examples involving these relations. 1
3 0.67164534 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization
Author: Yuhong Guo
Abstract: Recently, supervised dimensionality reduction has been gaining attention, owing to the realization that data labels are often available and indicate important underlying structure in the data. In this paper, we present a novel convex supervised dimensionality reduction approach based on exponential family PCA, which is able to avoid the local optima of typical EM learning. Moreover, by introducing a sample-based approximation to exponential family models, it overcomes the limitation of the prevailing Gaussian assumptions of standard PCA, and produces a kernelized formulation for nonlinear supervised dimensionality reduction. A training algorithm is then devised based on a subgradient bundle method, whose scalability can be gained using a coordinate descent procedure. The advantage of our global optimization approach is demonstrated by empirical results over both synthetic and real data. 1
4 0.57032722 200 nips-2008-Robust Kernel Principal Component Analysis
Author: Minh H. Nguyen, Fernando Torre
Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1
5 0.54963672 126 nips-2008-Localized Sliced Inverse Regression
Author: Qiang Wu, Sayan Mukherjee, Feng Liang
Abstract: We developed localized sliced inverse regression for supervised dimension reduction. It has the advantages of preventing degeneracy, increasing estimation accuracy, and automatic subclass discovery in classification problems. A semisupervised version is proposed for the use of unlabeled data. The utility is illustrated on simulated as well as real data sets.
6 0.53980809 31 nips-2008-Bayesian Exponential Family PCA
7 0.50231576 144 nips-2008-Multi-resolution Exploration in Continuous Spaces
8 0.48923856 90 nips-2008-Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity
9 0.47417879 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics
10 0.45599771 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
11 0.43294576 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression
12 0.4296065 68 nips-2008-Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection
13 0.42520699 157 nips-2008-Nonrigid Structure from Motion in Trajectory Space
14 0.4243322 238 nips-2008-Theory of matching pursuit
15 0.41901073 173 nips-2008-Optimization on a Budget: A Reinforcement Learning Approach
16 0.4170391 125 nips-2008-Local Gaussian Process Regression for Real Time Online Model Learning
17 0.40680787 113 nips-2008-Kernelized Sorting
18 0.40525666 3 nips-2008-A Massively Parallel Digital Learning Processor
19 0.39647135 188 nips-2008-QUIC-SVD: Fast SVD Using Cosine Trees
20 0.39526093 114 nips-2008-Large Margin Taxonomy Embedding for Document Categorization
topicId topicWeight
[(6, 0.037), (7, 0.071), (12, 0.037), (28, 0.12), (57, 0.058), (59, 0.013), (63, 0.018), (77, 0.537), (83, 0.03)]
simIndex simValue paperId paperTitle
Author: Christoph Kolodziejski, Bernd Porr, Minija Tamosiunaite, Florentin Wörgötter
Abstract: In this theoretical contribution we provide mathematical proof that two of the most important classes of network learning - correlation-based differential Hebbian learning and reward-based temporal difference learning - are asymptotically equivalent when timing the learning with a local modulatory signal. This opens the opportunity to consistently reformulate most of the abstract reinforcement learning framework from a correlation based perspective that is more closely related to the biophysics of neurons. 1
same-paper 2 0.8807497 61 nips-2008-Diffeomorphic Dimensionality Reduction
Author: Christian Walder, Bernhard Schölkopf
Abstract: This paper introduces a new approach to constructing meaningful lower dimensional representations of sets of data points. We argue that constraining the mapping between the high and low dimensional spaces to be a diffeomorphism is a natural way of ensuring that pairwise distances are approximately preserved. Accordingly we develop an algorithm which diffeomorphically maps the data near to a lower dimensional subspace and then projects onto that subspace. The problem of solving for the mapping is transformed into one of solving for an Eulerian flow field which we compute using ideas from kernel methods. We demonstrate the efficacy of our approach on various real world data sets. 1
3 0.86063814 122 nips-2008-Learning with Consistency between Inductive Functions and Kernels
Author: Haixuan Yang, Irwin King, Michael Lyu
Abstract: Regularized Least Squares (RLS) algorithms have the ability to avoid over-fitting problems and to express solutions as kernel expansions. However, we observe that the current RLS algorithms cannot provide a satisfactory interpretation even on the penalty of a constant function. Based on the intuition that a good kernelbased inductive function should be consistent with both the data and the kernel, a novel learning scheme is proposed. The advantages of this scheme lie in its corresponding Representer Theorem, its strong interpretation ability about what kind of functions should not be penalized, and its promising accuracy improvements shown in a number of experiments. Furthermore, we provide a detailed technical description about heat kernels, which serves as an example for the readers to apply similar techniques for other kernels. Our work provides a preliminary step in a new direction to explore the varying consistency between inductive functions and kernels under various distributions. 1
4 0.84871495 144 nips-2008-Multi-resolution Exploration in Continuous Spaces
Author: Ali Nouri, Michael L. Littman
Abstract: The essence of exploration is acting to try to decrease uncertainty. We propose a new methodology for representing uncertainty in continuous-state control problems. Our approach, multi-resolution exploration (MRE), uses a hierarchical mapping to identify regions of the state space that would benefit from additional samples. We demonstrate MRE’s broad utility by using it to speed up learning in a prototypical model-based and value-based reinforcement-learning method. Empirical results show that MRE improves upon state-of-the-art exploration approaches. 1
5 0.69838798 216 nips-2008-Sparse probabilistic projections
Author: Cédric Archambeau, Francis R. Bach
Abstract: We present a generative model for performing sparse probabilistic projections, which includes sparse principal component analysis and sparse canonical correlation analysis as special cases. Sparsity is enforced by means of automatic relevance determination or by imposing appropriate prior distributions, such as generalised hyperbolic distributions. We derive a variational Expectation-Maximisation algorithm for the estimation of the hyperparameters and show that our novel probabilistic approach compares favourably to existing techniques. We illustrate how the proposed method can be applied in the context of cryptoanalysis as a preprocessing tool for the construction of template attacks. 1
6 0.57012749 87 nips-2008-Fitted Q-iteration by Advantage Weighted Regression
7 0.56793249 150 nips-2008-Near-optimal Regret Bounds for Reinforcement Learning
8 0.51348895 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations
9 0.50681752 37 nips-2008-Biasing Approximate Dynamic Programming with a Lower Discount Factor
10 0.49757928 230 nips-2008-Temporal Difference Based Actor Critic Learning - Convergence and Neural Implementation
11 0.49296606 200 nips-2008-Robust Kernel Principal Component Analysis
12 0.48447108 44 nips-2008-Characteristic Kernels on Groups and Semigroups
13 0.47810769 215 nips-2008-Sparse Signal Recovery Using Markov Random Fields
14 0.47139773 195 nips-2008-Regularized Policy Iteration
15 0.45878187 238 nips-2008-Theory of matching pursuit
16 0.45540422 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization
17 0.45403099 240 nips-2008-Tracking Changing Stimuli in Continuous Attractor Neural Networks
18 0.44946367 173 nips-2008-Optimization on a Budget: A Reinforcement Learning Approach
19 0.44925949 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization
20 0.44704208 40 nips-2008-Bounds on marginal probability distributions