nips nips2001 nips2001-106 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mikhail Belkin, Partha Niyogi
Abstract: Drawing on the correspondence between the graph Laplacian, the Laplace-Beltrami operator on a manifold, and the connections to the heat equation, we propose a geometrically motivated algorithm for constructing a representation for data sampled from a low dimensional manifold embedded in a higher dimensional space. The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality preserving properties and a natural connection to clustering. Several applications are considered. In many areas of artificial intelligence, information retrieval and data mining, one is often confronted with intrinsically low dimensional data lying in a very high dimensional space. For example, gray scale n x n images of a fixed object taken with a moving camera yield data points in R^{n^2}. However, the intrinsic dimensionality of the space of all images of the same object is the number of degrees of freedom of the camera; in fact the space has the natural structure of a manifold embedded in R^{n^2}. While there is a large body of work on dimensionality reduction in general, most existing approaches do not explicitly take into account the structure of the manifold on which the data may possibly reside. Recently, there has been some interest (Tenenbaum et al, 2000; Roweis and Saul, 2000) in the problem of developing low dimensional representations of data in this particular context. In this paper, we present a new algorithm and an accompanying framework of analysis for geometrically motivated dimensionality reduction. The core algorithm is very simple, requiring only a few local computations and one sparse eigenvalue problem. The solution reflects the intrinsic geometric structure of the manifold. The justification comes from the role of the Laplacian operator in providing an optimal embedding. The Laplacian of the graph obtained from the data points may be viewed as an approximation to the Laplace-Beltrami operator defined on the manifold. The embedding maps for the data come from approximations to a natural map that is defined on the entire manifold. The framework of analysis presented here makes this connection explicit. While this connection is known to geometers and specialists in spectral graph theory (for example, see [1, 2]), to the best of our knowledge it has not yet been applied to data representation. The connection of the Laplacian to the heat kernel enables us to choose the weights of the graph in a principled manner. The locality preserving character of the Laplacian Eigenmap algorithm makes it relatively insensitive to outliers and noise. A byproduct of this is that the algorithm implicitly emphasizes the natural clusters in the data. Connections to spectral clustering algorithms developed in learning and computer vision (see Shi and Malik, 1997) become very clear. Following the discussion of Roweis and Saul (2000) and Tenenbaum et al (2000), we note that the biological perceptual apparatus is confronted with high dimensional stimuli from which it must recover low dimensional structure. One might argue that if the approach to recovering such low-dimensional structure is inherently local, then a natural clustering will emerge and thus might serve as the basis for the development of categories in biological perception. 1 The Algorithm. Given k points x_1, ..., x_k in R^l, we construct a weighted graph with k nodes, one for each point, and a set of edges connecting neighboring points to each other. Step 1.
[Constructing the Graph] We put an edge between nodes i and j if x_i and x_j are "close".
Reference: text
sentIndex sentText sentNum sentScore
1 The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality preserving properties and a natural connection to clustering. [sent-8, score-0.319]
2 In many areas of artificial intelligence, information retrieval and data mining, one is often confronted with intrinsically low dimensional data lying in a very high dimensional space. [sent-10, score-0.405]
3 For example, gray scale n x n images of a fixed object taken with a moving camera yield data points in R^{n^2}. [sent-11, score-0.168]
4 However, the intrinsic dimensionality of the space of all images of the same object is the number of degrees of freedom of the camera; in fact the space has the natural structure of a manifold embedded in R^{n^2}. [sent-12, score-0.362]
5 While there is a large body of work on dimensionality reduction in general, most existing approaches do not explicitly take into account the structure of the manifold on which the data may possibly reside. [sent-13, score-0.261]
6 Recently, there has been some interest (Tenenbaum et al, 2000; Roweis and Saul, 2000) in the problem of developing low dimensional representations of data in this particular context. [sent-14, score-0.19]
7 In this paper, we present a new algorithm and an accompanying framework of analysis for geometrically motivated dimensionality reduction. [sent-15, score-0.204]
8 The solution reflects the intrinsic geometric structure of the manifold. [sent-17, score-0.118]
9 The justification comes from the role of the Laplacian operator in providing an optimal embedding. [sent-18, score-0.345]
10 The Laplacian of the graph obtained from the data points may be viewed as an approximation to the Laplace-Beltrami operator defined on the manifold. [sent-19, score-0.291]
11 The embedding maps for the data come from approximations to a natural map that is defined on the entire manifold. [sent-20, score-0.188]
12 The framework of analysis presented here makes this connection explicit. [sent-21, score-0.05]
13 While this connection is known to geometers and specialists in spectral graph theory (for example, see [1, 2]), to the best of our knowledge it has not yet been applied to data representation. [sent-22, score-0.364]
14 The connection of the Laplacian to the heat kernel enables us to choose the weights of the graph in a principled manner. [sent-23, score-0.622]
15 The locality preserving character of the Laplacian Eigenmap algorithm makes it relatively insensitive to outliers and noise. [sent-24, score-0.122]
16 A byproduct of this is that the algorithm implicitly emphasizes the natural clusters in the data. [sent-25, score-0.036]
17 Connections to spectral clustering algorithms developed in learning and computer vision (see Shi and Malik, 1997) become very clear. [sent-26, score-0.103]
18 Following the discussion of Roweis and Saul (2000) and Tenenbaum et al (2000), we note that the biological perceptual apparatus is confronted with high dimensional stimuli from which it must recover low dimensional structure. [sent-27, score-0.405]
19 One might argue that if the approach to recovering such low-dimensional structure is inherently local, then a natural clustering will emerge and thus might serve as the basis for the development of categories in biological perception. [sent-28, score-0.072]
20 Given k points x_1, ..., x_k in R^l, we construct a weighted graph with k nodes, one for each point, and the set of edges connecting neighboring points to each other. [sent-32, score-0.298]
21 [Constructing the Graph] We put an edge between nodes i and j if x_i and x_j are "close". [sent-35, score-0.204]
22 [parameter ε ∈ R] Nodes i and j are connected by an edge if ||x_i - x_j||^2 < ε. [sent-37, score-0.167]
23 Advantages: geometrically motivated; the relationship is naturally symmetric. [sent-38, score-0.138]
24 Disadvantages: often leads to graphs with several connected components, and it can be difficult to choose ε. [sent-39, score-0.17]
25 [parameter n ∈ N] Nodes i and j are connected by an edge if i is among the n nearest neighbors of j or j is among the n nearest neighbors of i. [sent-41, score-0.355]
26 Advantages: simpler to choose; tends to lead to connected graphs. [sent-42, score-0.116]
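As a concrete illustration of Step 1, here is a minimal sketch of both graph constructions in Python with NumPy/SciPy; the helper names epsilon_graph and knn_graph, and the explicit symmetrization of the k-nearest-neighbor rule, are illustrative choices rather than anything specified in the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist

def epsilon_graph(X, eps):
    """0/1 adjacency: edge (i, j) iff ||x_i - x_j||^2 < eps."""
    D2 = cdist(X, X, metric="sqeuclidean")   # pairwise squared distances
    A = (D2 < eps).astype(float)
    np.fill_diagonal(A, 0.0)                 # no self-loops
    return A

def knn_graph(X, n):
    """0/1 adjacency: edge (i, j) iff i is among the n nearest neighbors of j, or vice versa."""
    D2 = cdist(X, X, metric="sqeuclidean")
    np.fill_diagonal(D2, np.inf)             # a point is not its own neighbor
    A = np.zeros_like(D2)
    for i in range(X.shape[0]):
        A[i, np.argsort(D2[i])[:n]] = 1.0    # n nearest neighbors of point i
    return np.maximum(A, A.T)                # symmetrize (the "or" rule above)
```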
27 If nodes i and j are connected, put W_ij = e^{-||x_i - x_j||^2 / t}. The justification for this choice of weights will be provided later. [sent-48, score-0.192]
28 Alternatively, put W_ij = 1 if and only if vertices i and j are connected by an edge. [sent-51, score-0.272]
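A sketch of the weight-choosing step (Step 2) on top of the adjacency matrix above; heat_weights is a hypothetical helper that applies the heat-kernel weights W_ij = exp(-||x_i - x_j||^2 / t) on existing edges, while passing t=None falls back to the simple 0/1 variant.

```python
import numpy as np
from scipy.spatial.distance import cdist

def heat_weights(X, A, t=None):
    """Weight matrix for a graph with 0/1 adjacency A.
    With t=None the simple 0/1 weights are kept; otherwise the heat kernel is applied."""
    if t is None:
        return A.copy()
    D2 = cdist(X, X, metric="sqeuclidean")
    return A * np.exp(-D2 / t)               # zero wherever there is no edge
```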
29 [Eigenmaps] Assume the graph G, constructed above, is connected , otherwise proceed with Step 3 for each connected component . [sent-55, score-0.39]
30 Figure 1: The left panel shows a horizontal and a vertical bar. [sent-62, score-0.193]
31 The middle panel is a two dimensional representation of the set of all images using the Laplacian eigenmaps. [sent-63, score-0.261]
32 The right panel shows the result of a principal components analysis using the first two principal directions to represent the data. [sent-64, score-0.048]
33 Dots correspond to vertical bars and '+' signs correspond to horizontal bars. [sent-65, score-0.201]
34 Compute eigenvalues and eigenvectors for the generalized eigenvector problem Ly = λDy (1), where D is the diagonal weight matrix whose entries are column (or row, since W is symmetric) sums of W, D_ii = Σ_j W_ji. [sent-66, score-0.074]
35 L = D - W is the Laplacian: a symmetric, positive semidefinite matrix which can be thought of as an operator on functions defined on the vertices of G. [sent-68, score-0.26]
36 Let y_0, ..., y_{k-1} be the solutions of equation 1, ordered according to their eigenvalues, with y_0 having the smallest eigenvalue (in fact 0). [sent-72, score-0.058]
37 The image of x_i under the embedding into the lower dimensional space R^m is given by (y_1(i), ..., y_m(i)). [sent-73, score-0.279]
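Step 3 as code: form D and L = D - W, solve the generalized eigenproblem Ly = λDy, and drop the trivial constant eigenvector. This is a dense-matrix sketch built on scipy.linalg.eigh; a practical implementation would use sparse matrices and solvers, and the function name laplacian_eigenmaps is an illustrative choice.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(W, m):
    """Embed a connected graph with symmetric weight matrix W into R^m.
    Returns a (k, m) array whose i-th row is the image of x_i."""
    D = np.diag(W.sum(axis=1))     # D_ii = sum_j W_ij
    L = D - W                      # graph Laplacian
    # Generalized eigenproblem L y = lambda D y; eigh returns ascending eigenvalues
    # with eigenvectors normalized so that vecs^T D vecs = I.
    _, vecs = eigh(L, D)
    return vecs[:, 1:m + 1]        # discard y_0 (eigenvalue 0, the constant vector)
```

For a disconnected graph this would be applied to each connected component separately, as the text notes.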
38 2 Justification. Recall that given a data set we construct a weighted graph G = (V, E) with edges connecting nearby points to each other. [sent-77, score-0.298]
39 Consider the problem of mapping the weighted connected graph G to a line so that connected points stay as close together as possible. [sent-78, score-0.448]
40 We wish to choose y_i ∈ R to minimize Σ_{i,j} (y_i - y_j)^2 W_ij under appropriate constraints. [sent-79, score-0.054]
41 Let y = (y_1, ..., y_n)^T be the map from the graph to the real line. [sent-83, score-0.048]
42 Thus Σ_{i,j} (y_i - y_j)^2 W_ij can be written as Σ_{i,j} (y_i^2 + y_j^2 - 2 y_i y_j) W_ij = Σ_i y_i^2 D_ii + Σ_j y_j^2 D_jj - 2 Σ_{i,j} y_i y_j W_ij = 2 y^T L y, using the fact that W is symmetric. [sent-86, score-0.065]
43 Matrix D provides a natural measure on the vertices of the graph. [sent-99, score-0.127]
44 Therefore we see that L is a positive semidefinite matrix, and the vector y that minimizes the objective function is given by the minimum eigenvalue solution to the generalized eigenvalue problem Ly = λDy. [sent-101, score-0.21]
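The quadratic-form identity used here, Σ_{i,j}(y_i - y_j)^2 W_ij = 2 y^T L y, is easy to confirm numerically; a small sanity check on a random symmetric W (the specific matrix is of course made up):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 6
W = rng.random((k, k))
W = (W + W.T) / 2                 # symmetric, nonnegative weights
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W
y = rng.standard_normal(k)

lhs = sum((y[i] - y[j]) ** 2 * W[i, j] for i in range(k) for j in range(k))
rhs = 2 * y @ L @ y
assert np.isclose(lhs, rhs)       # Sum_ij (y_i - y_j)^2 W_ij == 2 y^T L y
```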
45 Figure 2: 300 most frequent words of the Brown corpus represented in the spectral domain. [sent-103, score-0.226]
46 It is easy to see that 1 is an eigenvector with eigenvalue 0. [sent-105, score-0.132]
47 If the graph is connected, 1 is the only eigenvector for λ = 0. [sent-106, score-0.348]
48 To eliminate this trivial solution, which collapses all vertices of G onto the real number 1, we put an additional constraint of orthogonality and obtain y_opt = argmin_{y^T D y = 1, y^T D 1 = 0} y^T L y. Thus, the solution y_opt is now given by the eigenvector with the smallest non-zero eigenvalue. [sent-108, score-0.218]
49 More generally, consider the embedding of the graph into R^m, given by a k x m matrix Y. [sent-109, score-0.284]
50 The ith row of Y provides the embedding coordinates of the ith vertex. [sent-114, score-0.126]
51 Thus we need to minimize Σ_{i,j} ||Y_i - Y_j||^2 W_ij = tr(Y^T L Y); this now reduces to Y_opt = argmin_{Y^T D Y = I} tr(Y^T L Y). For the one-dimensional embedding problem, the constraint prevents collapse onto a point. [sent-115, score-0.22]
52 For the m-dimensional embedding problem, the constraint presented above prevents collapse onto a subspace of dimension less than m. [sent-116, score-0.22]
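The generalized eigenvectors returned by scipy.linalg.eigh already come normalized so that Y^T D Y = I, which is exactly the constraint stated here; a quick check continuing the random example above:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
k, m = 6, 2
W = rng.random((k, k))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W
_, vecs = eigh(L, D)              # eigenvectors satisfy vecs^T D vecs = I
Y = vecs[:, 1:m + 1]              # drop the trivial constant eigenvector
assert np.allclose(Y.T @ D @ Y, np.eye(m))
```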
53 2.1 The Laplace-Beltrami Operator. The Laplacian of a graph is analogous to the Laplace-Beltrami operator on manifolds. [sent-118, score-0.233]
54 Consider a smooth m-dimensional manifold M embedded in R^k. [sent-119, score-0.258]
55 The Riemannian structure (metric tensor) on the manifold is induced by the standard Riemannian structure on R^k. [sent-120, score-0.15]
56 The gradient ∇f(x) (which in local coordinates can be written as ∇f(x) = Σ_i ∂f/∂x_i e_i) is a vector field on the manifold such that, for small δx, the following first-order bound holds. (Figure 4: 685 speech datapoints plotted in the two dimensional Laplacian spectral representation.) [sent-122, score-0.38]
57 |f(x + δx) - f(x)| ≈ |⟨∇f(x), δx⟩| ≤ ||∇f|| ||δx||; thus we see that if ||∇f|| is small, points near x will be mapped to points near f(x). [sent-124, score-0.116]
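This is just a first-order Taylor expansion followed by the Cauchy-Schwarz inequality; a tiny numerical illustration with an arbitrary smooth test function (the function itself is made up for the example):

```python
import numpy as np

f = lambda x: np.sin(x[0]) + x[1] ** 2
grad_f = lambda x: np.array([np.cos(x[0]), 2.0 * x[1]])

x = np.array([0.3, -0.7])
dx = 1e-4 * np.array([1.0, 2.0])
change = abs(f(x + dx) - f(x))                          # |f(x + dx) - f(x)|
first_order = abs(grad_f(x) @ dx)                       # |<grad f(x), dx>|
bound = np.linalg.norm(grad_f(x)) * np.linalg.norm(dx)  # ||grad f|| ||dx||
assert abs(change - first_order) < 1e-6 and first_order <= bound
```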
58 We therefore look for a map that best preserves locality on average by trying to minimize ∫_M ||∇f(x)||^2; minimizing ∫_M ||∇f(x)||^2 corresponds directly to minimizing L_f = (1/2) Σ_{i,j} (f_i - f_j)^2 W_ij on a graph. [sent-125, score-0.189]
59 Minimizing the squared gradient thus reduces to finding eigenfunctions of the Laplace-Beltrami operator. [sent-126, score-0.08]
60 ℒ is positive semidefinite, and the f that minimizes ∫_M ||∇f||^2 has to be an eigenfunction of ℒ. [sent-135, score-0.094]
61 2.2 Heat Kernels and the Choice of Weight Matrix. The Laplace-Beltrami operator ℒ on differentiable functions on a manifold M is intimately related to the heat flow. [sent-138, score-0.585]
62 Let f : M → R be the initial heat distribution and u(x, t) be the heat distribution at time t (so u(x, 0) = f(x)). [sent-139, score-0.72]
63 The heat equation is the partial differential equation ∂u/∂t = ℒu. [sent-140, score-0.36]
64 The solution is given by u(x, t) = ∫_M H_t(x, y) f(y) dy, where H_t is the heat kernel, the Green's function for this PDE. [sent-141, score-0.36]
65 Locally, the heat kernel is approximately equal to the Gaussian: H_t(x, y) ≈ (4πt)^{-n/2} e^{-||x - y||^2 / 4t}, [sent-142, score-0.36]
66 where ||x - y|| (x and y in local coordinates) and t are both sufficiently small and n = dim M. [sent-144, score-0.071]
67 Notice that as t tends to 0, the heat kernel H_t(x, y) becomes increasingly localized and tends to Dirac's δ-function. [sent-145, score-0.431]
68 This gives ℒf(x) ≈ (1/t) [ f(x) - (4πt)^{-n/2} ∫_M e^{-||x - y||^2 / 4t} f(y) dy ]; if x_1, ..., x_k are data points on M, the last expression can be approximated by the corresponding sum over the data points x_i with 0 < ||x_j - x_i|| < ε, and we do not have to worry about the constant factor, since the graph Laplacian L will choose the correct multiplier for us. [sent-151, score-0.314]
69 Finally we see how to choose the edge weights for the adjacency matrix W: W_ij = e^{-||x_i - x_j||^2 / 4t} if ||x_i - x_j|| < ε, and W_ij = 0 otherwise. 3 Examples. Example 1 - A Toy Vision Example: Consider binary images of vertical and horizontal bars located at arbitrary points in the 40 x 40 visual field. [sent-152, score-0.495]
70 We choose 1000 images, each containing either a vertical or a horizontal bar (500 containing vertical bars and 500 horizontal bars) at random. [sent-153, score-0.4]
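A sketch of how such a toy data set can be generated and fed to the pipeline sketched earlier; the one-pixel bar width and the particular neighborhood and heat-kernel parameters are assumptions for illustration, not values given in the paper.

```python
import numpy as np

def random_bar_image(rng, size=40, horizontal=True):
    """Binary size x size image containing a single one-pixel-wide bar."""
    img = np.zeros((size, size))
    pos = rng.integers(size)             # bar location chosen at random
    if horizontal:
        img[pos, :] = 1.0
    else:
        img[:, pos] = 1.0
    return img.ravel()                   # a point in R^(size*size) = R^1600

rng = np.random.default_rng(0)
X = np.array([random_bar_image(rng, horizontal=(i < 500)) for i in range(1000)])
labels = np.array([0] * 500 + [1] * 500)   # 0 = horizontal bar, 1 = vertical bar

# Embedding with the earlier sketches, e.g.:
#   A = knn_graph(X, n=8)
#   W = heat_weights(X, A, t=float(X.shape[1]))
#   Y = laplacian_eigenmaps(W, m=2)      # compare with the middle panel of Figure 1
```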
71 Figure 2 shows the results of an experiment conducted with the 300 most frequent words in the Brown corpus, a collection of texts containing about a million words available in electronic format. [sent-157, score-0.199]
72 Each word is represented as a vector in a 600 dimensional space using information about the frequency of its left and right neighbors (computed from the bigram statistics of the corpus). [sent-158, score-0.249]
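A sketch of the kind of representation described here, assuming the corpus is available as a flat list of tokens; the split into 300 left-neighbor counts plus 300 right-neighbor counts (giving 600 dimensions) is one reading of the construction, and the helper name context_vectors is hypothetical.

```python
import numpy as np
from collections import Counter

def context_vectors(tokens, n_words=300):
    """Represent the n_words most frequent words by left/right neighbor frequencies."""
    vocab = [w for w, _ in Counter(tokens).most_common(n_words)]
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((n_words, 2 * n_words))         # 600-dimensional when n_words = 300
    for left, right in zip(tokens, tokens[1:]):  # bigram statistics
        if left in index and right in index:
            X[index[right], index[left]] += 1.0              # left-neighbor count
            X[index[left], n_words + index[right]] += 1.0    # right-neighbor count
    return vocab, X
```

The resulting rows would then go through the same graph construction and eigenmap steps as above.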
73 In Figure 4 we consider the low dimensional representations arising from applying the Laplacian Eigenmap algorithm to a sentence of speech. [sent-160, score-0.346]
74 Note that points marked with the same symbol may arise from occurrences of the same phoneme at different points in the utterance. [sent-166, score-0.116]
75 The symbol "sh" stands for the fricative in the word she; "aa", "ao" stand for the vowels in the words dark and all respectively; "kcl", "dcl", "gcl" stand for closures preceding the stop consonants "k", "d", "g" respectively. [sent-167, score-0.327]
76 Each vector is labeled according to the identity of the phonetic segment it belonged to. [sent-171, score-0.057]
77 Figure 4 shows the speech data points plotted in the two dimensional Laplacian representation. [sent-173, score-0.211]
78 The two "spokes" correspond predominantly to fricatives and closures respectively. [sent-174, score-0.072]
79 Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science, vol 290, 22 Dec. [sent-188, score-0.068]
80 2000, [5] Jianbo Shi, Jitendra Malik, Normalized Cuts and Image Segmentation, IEEE Transactions on PAMI, vol 22, no 8, August 2000 [6] J. [sent-189, score-0.068]
81 Langford, A Global Geometric Framework for Nonlinear Dimensionality Reduction, Science, Vol 290, 22 Dec. [sent-194, score-0.072]
wordName wordTfidf (topN-words)
[('laplacian', 0.4), ('heat', 0.36), ('ly', 0.159), ('graph', 0.158), ('dimensional', 0.153), ('manifold', 0.15), ('exalllple', 0.144), ('lr', 0.142), ('fm', 0.142), ('embedding', 0.126), ('connected', 0.116), ('eigenmaps', 0.108), ('emb', 0.108), ('yj', 0.1), ('geometrically', 0.1), ('dii', 0.094), ('div', 0.094), ('semidefinite', 0.094), ('vertices', 0.091), ('ht', 0.091), ('iiv', 0.085), ('justification', 0.085), ('sp', 0.084), ('op', 0.08), ('locality', 0.08), ('riemannian', 0.08), ('yt', 0.076), ('horizontal', 0.075), ('operator', 0.075), ('corpus', 0.074), ('eigenvector', 0.074), ('closures', 0.072), ('disadvantages', 0.072), ('ectral', 0.072), ('eech', 0.072), ('eigenmap', 0.072), ('erator', 0.072), ('etric', 0.072), ('ilxi', 0.072), ('yopt', 0.072), ('ox', 0.071), ('lo', 0.071), ('roweis', 0.071), ('vertical', 0.07), ('vol', 0.068), ('spectral', 0.067), ('dimensionality', 0.066), ('tenenbaum', 0.065), ('wij', 0.065), ('ij', 0.065), ('saul', 0.063), ('confronted', 0.062), ('chung', 0.062), ('images', 0.06), ('eigenvalue', 0.058), ('points', 0.058), ('brown', 0.057), ('stand', 0.057), ('phonetic', 0.057), ('vowels', 0.057), ('malik', 0.057), ('fan', 0.057), ('bars', 0.056), ('xj', 0.056), ('nodes', 0.054), ('choose', 0.054), ('collapse', 0.053), ('vf', 0.053), ('yo', 0.053), ('iix', 0.053), ('chicago', 0.053), ('lj', 0.053), ('put', 0.053), ('il', 0.051), ('edge', 0.051), ('connection', 0.05), ('neighbors', 0.05), ('dy', 0.05), ('camera', 0.05), ('shi', 0.05), ('panel', 0.048), ('yi', 0.047), ('th', 0.046), ('frequent', 0.045), ('reduction', 0.045), ('map', 0.044), ('nearest', 0.044), ('stands', 0.044), ('ill', 0.044), ('edges', 0.043), ('fourier', 0.042), ('preserving', 0.042), ('prevents', 0.041), ('words', 0.04), ('mathematics', 0.039), ('connecting', 0.039), ('motivated', 0.038), ('low', 0.037), ('clustering', 0.036), ('natural', 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999934 106 nips-2001-Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering
Author: Mikhail Belkin, Partha Niyogi
2 0.15552862 170 nips-2001-Spectral Kernel Methods for Clustering
Author: Nello Cristianini, John Shawe-Taylor, Jaz S. Kandola
Abstract: In this paper we introduce new algorithms for unsupervised learning based on the use of a kernel matrix. All the information required by such algorithms is contained in the eigenvectors of the matrix or of closely related matrices. We use two different but related cost functions, the Alignment and the 'cut cost'. The first one is discussed in a companion paper [3], the second one is based on graph theoretic concepts. Both functions measure the level of clustering of a labeled dataset, or the correlation between data clusters and labels. We state the problem of unsupervised learning as assigning labels so as to optimize these cost functions. We show how the optimal solution can be approximated by slightly relaxing the corresponding optimization problem, and how this corresponds to using eigenvector information. The resulting simple algorithms are tested on real world data with positive results. 1
3 0.13835773 135 nips-2001-On Spectral Clustering: Analysis and an algorithm
Author: Andrew Y. Ng, Michael I. Jordan, Yair Weiss
Abstract: Despite many empirical successes of spectral clustering methodsalgorithms that cluster points using eigenvectors of matrices derived from the data- there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems. 1
4 0.13271107 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding
Author: Marzia Polito, Pietro Perona
Abstract: Locally Linear Embedding (LLE) is an elegant nonlinear dimensionality-reduction technique recently introduced by Roweis and Saul [2]. It fails when the data is divided into separate groups. We study a variant of LLE that can simultaneously group the data and calculate local embedding of each group. An estimate for the upper bound on the intrinsic dimension of the data set is obtained automatically. 1
5 0.12864316 84 nips-2001-Global Coordination of Local Linear Models
Author: Sam T. Roweis, Lawrence K. Saul, Geoffrey E. Hinton
Abstract: High dimensional data that lies on or near a low dimensional manifold can be described by a collection of local linear models. Such a description, however, does not provide a global parameterization of the manifold—arguably an important goal of unsupervised learning. In this paper, we show how to learn a collection of local linear models that solves this more difficult problem. Our local linear models are represented by a mixture of factor analyzers, and the “global coordination” of these models is achieved by adding a regularizing term to the standard maximum likelihood objective function. The regularizer breaks a degeneracy in the mixture model’s parameter space, favoring models whose internal coordinate systems are aligned in a consistent way. As a result, the internal coordinates change smoothly and continuously as one traverses a connected path on the manifold—even when the path crosses the domains of many different local models. The regularizer takes the form of a Kullback-Leibler divergence and illustrates an unexpected application of variational methods: not to perform approximate inference in intractable probabilistic models, but to learn more useful internal representations in tractable ones. 1 Manifold Learning Consider an ensemble of images, each of which contains a face against a neutral background. Each image can be represented by a point in the high dimensional vector space of pixel intensities. This representation, however, does not exploit the strong correlations between pixels of the same image, nor does it support many useful operations for reasoning about faces. If, for example, we select two images with faces in widely different locations and then average their pixel intensities, we do not obtain an image of a face at their average location. Images of faces lie on or near a low-dimensional, curved manifold, and we can represent them more usefully by the coordinates on this manifold than by pixel intensities. Using these “intrinsic coordinates”, the average of two faces is another face with the average of their locations, poses and expressions. To analyze and manipulate faces, it is helpful to imagine a “magic black box” with levers or dials corresponding to the intrinsic coordinates on this manifold. Given a setting of the levers and dials, the box generates an image of a face. Given an image of a face, the box deduces the appropriate setting of the levers and dials. In this paper, we describe a fairly general way to construct such a box automatically from an ensemble of high-dimensional vectors. We assume only that there exists an underlying manifold of low dimensionality and that the relationship between the raw data and the manifold coordinates is locally linear and smoothly varying. Thus our method applies not only to images of faces, but also to many other forms of highly distributed perceptual and scientific data (e.g., spectrograms of speech, robotic sensors, gene expression arrays, document collections). 2 Local Linear Models The global structure of perceptual manifolds (such as images of faces) tends to be highly nonlinear. Fortunately, despite their complicated global structure, we can usually characterize these manifolds as locally linear. Thus, to a good approximation, they can be represented by collections of simpler models, each of which describes a locally linear neighborhood[3, 6, 8]. For unsupervised learning tasks, a probabilistic model that nicely captures this intuition is a mixture of factor analyzers (MFA)[5]. 
The model is used to describe high dimensional data that lies on or near a lower dimensional manifold. MFAs parameterize a joint distribution over observed and hidden variables: (1) where the observed variable, , represents the high dimensional data; the discrete hidden variables, , indexes different neighborhoods on the manifold; and the continuous hidden variables, , represent low dimensional local coordinates. The model assumes that data is sampled from different neighborhoods on the manifold with prior probabilities , and that within each neighborhood, the data’s local coordinates are normally distributed.
6 0.10254436 136 nips-2001-On the Concentration of Spectral Properties
7 0.096185736 89 nips-2001-Grouping with Bias
8 0.093841806 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
9 0.075259365 144 nips-2001-Partially labeled classification with Markov random walks
10 0.071385235 171 nips-2001-Spectral Relaxation for K-means Clustering
11 0.065298714 164 nips-2001-Sampling Techniques for Kernel Methods
12 0.064991236 101 nips-2001-K-Local Hyperplane and Convex Distance Nearest Neighbor Algorithms
13 0.064737387 185 nips-2001-The Method of Quantum Clustering
14 0.062960386 118 nips-2001-Matching Free Trees with Replicator Equations
15 0.062526308 74 nips-2001-Face Recognition Using Kernel Methods
16 0.058986716 13 nips-2001-A Natural Policy Gradient
17 0.057732146 15 nips-2001-A New Discriminative Kernel From Probabilistic Models
18 0.057500217 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation
19 0.056807727 60 nips-2001-Discriminative Direction for Kernel Classifiers
20 0.055843808 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
topicId topicWeight
[(0, -0.193), (1, 0.031), (2, -0.034), (3, -0.167), (4, 0.068), (5, 0.003), (6, -0.13), (7, -0.02), (8, -0.007), (9, 0.019), (10, 0.05), (11, 0.023), (12, 0.034), (13, 0.102), (14, -0.015), (15, -0.107), (16, -0.133), (17, -0.112), (18, 0.014), (19, 0.035), (20, -0.108), (21, -0.019), (22, 0.137), (23, 0.068), (24, -0.061), (25, -0.071), (26, 0.04), (27, -0.057), (28, -0.047), (29, 0.089), (30, 0.047), (31, 0.083), (32, -0.041), (33, 0.069), (34, -0.004), (35, -0.074), (36, 0.002), (37, 0.046), (38, -0.044), (39, 0.048), (40, -0.174), (41, 0.023), (42, -0.039), (43, 0.142), (44, -0.114), (45, -0.045), (46, 0.008), (47, -0.101), (48, 0.005), (49, 0.108)]
simIndex simValue paperId paperTitle
same-paper 1 0.96190399 106 nips-2001-Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering
Author: Mikhail Belkin, Partha Niyogi
2 0.78638726 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding
Author: Marzia Polito, Pietro Perona
Abstract: Locally Linear Embedding (LLE) is an elegant nonlinear dimensionality-reduction technique recently introduced by Roweis and Saul [2]. It fails when the data is divided into separate groups. We study a variant of LLE that can simultaneously group the data and calculate local embedding of each group. An estimate for the upper bound on the intrinsic dimension of the data set is obtained automatically. 1
3 0.54249364 136 nips-2001-On the Concentration of Spectral Properties
Author: John Shawe-Taylor, Nello Cristianini, Jaz S. Kandola
Abstract: We consider the problem of measuring the eigenvalues of a randomly drawn sample of points. We show that these values can be reliably estimated as can the sum of the tail of eigenvalues. Furthermore, the residuals when data is projected into a subspace is shown to be reliably estimated on a random sample. Experiments are presented that confirm the theoretical results. 1
4 0.52075219 84 nips-2001-Global Coordination of Local Linear Models
Author: Sam T. Roweis, Lawrence K. Saul, Geoffrey E. Hinton
Abstract: High dimensional data that lies on or near a low dimensional manifold can be described by a collection of local linear models. Such a description, however, does not provide a global parameterization of the manifold—arguably an important goal of unsupervised learning. In this paper, we show how to learn a collection of local linear models that solves this more difficult problem. Our local linear models are represented by a mixture of factor analyzers, and the “global coordination” of these models is achieved by adding a regularizing term to the standard maximum likelihood objective function. The regularizer breaks a degeneracy in the mixture model’s parameter space, favoring models whose internal coordinate systems are aligned in a consistent way. As a result, the internal coordinates change smoothly and continuously as one traverses a connected path on the manifold—even when the path crosses the domains of many different local models. The regularizer takes the form of a Kullback-Leibler divergence and illustrates an unexpected application of variational methods: not to perform approximate inference in intractable probabilistic models, but to learn more useful internal representations in tractable ones. 1 Manifold Learning Consider an ensemble of images, each of which contains a face against a neutral background. Each image can be represented by a point in the high dimensional vector space of pixel intensities. This representation, however, does not exploit the strong correlations between pixels of the same image, nor does it support many useful operations for reasoning about faces. If, for example, we select two images with faces in widely different locations and then average their pixel intensities, we do not obtain an image of a face at their average location. Images of faces lie on or near a low-dimensional, curved manifold, and we can represent them more usefully by the coordinates on this manifold than by pixel intensities. Using these “intrinsic coordinates”, the average of two faces is another face with the average of their locations, poses and expressions. To analyze and manipulate faces, it is helpful to imagine a “magic black box” with levers or dials corresponding to the intrinsic coordinates on this manifold. Given a setting of the levers and dials, the box generates an image of a face. Given an image of a face, the box deduces the appropriate setting of the levers and dials. In this paper, we describe a fairly general way to construct such a box automatically from an ensemble of high-dimensional vectors. We assume only that there exists an underlying manifold of low dimensionality and that the relationship between the raw data and the manifold coordinates is locally linear and smoothly varying. Thus our method applies not only to images of faces, but also to many other forms of highly distributed perceptual and scientific data (e.g., spectrograms of speech, robotic sensors, gene expression arrays, document collections). 2 Local Linear Models The global structure of perceptual manifolds (such as images of faces) tends to be highly nonlinear. Fortunately, despite their complicated global structure, we can usually characterize these manifolds as locally linear. Thus, to a good approximation, they can be represented by collections of simpler models, each of which describes a locally linear neighborhood[3, 6, 8]. For unsupervised learning tasks, a probabilistic model that nicely captures this intuition is a mixture of factor analyzers (MFA)[5]. 
The model is used to describe high dimensional data that lies on or near a lower dimensional manifold. MFAs parameterize a joint distribution over observed and hidden variables: (1) where the observed variable, , represents the high dimensional data; the discrete hidden variables, , indexes different neighborhoods on the manifold; and the continuous hidden variables, , represent low dimensional local coordinates. The model assumes that data is sampled from different neighborhoods on the manifold with prior probabilities , and that within each neighborhood, the data’s local coordinates are normally distributed.
5 0.50349659 89 nips-2001-Grouping with Bias
Author: Stella X. Yu, Jianbo Shi
Abstract: With the optimization of pattern discrimination as a goal, graph partitioning approaches often lack the capability to integrate prior knowledge to guide grouping. In this paper, we consider priors from unitary generative models, partially labeled data and spatial attention. These priors are modelled as constraints in the solution space. By imposing uniformity condition on the constraints, we restrict the feasible space to one of smooth solutions. A subspace projection method is developed to solve this constrained eigenproblema We demonstrate that simple priors can greatly improve image segmentation results. 1
6 0.45646963 135 nips-2001-On Spectral Clustering: Analysis and an algorithm
7 0.40657184 170 nips-2001-Spectral Kernel Methods for Clustering
8 0.39114523 144 nips-2001-Partially labeled classification with Markov random walks
9 0.38703611 101 nips-2001-K-Local Hyperplane and Convex Distance Nearest Neighbor Algorithms
10 0.38163406 110 nips-2001-Learning Hierarchical Structures with Linear Relational Embedding
11 0.3693535 171 nips-2001-Spectral Relaxation for K-means Clustering
12 0.35703576 149 nips-2001-Probabilistic Abstraction Hierarchies
13 0.35037103 185 nips-2001-The Method of Quantum Clustering
14 0.32857725 118 nips-2001-Matching Free Trees with Replicator Equations
15 0.32327834 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network
16 0.31214353 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
17 0.30343661 193 nips-2001-Unsupervised Learning of Human Motion Models
18 0.29724693 82 nips-2001-Generating velocity tuning by asymmetric recurrent connections
19 0.29518384 60 nips-2001-Discriminative Direction for Kernel Classifiers
20 0.29410329 93 nips-2001-Incremental A*
topicId topicWeight
[(14, 0.02), (17, 0.028), (19, 0.015), (27, 0.652), (30, 0.05), (38, 0.015), (59, 0.028), (72, 0.035), (79, 0.025), (91, 0.072)]
simIndex simValue paperId paperTitle
1 0.99422455 117 nips-2001-MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation
Author: Anand Rangarajan, Alan L. Yuille
Abstract: Bayesian belief propagation in graphical models has been recently shown to have very close ties to inference methods based in statistical physics. After Yedidia et al. demonstrated that belief propagation fixed points correspond to extrema of the so-called Bethe free energy, Yuille derived a double loop algorithm that is guaranteed to converge to a local minimum of the Bethe free energy. Yuille’s algorithm is based on a certain decomposition of the Bethe free energy and he mentions that other decompositions are possible and may even be fruitful. In the present work, we begin with the Bethe free energy and show that it has a principled interpretation as pairwise mutual information minimization and marginal entropy maximization (MIME). Next, we construct a family of free energy functions from a spectrum of decompositions of the original Bethe free energy. For each free energy in this family, we develop a new algorithm that is guaranteed to converge to a local minimum. Preliminary computer simulations are in agreement with this theoretical development. 1
2 0.98696417 165 nips-2001-Scaling Laws and Local Minima in Hebbian ICA
Author: Magnus Rattray, Gleb Basalyga
Abstract: We study the dynamics of a Hebbian ICA algorithm extracting a single non-Gaussian component from a high-dimensional Gaussian background. For both on-line and batch learning we find that a surprisingly large number of examples are required to avoid trapping in a sub-optimal state close to the initial conditions. To extract a skewed signal at least examples are required for -dimensional data and examples are required to extract a symmetrical signal with non-zero kurtosis.
3 0.98554724 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
Author: Lawrence K. Saul, Daniel D. Lee
Abstract: We investigate a learning algorithm for the classification of nonnegative data by mixture models. Multiplicative update rules are derived that directly optimize the performance of these models as classifiers. The update rules have a simple closed form and an intuitive appeal. Our algorithm retains the main virtues of the Expectation-Maximization (EM) algorithm—its guarantee of monotonic improvement, and its absence of tuning parameters—with the added advantage of optimizing a discriminative objective function. The algorithm reduces as a special case to the method of generalized iterative scaling for log-linear models. The learning rate of the algorithm is controlled by the sparseness of the training data. We use the method of nonnegative matrix factorization (NMF) to discover sparse distributed representations of the data. This form of feature selection greatly accelerates learning and makes the algorithm practical on large problems. Experiments show that discriminatively trained mixture models lead to much better classification than comparably sized models trained by EM. 1
same-paper 4 0.98478931 106 nips-2001-Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering
Author: Mikhail Belkin, Partha Niyogi
5 0.98390162 133 nips-2001-On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes
Author: Andrew Y. Ng, Michael I. Jordan
Abstract: We compare discriminative and generative learning as typified by logistic regression and naive Bayes. We show, contrary to a widelyheld belief that discriminative classifiers are almost always to be preferred, that there can often be two distinct regimes of performance as the training set size is increased, one in which each algorithm does better. This stems from the observation- which is borne out in repeated experiments- that while discriminative learning has lower asymptotic error, a generative classifier may also approach its (higher) asymptotic error much faster. 1
6 0.98214638 23 nips-2001-A theory of neural integration in the head-direction system
7 0.97191119 47 nips-2001-Causal Categorization with Bayes Nets
8 0.89062083 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder
9 0.85848707 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
10 0.82393503 137 nips-2001-On the Convergence of Leveraging
11 0.77368426 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation
12 0.76593214 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources
13 0.7658608 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes
14 0.76164377 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces
15 0.76042777 8 nips-2001-A General Greedy Approximation Algorithm with Applications
16 0.75426233 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding
17 0.75075084 114 nips-2001-Learning from Infinite Data in Finite Time
18 0.74672067 154 nips-2001-Products of Gaussians
19 0.74501002 197 nips-2001-Why Neuronal Dynamics Should Control Synaptic Learning Rules
20 0.7432341 31 nips-2001-Algorithmic Luckiness