nips nips2001 nips2001-88 knowledge-graph by maker-knowledge-mining

88 nips-2001-Grouping and dimensionality reduction by locally linear embedding


Source: pdf

Author: Marzia Polito, Pietro Perona

Abstract: Locally Linear Embedding (LLE) is an elegant nonlinear dimensionality-reduction technique recently introduced by Roweis and Saul [2]. It fails when the data is divided into separate groups. We study a variant of LLE that can simultaneously group the data and calculate local embedding of each group. An estimate for the upper bound on the intrinsic dimension of the data set is obtained automatically. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Grouping and dimensionality reduction by locally linear embedding. Marzia Polito, Division of Physics, Mathematics and Astronomy, California Institute of Technology, Pasadena, CA 91125, polito@caltech. [sent-1, score-0.41]

2 It fails when the data is divided into separate groups. [sent-4, score-0.064]

3 We study a variant of LLE that can simultaneously group the data and calculate local embedding of each group. [sent-5, score-0.192]

4 An estimate for the upper bound on the intrinsic dimension of the data set is obtained automatically. [sent-6, score-0.224]

5 1 Introduction. Consider a collection of N data points X_i ∈ ℝ^D. [sent-7, score-0.104]

6 Suppose that, while the dimension D is large, we have independent information suggesting that the data are distributed on a manifold of dimension d ≪ D. [sent-8, score-0.351]

7 Principal component analysis (PCA) is a classical technique which works well when the data lie close to a flat manifold [1]. [sent-11, score-0.283]

8 Elegant methods for dealing with data that is distributed on curved manifolds have been recently proposed [3, 2]. [sent-12, score-0.103]

9 While LLE is not designed to handle data that are disconnected, i.e. divided into several separate groups. [sent-14, score-0.064]

10 Furthermore, both the number of groups and the upper bound on the intrinsic dimension of the data may be estimated automatically, rather than being given a priori. [sent-17, score-0.349]

11 2 Locally linear embedding. The key insight inspiring LLE is that, while the data may not lie close to a globally linear manifold, it may be approximately locally linear, and in this case each point may be approximated as a linear combination of its nearest neighbors. [sent-18, score-0.523]

12 The coefficients of this linear combination carry the vital information for constructing a lower-dimensional linear embedding. [sent-19, score-0.08]

13 The local linear structure can be easily encoded in a sparse N by N matrix W, proceeding as follows. [sent-24, score-0.089]

14 The first step is to choose a criterion to determine the neighbors of each point. [sent-25, score-0.059]

15 Roweis and Saul choose an integer K and pick, for every point, the K points nearest to it. [sent-26, score-0.079]

16 For each point X_i, they then determine the linear combination of its neighbors which best approximates the point itself. [sent-27, score-0.099]

17 The coefficients of such linear combinations are computed by minimizing the quadratic cost function ε(W) = Σ_{i=1}^N |X_i − Σ_{j=1}^N W_ij X_j|² (1), while enforcing the constraints W_ij = 0 if X_j is not a neighbor of X_i, and Σ_j W_ij = 1. [sent-28, score-0.106]
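As an illustration of this first step, here is a minimal numpy sketch of the weight computation. The regularization of the local Gram matrix and the exclusion of the point itself from its neighbor list are assumptions added for numerical robustness, not details stated in the paper.

```python
import numpy as np

def lle_weights(X, K, reg=1e-3):
    """Sketch of the first LLE step (eq. 1): for each point, find the weights
    over its K nearest neighbors that best reconstruct it, subject to the
    weights summing to one. `reg` is an assumed regularizer for K > d."""
    N = X.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        # K nearest neighbors of X[i]; index 0 is the point itself (assumes unique points)
        dists = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(dists)[1:K + 1]
        # local Gram matrix of the centered neighbors
        Z = X[nbrs] - X[i]
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(K)   # assumed stabilizer
        w = np.linalg.solve(C, np.ones(K))
        W[i, nbrs] = w / w.sum()             # enforce the sum-to-one constraint
    return W
```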

18 These constraints ensure that Σ_{j=1}^N W_ij X_j lies in the affine subspace generated by the K nearest neighbors of X_i, and that the solution W is translation-invariant. [sent-30, score-0.098]

19 The second step is to find a configuration Y_i, i = 1, ..., N of points in ℝ^d, reproducing as faithfully as possible the local linear structure encoded in W. [sent-35, score-0.08]

20 This is done by minimizing the cost function Φ(Y) = Σ_{i=1}^N |Y_i − Σ_{j=1}^N W_ij Y_j|² (2). To ensure uniqueness of the solution, two constraints are imposed: translation invariance, by placing the center of gravity of the data at the origin, i.e. Σ_{i=1}^N Y_i = 0, and unit covariance. [sent-36, score-0.232]

21 Given d, consider the d + 1 eigenvectors associated to the d + 1 smallest eigenvalues of the matrix M. [sent-43, score-0.537]

22 The rows of the matrix Y whose columns are given by such d eigenvectors give the desired solution. [sent-45, score-0.225]

23 The first eigenvector is discarded because it is a vector composed of all ones, with 0 as eigenvalue. [sent-46, score-0.08]
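A matching sketch of this embedding step follows. The definition M = (I − W)ᵀ(I − W) is the standard LLE choice and is assumed here; it is consistent with the kernel argument given later, but this extraction never states it explicitly.

```python
import numpy as np

def lle_embedding(W, d):
    """Sketch of the second LLE step: build the (assumed) standard LLE matrix
    M = (I - W)^T (I - W), take the eigenvectors associated with its d + 1
    smallest eigenvalues, and discard the constant all-ones eigenvector."""
    N = W.shape[0]
    I = np.eye(N)
    M = (I - W).T @ (I - W)
    evals, evecs = np.linalg.eigh(M)   # symmetric M: eigenvalues in ascending order
    Y = evecs[:, 1:d + 1]              # drop the first (constant) eigenvector
    return Y, evals, evecs
```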

24 As we shall see, this is true when the data set is 'connected' . [sent-47, score-0.064]

25 1 Disjoint components. In LLE every data point has a set of K neighbors. [sent-49, score-0.129]

26 This allows us to partition the whole data set X into K-connected components, corresponding to the intuitive visual notion of different 'groups' in the data set. [sent-50, score-0.222]

27 We say that a partition X = ∪_i U_i is finer than a partition X = ∪_j V_j if every U_i is contained in some V_j. [sent-51, score-0.188]

28 The partition into K-connected components is the finest [sent-52, score-0.187]

29 Figure 1: (Top-left) 2D data X_i distributed along a curve (the index i increases from left to right for convenience). [sent-80, score-0.3]

30 The x axis represents the index i and the y axis represents Yi. [sent-82, score-0.18]

31 (Bottom-left) As above, the data is now disconnected, i. [sent-84, score-0.064]

32 (Bottom-right) One-dimensional LLE calculated on the data (different symbols used for points belonging to the different groups). [sent-87, score-0.104]

33 Notice that the Yi's are not a good representation of the data any longer since they are constant within each group. [sent-88, score-0.064]

34 partition of the data set such that if two points have at least one neighbor in common, or one is a neighbor of the other, then they belong to the same component. [sent-89, score-0.33]

35 Note that for any two points in the same component, we can find an ordered sequence of points having them as endpoints, such that two consecutive points have at least one neighbor in common. [sent-90, score-0.219]

36 Consider data that is not K-connected: then LLE does not compute a good parametrization, as illustrated in Figure 1. [sent-92, score-0.064]

37 As an effect of the covariance constraint, the representation curves the line. Figure 2: Coordinates Y_i calculated for data X_i distributed along a straight line in ℝ^D = ℝ^3 when the dimension d is chosen as d = 1 (Left) and d = 2 (Right). [sent-102, score-0.336]

38 The index i is indicated along the x axis (Left) and along the 2D curve (Right). [sent-103, score-0.217]

39 The curvature can be very high, and even locally we may completely lose the linear structure. [sent-104, score-0.225]

40 PCA provides a principled way of estimating the intrinsic dimensionality of the data: it corresponds to the number of large singular values of the covariance matrix of the data. [sent-107, score-0.193]
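A small sketch of that PCA-based estimate; the 99% explained-variance threshold is an assumed choice for illustration, not a value from the paper.

```python
import numpy as np

def pca_dimension_estimate(X, var_threshold=0.99):
    """Count how many singular values of the centered data are needed to
    capture most of the variance; the threshold is an assumption."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(explained, var_threshold) + 1)
```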

41 3 Dimensionality detection: the size of the eigenvalues. In the example of Figure 2 the two-dimensional representation of the data (d = 2) is clearly the 'wrong' one, since the data lie in a one-dimensional linear subspace. [sent-109, score-0.544]

42 In this case the unit covariance constraint in minimizing the function (Y) is not compatible with the linear structure. [sent-110, score-0.11]

43 The answer is that d + 1 should be less than or equal to the number of eigenvalues of M that are close to zero. [sent-112, score-0.312]

44 Assume that the data X_i ∈ ℝ^D is K-connected and that it is locally flat, i.e. [sent-114, score-0.203]

45 there exists a corresponding set Y_i ∈ ℝ^d for some d > 0 such that Y_i = Σ_j W_ij Y_j (zero-error approximation), the set {Y_i} has rank d, and has the origin as center of gravity: Σ_{i=1}^N Y_i = 0. [sent-116, score-0.123]

46 Call z the number of zero eigenvalues of the matrix M. Proof. [sent-119, score-0.312]

47 Moreover, since the Y_i are such that the addends of (2) have zero error, the matrix Y, which by hypothesis has rank d, is in the kernel of I − W and hence in the kernel of M. [sent-121, score-0.077]

48 Due to the center of gravity constraint, all the columns of Y are orthogonal to the all-ones vector. [sent-122, score-0.168]

49 Therefore, in order to estimate d, one may count the number z of zero eigenvalues of M and choose any d < z. [sent-124, score-0.342]
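A sketch of this counting rule; the tolerance used to decide which eigenvalues count as 'zero' is an assumption, since in practice they are only numerically small.

```python
import numpy as np

def dimension_upper_bound(evals, tol=1e-8):
    """Count the number z of numerically zero eigenvalues of M and return
    the resulting upper bound d_max = z - 1 (any d < z is admissible)."""
    z = int(np.sum(evals < tol * max(evals.max(), 1.0)))
    return z - 1
```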

50 What happens when the data are not exactly locally flat, and when one has to contend with numerical noise? [sent-130, score-0.289]

51 The appendix provides an argument showing that the statement in the proposition is robust with respect to small perturbations. [sent-131, score-0.312]

52 Figure 3: (Left) Eigenvalues for the straight-line data X_i used for Figure 2. [sent-132, score-0.064]

53 (Right) Eigenvalues for the curve data shown in the top-left panel of Figure 1. [sent-133, score-0.092]

54 In both cases the two last eigenvalues are orders of magnitude smaller than the other eigenvalues, indicating a maximal dimension d = 1 for the data. [sent-134, score-0.231]

55 Numerical errors and small deviations from the ideal locally flat data will result in small deviations from the ideal zero value of the first d + 1 eigenvalues, where d is used here for the 'intrinsic' dimension of the data. [sent-137, score-0.633]

56 In Figure 4 we describe the successful application of the dimensionality detection method on a data set of synthetically generated grayscale images. [sent-139, score-0.212]

57 In Section 2.1 we pointed out the limits of LLE when applied to multiple components of data. [sent-141, score-0.065]

58 It appears then that a grouping procedure should always precede LLE. [sent-142, score-0.085]

59 The data would be first split into its component groups, each one of which should be then analyzed with LLE. [sent-143, score-0.128]

60 A deeper analysis of the algorithm, though, suggests that grouping and LLE could actually be performed at the same time. [sent-144, score-0.085]

61 Then there exists an m-dimensional eigenspace of M with zero eigenvalue which admits a basis {v_i}, i = 1, ..., m. [sent-150, score-0.162]

62 More precisely: each v_i corresponds to one of the groups of the data and takes value v_{i,j} = 1 for j in the group, v_{i,j} = 0 for j not in the group. [sent-154, score-0.189]

63 Without loss of generality, assume that the indexing of the data X_i is such that the weight matrix W, and consequently the matrix M, are block-diagonal with m blocks, each block corresponding to one of the groups of data. [sent-156, score-0.343]

64 As a direct consequence of the row normalization of W, each block of M has exactly one eigenvector composed of all ones, with eigenvalue 0. [sent-158, score-0.186]

65 Therefore, there is an m-dimensional eigenspace with eigenvalue 0, and there exists a basis of it, each vector of which has value 1 on a certain component, 0 otherwise. [sent-159, score-0.162]

66 Therefore one may count the number of connected components by computing the eigenvectors of M corresponding to eigenvalue 0, and counting the number m of those vectors v_i whose components take few discrete values (see Figure 6). [sent-160, score-0.451]

67 Each index i may be assigned to a group by clustering based on the values of v_1, ..., v_m. [sent-161, score-0.058]
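A hedged sketch of this grouping step. The rounding precision and the cut-off on the number of distinct values used to decide which null eigenvectors are 'discrete-valued' are assumptions, not values from the paper.

```python
import numpy as np

def detect_groups(evecs, evals, zero_tol=1e-8, max_values=3, decimals=3):
    """Among the eigenvectors of M with near-zero eigenvalue, keep those whose
    entries take only a few distinct values after rounding; each point is then
    assigned to a group by the pattern of values it takes on those vectors."""
    null_idx = np.where(evals < zero_tol)[0]
    discrete = [k for k in null_idx
                if len(np.unique(np.round(evecs[:, k], decimals))) <= max_values]
    m = len(discrete)                                 # estimated number of groups
    patterns = [tuple(np.round(evecs[i, discrete], decimals))
                for i in range(evecs.shape[0])]
    group_of = {p: g for g, p in enumerate(dict.fromkeys(patterns))}
    labels = np.array([group_of[p] for p in patterns])
    return m, labels
```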

68 Figure 4: (Left) A sample from a data set of N=1000, 40 by 40 grayscale images, each one thought of as a point in a 1600-dimensional vector space. [sent-165, score-0.153]

69 In each image, a slightly blurred line separates a dark from a bright portion. [sent-166, score-0.128]

70 The orientation of the line and its distance from the center of the image are variable. [sent-167, score-0.122]

71 The 2nd and 3rd smallest eigenvalues are of smaller size than the others, giving an upper bound of 2 on the intrinsic dimension of the data set. [sent-170, score-0.536]

72 The polar coordinates, after rescaling, are the distance of the dividing line from the center and its orientation. [sent-172, score-0.15]

73 Figure 5: The data set is analogous to the one used above (N =1000, 40 by 40 grayscale images, LLE performed with K=20). [sent-174, score-0.122]

74 The orientation of the line dividing the dark from the bright portion is now only allowed to vary in two disjoint intervals. [sent-175, score-0.248]

75 4th and 6th) eigenvectors of M are used for the LLE representation of the first (resp. [sent-178, score-0.176]

76 Figure 6: (Left) The last six eigenvectors of M for the broken parabola of Figure 1 shown, top to bottom, in reverse order of magnitude of the corresponding eigenvalue. [sent-182, score-0.555]

77 The eigenvectors corresponding to the three last eigenvalues have discrete values indicating that the data is split in three groups. [sent-186, score-0.614]

78 There are z = 6 zero eigenvalues, indicating that the dimension of the data is d ≤ z/m − 1 = 1. [sent-187, score-0.183]

79 It is also robust to small perturbations of the block-diagonal structure of M (see Figure 7). [sent-189, score-0.095]

80 This makes the use of LLE for grouping purposes convenient. [sent-190, score-0.085]

81 Should the K-connected components be completely separated, the partition would be easily obtained via a more efficient graph-search algorithm. [sent-191, score-0.205]

82 The proof is carried out for ordered indices as in Fig. [sent-192, score-0.066]

83 The analysis of Proposition 1 may be extended to the dimension of each of the m groups according to Proposition 2. [sent-194, score-0.212]

84 Therefore, in the ideal case, we will find z zero eigenvalues of M which, together with the number m obtained by counting the discrete-valued eigenvectors, may be used to estimate the maximal d using z = m(d + 1). [sent-195, score-0.325]
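As a concrete check of this count on the broken-parabola example of Figures 6 and 7 (three one-dimensional groups):

```python
# Three groups (m = 3) of one-dimensional data (d = 1) give, in the ideal
# case, z = m * (d + 1) = 6 zero eigenvalues; conversely d <= z/m - 1 = 1.
m, d = 3, 1
z = m * (d + 1)
assert z == 6 and z // m - 1 == d
```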

85 5 Conclusions. We have examined two difficulties of the Locally Linear Embedding method [2] and shown that, in a neighborhood of ideal conditions, they may be solved by a careful examination of the eigenvectors of the matrix M that are associated with very small eigenvalues. [sent-197, score-0.303]

86 More specifically: the number of groups in which the data is partitioned corresponds to the number of discrete-valued eigenvectors, while the maximal dimension d of the low-dimensional embedding may be obtained by dividing the number of small eigenvalues by m and subtracting 1. [sent-198, score-0.851]

87 Both the groups and the low-dimensional embedding coordinates may be computed from the components of such eigenvectors. [sent-199, score-0.393]

88 Further investigation on real data sets is necessary in order to validate our theoretical results. [sent-201, score-0.064]

89 [Figure 7 panel label: 3rd and 4th eigenvalues] [sent-202, score-0.312]

90 Figure 7: (Left) 2D data X_i distributed along a broken parabola. [sent-204, score-0.108]

91 Nevertheless, for K=14, the components are not completely K-disconnected (a different symbol is used for the neighbors of the leftmost point on the rightmost component). [sent-205, score-0.17]

92 A set of two almost-zero eigenvalues and a set of two of small size are visible. [sent-207, score-0.312]

93 Similarly, in Proposition 1 of Section 3 we proved that under ideal conditions (no noise, locally flat data), we can determine an estimate for the intrinsic dimension of the data. [sent-223, score-0.455]

94 Our next goal is to establish a certain robustness of these results in the case where there is numerical noise, or the components are not completely separated, or the data is not exactly locally flat. [sent-224, score-0.433]

95 In general, suppose we have a non-degenerate matrix A, and an orthonormal basis of eigenvectors v_1, ..., v_N. [sent-225, score-0.256]

96 As a consequence of a small perturbation of the matrix into A + dA, we will have eigenvectors v_i + dv_i with eigenvalues λ_i + dλ_i. The unit-norm constraint makes sure that dv_i is orthogonal to v_i and can therefore be written as dv_i = Σ_{k≠i} α_ik v_k. [sent-232, score-0.647]

97 Using again the orthonormality, one can derive expressions for the perturbations of λ_i and v_i. [sent-234, score-0.095]
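The expressions themselves are garbled in this extraction; for reference, the standard first-order perturbation formulas for a symmetric matrix with non-degenerate eigenvalues (assumed here to be what the appendix derives) read:

```latex
d\lambda_i = v_i^{\top}\, dA\, v_i , \qquad
dv_i = \sum_{k \neq i} \frac{v_k^{\top}\, dA\, v_i}{\lambda_i - \lambda_k}\, v_k .
```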

98 This shows that if the perturbation dA has order ε, then the perturbations dλ_i and dv_i are also of order ε. [sent-236, score-0.129]

99 Notice that we are not interested in perturbations [sent-237, score-0.095]

100 α_ij within the eigenspace of eigenvalue 0, but rather in those orthogonal to it, and therefore λ_i ≠ 0. [sent-238, score-0.2]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('lle', 0.641), ('eigenvalues', 0.312), ('yi', 0.18), ('eigenvectors', 0.176), ('locally', 0.139), ('embedding', 0.128), ('groups', 0.125), ('proposition', 0.123), ('roweis', 0.106), ('xi', 0.102), ('dvi', 0.096), ('perturbations', 0.095), ('partition', 0.094), ('dimension', 0.087), ('grouping', 0.085), ('rd', 0.085), ('eigenspace', 0.084), ('gravity', 0.084), ('vi', 0.083), ('eigenvalue', 0.078), ('ideal', 0.078), ('flat', 0.078), ('coordinates', 0.075), ('manifold', 0.074), ('intrinsic', 0.073), ('neighbor', 0.066), ('components', 0.065), ('lst', 0.064), ('polito', 0.064), ('wijxj', 0.064), ('data', 0.064), ('axis', 0.061), ('appendix', 0.061), ('neighbors', 0.059), ('grayscale', 0.058), ('wij', 0.058), ('index', 0.058), ('dividing', 0.056), ('saul', 0.056), ('dai', 0.056), ('parametrization', 0.056), ('da', 0.054), ('ai', 0.052), ('pasadena', 0.051), ('perona', 0.051), ('synthetically', 0.051), ('matrix', 0.049), ('line', 0.048), ('composed', 0.047), ('center', 0.046), ('completely', 0.046), ('separated', 0.045), ('fiat', 0.045), ('partitioned', 0.045), ('bright', 0.045), ('disconnected', 0.045), ('numerical', 0.041), ('ones', 0.041), ('elegant', 0.04), ('left', 0.04), ('points', 0.04), ('linear', 0.04), ('dimensionality', 0.039), ('distributed', 0.039), ('nearest', 0.039), ('orthogonal', 0.038), ('constraint', 0.038), ('counting', 0.037), ('right', 0.036), ('disjoint', 0.036), ('mathematics', 0.035), ('dark', 0.035), ('along', 0.035), ('component', 0.034), ('broken', 0.034), ('deviations', 0.034), ('perturbation', 0.034), ('maximal', 0.034), ('lie', 0.033), ('indices', 0.033), ('eigenvector', 0.033), ('six', 0.033), ('ordered', 0.033), ('covariance', 0.032), ('indicating', 0.032), ('notice', 0.032), ('suppose', 0.031), ('dimensional', 0.031), ('straight', 0.031), ('count', 0.03), ('split', 0.03), ('block', 0.028), ('orientation', 0.028), ('rank', 0.028), ('curve', 0.028), ('onedimensional', 0.028), ('finest', 0.028), ('yih', 0.028), ('astronomy', 0.028), ('consequent', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding

Author: Marzia Polito, Pietro Perona

Abstract: Locally Linear Embedding (LLE) is an elegant nonlinear dimensionality-reduction technique recently introduced by Roweis and Saul [2]. It fails when the data is divided into separate groups. We study a variant of LLE that can simultaneously group the data and calculate local embedding of each group. An estimate for the upper bound on the intrinsic dimension of the data set is obtained automatically. 1

2 0.21007718 136 nips-2001-On the Concentration of Spectral Properties

Author: John Shawe-Taylor, Nello Cristianini, Jaz S. Kandola

Abstract: We consider the problem of measuring the eigenvalues of a randomly drawn sample of points. We show that these values can be reliably estimated as can the sum of the tail of eigenvalues. Furthermore, the residuals when data is projected into a subspace is shown to be reliably estimated on a random sample. Experiments are presented that confirm the theoretical results. 1

3 0.16463836 135 nips-2001-On Spectral Clustering: Analysis and an algorithm

Author: Andrew Y. Ng, Michael I. Jordan, Yair Weiss

Abstract: Despite many empirical successes of spectral clustering methods (algorithms that cluster points using eigenvectors of matrices derived from the data), there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems. 1

4 0.16139355 84 nips-2001-Global Coordination of Local Linear Models

Author: Sam T. Roweis, Lawrence K. Saul, Geoffrey E. Hinton

Abstract: High dimensional data that lies on or near a low dimensional manifold can be described by a collection of local linear models. Such a description, however, does not provide a global parameterization of the manifold—arguably an important goal of unsupervised learning. In this paper, we show how to learn a collection of local linear models that solves this more difficult problem. Our local linear models are represented by a mixture of factor analyzers, and the “global coordination” of these models is achieved by adding a regularizing term to the standard maximum likelihood objective function. The regularizer breaks a degeneracy in the mixture model’s parameter space, favoring models whose internal coordinate systems are aligned in a consistent way. As a result, the internal coordinates change smoothly and continuously as one traverses a connected path on the manifold—even when the path crosses the domains of many different local models. The regularizer takes the form of a Kullback-Leibler divergence and illustrates an unexpected application of variational methods: not to perform approximate inference in intractable probabilistic models, but to learn more useful internal representations in tractable ones. 1 Manifold Learning Consider an ensemble of images, each of which contains a face against a neutral background. Each image can be represented by a point in the high dimensional vector space of pixel intensities. This representation, however, does not exploit the strong correlations between pixels of the same image, nor does it support many useful operations for reasoning about faces. If, for example, we select two images with faces in widely different locations and then average their pixel intensities, we do not obtain an image of a face at their average location. Images of faces lie on or near a low-dimensional, curved manifold, and we can represent them more usefully by the coordinates on this manifold than by pixel intensities. Using these “intrinsic coordinates”, the average of two faces is another face with the average of their locations, poses and expressions. To analyze and manipulate faces, it is helpful to imagine a “magic black box” with levers or dials corresponding to the intrinsic coordinates on this manifold. Given a setting of the levers and dials, the box generates an image of a face. Given an image of a face, the box deduces the appropriate setting of the levers and dials. In this paper, we describe a fairly general way to construct such a box automatically from an ensemble of high-dimensional vectors. We assume only that there exists an underlying manifold of low dimensionality and that the relationship between the raw data and the manifold coordinates is locally linear and smoothly varying. Thus our method applies not only to images of faces, but also to many other forms of highly distributed perceptual and scientific data (e.g., spectrograms of speech, robotic sensors, gene expression arrays, document collections). 2 Local Linear Models The global structure of perceptual manifolds (such as images of faces) tends to be highly nonlinear. Fortunately, despite their complicated global structure, we can usually characterize these manifolds as locally linear. Thus, to a good approximation, they can be represented by collections of simpler models, each of which describes a locally linear neighborhood[3, 6, 8]. For unsupervised learning tasks, a probabilistic model that nicely captures this intuition is a mixture of factor analyzers (MFA)[5]. 
The model is used to describe high dimensional data that lies on or near a lower dimensional manifold. MFAs parameterize a joint distribution over observed and hidden variables (1), where the observed variable represents the high dimensional data; the discrete hidden variable indexes different neighborhoods on the manifold; and the continuous hidden variables represent low dimensional local coordinates. The model assumes that data is sampled from different neighborhoods on the manifold with prior probabilities, and that within each neighborhood, the data's local coordinates are normally distributed.

5 0.13271107 106 nips-2001-Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering

Author: Mikhail Belkin, Partha Niyogi

Abstract: Drawing on the correspondence between the graph Laplacian, the Laplace-Beltrami operator on a manifold, and the connections to the heat equation, we propose a geometrically motivated algorithm for constructing a representation for data sampled from a low dimensional manifold embedded in a higher dimensional space. The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality preserving properties and a natural connection to clustering. Several applications are considered. In many areas of artificial intelligence, information retrieval and data mining, one is often confronted with intrinsically low dimensional data lying in a very high dimensional space. For example, gray scale n x n images of a fixed object taken with a moving camera yield data points in ℝ^{n²}. However, the intrinsic dimensionality of the space of all images of the same object is the number of degrees of freedom of the camera; in fact the space has the natural structure of a manifold embedded in ℝ^{n²}. While there is a large body of work on dimensionality reduction in general, most existing approaches do not explicitly take into account the structure of the manifold on which the data may possibly reside. Recently, there has been some interest (Tenenbaum et al, 2000; Roweis and Saul, 2000) in the problem of developing low dimensional representations of data in this particular context. In this paper, we present a new algorithm and an accompanying framework of analysis for geometrically motivated dimensionality reduction. The core algorithm is very simple, has a few local computations and one sparse eigenvalue problem. The solution reflects the intrinsic geometric structure of the manifold. The justification comes from the role of the Laplacian operator in providing an optimal embedding. The Laplacian of the graph obtained from the data points may be viewed as an approximation to the Laplace-Beltrami operator defined on the manifold. The embedding maps for the data come from approximations to a natural map that is defined on the entire manifold. The framework of analysis presented here makes this connection explicit. While this connection is known to geometers and specialists in spectral graph theory (for example, see [1, 2]), to the best of our knowledge we do not know of any application to data representation yet. The connection of the Laplacian to the heat kernel enables us to choose the weights of the graph in a principled manner. The locality preserving character of the Laplacian Eigenmap algorithm makes it relatively insensitive to outliers and noise. A byproduct of this is that the algorithm implicitly emphasizes the natural clusters in the data. Connections to spectral clustering algorithms developed in learning and computer vision (see Shi and Malik, 1997) become very clear. Following the discussion of Roweis and Saul (2000), and Tenenbaum et al (2000), we note that the biological perceptual apparatus is confronted with high dimensional stimuli from which it must recover low dimensional structure. One might argue that if the approach to recovering such low-dimensional structure is inherently local, then a natural clustering will emerge and thus might serve as the basis for the development of categories in biological perception. 1 The Algorithm. Given k points X_1, ..., X_k in ℝ^l, we construct a weighted graph with k nodes, one for each point, and the set of edges connecting neighboring points to each other. Step 1.
[Constructing the Graph] We put an edge between nodes i and j if X_i and X_j are

6 0.13220927 164 nips-2001-Sampling Techniques for Kernel Methods

7 0.11755806 170 nips-2001-Spectral Kernel Methods for Clustering

8 0.11539267 171 nips-2001-Spectral Relaxation for K-means Clustering

9 0.094719984 74 nips-2001-Face Recognition Using Kernel Methods

10 0.09415397 89 nips-2001-Grouping with Bias

11 0.093093082 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines

12 0.086293429 48 nips-2001-Characterizing Neural Gain Control using Spike-triggered Covariance

13 0.083119579 101 nips-2001-K-Local Hyperplane and Convex Distance Nearest Neighbor Algorithms

14 0.070093654 117 nips-2001-MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation

15 0.066462658 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family

16 0.063398279 62 nips-2001-Duality, Geometry, and Support Vector Regression

17 0.060233217 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation

18 0.057706524 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines

19 0.057648636 94 nips-2001-Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM

20 0.057204913 82 nips-2001-Generating velocity tuning by asymmetric recurrent connections


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.197), (1, 0.059), (2, -0.048), (3, -0.195), (4, 0.165), (5, 0.024), (6, -0.098), (7, 0.031), (8, 0.051), (9, -0.016), (10, 0.078), (11, 0.099), (12, 0.046), (13, 0.064), (14, 0.038), (15, -0.104), (16, -0.073), (17, -0.174), (18, 0.059), (19, 0.038), (20, -0.128), (21, -0.075), (22, 0.235), (23, 0.07), (24, -0.214), (25, -0.075), (26, 0.015), (27, -0.031), (28, -0.01), (29, -0.027), (30, -0.029), (31, 0.081), (32, 0.058), (33, -0.036), (34, -0.021), (35, -0.057), (36, -0.075), (37, 0.037), (38, -0.044), (39, 0.101), (40, -0.014), (41, 0.043), (42, -0.055), (43, 0.084), (44, -0.129), (45, -0.024), (46, -0.028), (47, -0.066), (48, 0.008), (49, 0.083)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95512682 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding

Author: Marzia Polito, Pietro Perona

Abstract: Locally Linear Embedding (LLE) is an elegant nonlinear dimensionality-reduction technique recently introduced by Roweis and Saul [2]. It fails when the data is divided into separate groups. We study a variant of LLE that can simultaneously group the data and calculate local embedding of each group. An estimate for the upper bound on the intrinsic dimension of the data set is obtained automatically. 1

2 0.7906729 106 nips-2001-Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering

Author: Mikhail Belkin, Partha Niyogi

Abstract: Drawing on the correspondence between the graph Laplacian, the Laplace-Beltrami operator on a manifold, and the connections to the heat equation, we propose a geometrically motivated algorithm for constructing a representation for data sampled from a low dimensional manifold embedded in a higher dimensional space. The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality preserving properties and a natural connection to clustering. Several applications are considered. In many areas of artificial intelligence, information retrieval and data mining, one is often confronted with intrinsically low dimensional data lying in a very high dimensional space. For example, gray scale n x n images of a fixed object taken with a moving camera yield data points in ℝ^{n²}. However, the intrinsic dimensionality of the space of all images of the same object is the number of degrees of freedom of the camera; in fact the space has the natural structure of a manifold embedded in ℝ^{n²}. While there is a large body of work on dimensionality reduction in general, most existing approaches do not explicitly take into account the structure of the manifold on which the data may possibly reside. Recently, there has been some interest (Tenenbaum et al, 2000; Roweis and Saul, 2000) in the problem of developing low dimensional representations of data in this particular context. In this paper, we present a new algorithm and an accompanying framework of analysis for geometrically motivated dimensionality reduction. The core algorithm is very simple, has a few local computations and one sparse eigenvalue problem. The solution reflects the intrinsic geometric structure of the manifold. The justification comes from the role of the Laplacian operator in providing an optimal embedding. The Laplacian of the graph obtained from the data points may be viewed as an approximation to the Laplace-Beltrami operator defined on the manifold. The embedding maps for the data come from approximations to a natural map that is defined on the entire manifold. The framework of analysis presented here makes this connection explicit. While this connection is known to geometers and specialists in spectral graph theory (for example, see [1, 2]), to the best of our knowledge we do not know of any application to data representation yet. The connection of the Laplacian to the heat kernel enables us to choose the weights of the graph in a principled manner. The locality preserving character of the Laplacian Eigenmap algorithm makes it relatively insensitive to outliers and noise. A byproduct of this is that the algorithm implicitly emphasizes the natural clusters in the data. Connections to spectral clustering algorithms developed in learning and computer vision (see Shi and Malik, 1997) become very clear. Following the discussion of Roweis and Saul (2000), and Tenenbaum et al (2000), we note that the biological perceptual apparatus is confronted with high dimensional stimuli from which it must recover low dimensional structure. One might argue that if the approach to recovering such low-dimensional structure is inherently local, then a natural clustering will emerge and thus might serve as the basis for the development of categories in biological perception. 1 The Algorithm. Given k points X_1, ..., X_k in ℝ^l, we construct a weighted graph with k nodes, one for each point, and the set of edges connecting neighboring points to each other. Step 1.
[Constructing the Graph] We put an edge between nodes i and j if X_i and X_j are

3 0.75286126 136 nips-2001-On the Concentration of Spectral Properties

Author: John Shawe-Taylor, Nello Cristianini, Jaz S. Kandola

Abstract: We consider the problem of measuring the eigenvalues of a randomly drawn sample of points. We show that these values can be reliably estimated as can the sum of the tail of eigenvalues. Furthermore, the residuals when data is projected into a subspace is shown to be reliably estimated on a random sample. Experiments are presented that confirm the theoretical results. 1

4 0.52179915 135 nips-2001-On Spectral Clustering: Analysis and an algorithm

Author: Andrew Y. Ng, Michael I. Jordan, Yair Weiss

Abstract: Despite many empirical successes of spectral clustering methods (algorithms that cluster points using eigenvectors of matrices derived from the data), there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems. 1

5 0.50057894 84 nips-2001-Global Coordination of Local Linear Models

Author: Sam T. Roweis, Lawrence K. Saul, Geoffrey E. Hinton

Abstract: High dimensional data that lies on or near a low dimensional manifold can be described by a collection of local linear models. Such a description, however, does not provide a global parameterization of the manifold—arguably an important goal of unsupervised learning. In this paper, we show how to learn a collection of local linear models that solves this more difficult problem. Our local linear models are represented by a mixture of factor analyzers, and the “global coordination” of these models is achieved by adding a regularizing term to the standard maximum likelihood objective function. The regularizer breaks a degeneracy in the mixture model’s parameter space, favoring models whose internal coordinate systems are aligned in a consistent way. As a result, the internal coordinates change smoothly and continuously as one traverses a connected path on the manifold—even when the path crosses the domains of many different local models. The regularizer takes the form of a Kullback-Leibler divergence and illustrates an unexpected application of variational methods: not to perform approximate inference in intractable probabilistic models, but to learn more useful internal representations in tractable ones. 1 Manifold Learning Consider an ensemble of images, each of which contains a face against a neutral background. Each image can be represented by a point in the high dimensional vector space of pixel intensities. This representation, however, does not exploit the strong correlations between pixels of the same image, nor does it support many useful operations for reasoning about faces. If, for example, we select two images with faces in widely different locations and then average their pixel intensities, we do not obtain an image of a face at their average location. Images of faces lie on or near a low-dimensional, curved manifold, and we can represent them more usefully by the coordinates on this manifold than by pixel intensities. Using these “intrinsic coordinates”, the average of two faces is another face with the average of their locations, poses and expressions. To analyze and manipulate faces, it is helpful to imagine a “magic black box” with levers or dials corresponding to the intrinsic coordinates on this manifold. Given a setting of the levers and dials, the box generates an image of a face. Given an image of a face, the box deduces the appropriate setting of the levers and dials. In this paper, we describe a fairly general way to construct such a box automatically from an ensemble of high-dimensional vectors. We assume only that there exists an underlying manifold of low dimensionality and that the relationship between the raw data and the manifold coordinates is locally linear and smoothly varying. Thus our method applies not only to images of faces, but also to many other forms of highly distributed perceptual and scientific data (e.g., spectrograms of speech, robotic sensors, gene expression arrays, document collections). 2 Local Linear Models The global structure of perceptual manifolds (such as images of faces) tends to be highly nonlinear. Fortunately, despite their complicated global structure, we can usually characterize these manifolds as locally linear. Thus, to a good approximation, they can be represented by collections of simpler models, each of which describes a locally linear neighborhood[3, 6, 8]. For unsupervised learning tasks, a probabilistic model that nicely captures this intuition is a mixture of factor analyzers (MFA)[5]. 
The model is used to describe high dimensional data that lies on or near a lower dimensional manifold. MFAs parameterize a joint distribution over observed and hidden variables (1), where the observed variable represents the high dimensional data; the discrete hidden variable indexes different neighborhoods on the manifold; and the continuous hidden variables represent low dimensional local coordinates. The model assumes that data is sampled from different neighborhoods on the manifold with prior probabilities, and that within each neighborhood, the data's local coordinates are normally distributed.

6 0.43500444 171 nips-2001-Spectral Relaxation for K-means Clustering

7 0.41120061 48 nips-2001-Characterizing Neural Gain Control using Spike-triggered Covariance

8 0.40915585 120 nips-2001-Minimax Probability Machine

9 0.40295339 101 nips-2001-K-Local Hyperplane and Convex Distance Nearest Neighbor Algorithms

10 0.39868981 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family

11 0.35395911 164 nips-2001-Sampling Techniques for Kernel Methods

12 0.35214671 89 nips-2001-Grouping with Bias

13 0.33191255 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines

14 0.32869649 74 nips-2001-Face Recognition Using Kernel Methods

15 0.32525614 117 nips-2001-MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation

16 0.31560794 154 nips-2001-Products of Gaussians

17 0.31246495 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines

18 0.30832398 170 nips-2001-Spectral Kernel Methods for Clustering

19 0.30079105 83 nips-2001-Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons

20 0.29349667 155 nips-2001-Quantizing Density Estimators


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.049), (17, 0.085), (19, 0.034), (27, 0.197), (30, 0.052), (38, 0.03), (59, 0.056), (72, 0.091), (79, 0.033), (91, 0.115), (98, 0.159)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94409853 171 nips-2001-Spectral Relaxation for K-means Clustering

Author: Hongyuan Zha, Xiaofeng He, Chris Ding, Ming Gu, Horst D. Simon

Abstract: The popular K-means clustering partitions a data set by minimizing a sum-of-squares cost function. A coordinate descent method is then used to find local minima. In this paper we show that the minimization can be reformulated as a trace maximization problem associated with the Gram matrix of the data vectors. Furthermore, we show that a relaxed version of the trace maximization problem possesses global optimal solutions which can be obtained by computing a partial eigendecomposition of the Gram matrix, and the cluster assignment for each data vector can be found by computing a pivoted QR decomposition of the eigenvector matrix. As a by-product we also derive a lower bound for the minimum of the sum-of-squares cost function. 1

same-paper 2 0.91193902 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding

Author: Marzia Polito, Pietro Perona

Abstract: Locally Linear Embedding (LLE) is an elegant nonlinear dimensionality-reduction technique recently introduced by Roweis and Saul [2]. It fails when the data is divided into separate groups. We study a variant of LLE that can simultaneously group the data and calculate local embedding of each group. An estimate for the upper bound on the intrinsic dimension of the data set is obtained automatically. 1

3 0.82757127 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family

Author: Michael Collins, S. Dasgupta, Robert E. Schapire

Abstract: Principal component analysis (PCA) is a commonly applied technique for dimensionality reduction. PCA implicitly minimizes a squared loss function, which may be inappropriate for data that is not real-valued, such as binary-valued data. This paper draws on ideas from the Exponential family, Generalized linear models, and Bregman distances, to give a generalization of PCA to loss functions that we argue are better suited to other data types. We describe algorithms for minimizing the loss functions, and give examples on simulated data.

4 0.82326937 13 nips-2001-A Natural Policy Gradient

Author: Sham M. Kakade

Abstract: We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris. 1

5 0.81893057 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources

Author: Roland Vollgraf, Klaus Obermayer

Abstract: We present a new method for the blind separation of sources, which do not fulfill the independence assumption. In contrast to standard methods we consider groups of neighboring samples (

6 0.81597912 135 nips-2001-On Spectral Clustering: Analysis and an algorithm

7 0.8126967 114 nips-2001-Learning from Infinite Data in Finite Time

8 0.81200278 137 nips-2001-On the Convergence of Leveraging

9 0.81190515 157 nips-2001-Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning

10 0.81145018 8 nips-2001-A General Greedy Approximation Algorithm with Applications

11 0.80835247 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines

12 0.80765021 44 nips-2001-Blind Source Separation via Multinode Sparse Representation

13 0.80740166 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms

14 0.8039214 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior

15 0.80360466 197 nips-2001-Why Neuronal Dynamics Should Control Synaptic Learning Rules

16 0.80358565 121 nips-2001-Model-Free Least-Squares Policy Iteration

17 0.80229402 155 nips-2001-Quantizing Density Estimators

18 0.79922509 74 nips-2001-Face Recognition Using Kernel Methods

19 0.79853767 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation

20 0.79845119 58 nips-2001-Covariance Kernels from Bayesian Generative Models