nips nips2005 nips2005-9 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhenyue Zhang, Hongyuan Zha
Abstract: We propose a fast manifold learning algorithm based on the methodology of domain decomposition. Starting with the set of sample points partitioned into two subdomains, we develop the solution of the interface problem that can glue the embeddings on the two subdomains into an embedding on the whole domain. We provide a detailed analysis to assess the errors produced by the gluing process using matrix perturbation theory. Numerical examples are given to illustrate the efficiency and effectiveness of the proposed methods. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose a fast manifold learning algorithm based on the methodology of domain decomposition. [sent-7, score-0.349]
2 Starting with the set of sample points partitioned into two subdomains, we develop the solution of the interface problem that can glue the embeddings on the two subdomains into an embedding on the whole domain. [sent-8, score-1.373]
3 We provide a detailed analysis to assess the errors produced by the gluing process using matrix perturbation theory. [sent-9, score-0.452]
4 1 Introduction The setting of manifold learning we consider is the following. [sent-11, score-0.174]
5 We are given a parameterized manifold of dimension d defined by a mapping f : Ω → Rm , where d < m, and Ω is open and connected in Rd . [sent-12, score-0.174]
6 We assume the manifold is well-behaved: it is smooth, contains no self-intersections, etc. [sent-13, score-0.174]
7 The goal of manifold learning is to recover the parameters τi 's and/or the mapping f (·) from the sample points xi 's [2, 6, 9, 12]. [sent-21, score-0.316]
8 The general framework of manifold learning methods involves imposing a connectivity structure such as a k-nearest-neighbor graph on the set of sample points and then turning the embedding problem into the solution of an eigenvalue problem. [sent-22, score-0.465]
9 Usually constructing the graph dominates the computational cost of a manifold learning algorithm, but for large data sets, the computational cost of the eigenvalue problem can be substantial as well. [sent-23, score-0.262]
10 The focus of this paper is to explore the methodology of domain decomposition for developing fast algorithms for manifold learning. [sent-24, score-0.447]
11 Domain decomposition by now is a well-established field in scientific computing and has been successfully applied in many science and engineering fields in connection with numerical solutions of partial differential equations. [sent-25, score-0.159]
12 One class of domain decomposition methods partitions the solution domain into subdomains, solves the problem on each subdomain, and glues the partial solutions on the subdomains by solving an interface problem [7, 10]. [sent-26, score-1.284]
13 On each of the subdomains, we can use a manifold learning method such as LLE [6], LTSA [12] or any other manifold learning method to construct an embedding for the subdomain in question. [sent-32, score-0.643]
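To make this step concrete, here is a minimal Python sketch of the standard LTSA construction applied to one subdomain (brute-force neighbour search, dense eigensolver); the function name ltsa_embed and all implementation choices are our own illustration, not code from the paper.

    import numpy as np

    def ltsa_embed(X, d, k):
        # X: m x N data matrix (columns are sample points); returns a d x N embedding.
        m, N = X.shape
        # brute-force k nearest neighbours (each point includes itself)
        D2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
        nbrs = np.argsort(D2, axis=0)[:k, :]
        B = np.zeros((N, N))                      # alignment matrix
        for i in range(N):
            Ii = nbrs[:, i]
            Xi = X[:, Ii]
            Xi = Xi - Xi.mean(axis=1, keepdims=True)
            # the d leading right singular vectors give local tangent coordinates
            _, _, Vt = np.linalg.svd(Xi, full_matrices=False)
            Gi = np.hstack([np.ones((k, 1)) / np.sqrt(k), Vt[:d].T])
            B[np.ix_(Ii, Ii)] += np.eye(k) - Gi @ Gi.T
        # eigenvectors for the 2nd..(d+1)th smallest eigenvalues form the embedding
        _, V = np.linalg.eigh(B)
        return V[:, 1:d + 1].T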
14 We will then formulate the interface problem, the solution of which will allow us to combine the embeddings on the two subdomains together to obtain an embedding over the whole domain. [sent-33, score-1.093]
15 In section 2, we give necessary and sufficient conditions under which the embedding on the whole domain can be constructed from the embeddings on the subdomains. [sent-35, score-0.676]
16 In section 4, we analyze the errors produced by the gluing process using matrix perturbation theory. [sent-36, score-0.452]
17 In section 5, we briefly mention how the partitioning of the set of sample points into subdomains can be accomplished by some graph partitioning algorithms. [sent-37, score-0.722]
18 We use e to denote a column vector of all 1's, the dimension of which should be clear from the context. [sent-40, score-0.075]
19 N (·) and R(·) denote the null space and range space of a matrix, respectively. [sent-41, score-0.098]
20 Assume that the whole sample domain X is divided into two subdomains X1 = {xi | i ∈ I1 } and X2 = {xi | i ∈ I2 }. [sent-51, score-0.786]
21 Suppose we have obtained the two low-dimensional embeddings T 1 and T2 of the sub-domains X1 and X2 , respectively. [sent-56, score-0.277]
22 The domain decomposition method attempts to recover the overall embedding T = {τ1 , . . . , τN } from the embeddings T1 and T2 on the subdomains. [sent-57, score-0.433] [sent-60, score-0.277]
24 For example, it is often the case that the recovered embeddings Tj are approximately affinely equal to {τi | i ∈ Ij }, i.e., up to certain approximation errors, there is an affine transformation such that Tj = {Fj τi + cj | i ∈ Ij }, where Fj is a nonsingular matrix and cj a column vector. [sent-62, score-0.327] [sent-64, score-0.338]
26 Thus a domain decomposition method for manifold learning should be invariant to affine transformation on the embeddings Tj obtained from subdomains. [sent-65, score-0.767]
27 With an abuse of notation, we also denote by T and Tj the matrices of the column vectors in the sets T and Tj ; for example, we write T = [τ1 , . . . , τN ]. [sent-69, score-0.075]
28 Then Tj can be recovered by computing the eigenvectors of Φj corresponding to its zero eigenvalues. [sent-74, score-0.079]
29 To recover the whole T we need to construct a matrix Φ with N (Φ) = span([e, T T ]) [11]. [sent-75, score-0.263]
30 To this end, for each Tj , let Φj = Qj Qj^T ∈ R^{Nj×Nj}, where Qj is an orthonormal basis matrix of N ([e, Tj^T]^T) and Nj is the column-size of Tj . [sent-76, score-0.173]
31 To construct a Φ matrix, let Sj ∈ R^{N×Nj} be the 0-1 selection matrix defined as Sj = IN (:, Ij ), where IN is the identity matrix of order N , and set Φ = S1 Φ1 S1^T + S2 Φ2 S2^T. [sent-77, score-0.18]
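A small Python/NumPy sketch (ours, with illustrative function names) of how Φj and the assembled Φ could be formed from the subdomain embeddings; the optional weights w1, w2 anticipate the weighted combination Φ = w1 Φ1 + w2 Φ2 used in the error analysis below and default to 1.

    import numpy as np

    def phi_j(Tj, tol=1e-12):
        # Tj: d x Nj subdomain embedding.  Returns Phi_j = Q_j Q_j^T, the orthogonal
        # projector onto N([e, Tj^T]^T), i.e. onto vectors v with e^T v = 0 and Tj v = 0.
        d, Nj = Tj.shape
        M = np.vstack([np.ones((1, Nj)), Tj])          # (d+1) x Nj
        _, s, Vt = np.linalg.svd(M)                    # full Vt is Nj x Nj
        rank = int((s > tol * s[0]).sum())
        Qj = Vt[rank:].T                               # orthonormal basis of N(M)
        return Qj @ Qj.T

    def assemble_phi(Phi1, Phi2, I1, I2, N, w1=1.0, w2=1.0):
        # Phi = w1 * S1 Phi1 S1^T + w2 * S2 Phi2 S2^T, with S_j = I_N(:, I_j)
        Phi = np.zeros((N, N))
        Phi[np.ix_(I1, I1)] += w1 * Phi1
        Phi[np.ix_(I2, I2)] += w2 * Phi2
        return Phi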
32 The following theorem gives the necessary and sufficient conditions under which the null space of Φ is just span{[e, T T ]}. [sent-81, score-0.121]
33 The above result states that when the overlap is large enough such that [e, T0^T] is of full column-rank (which is generically true when T0 contains d + 1 points or more), the embedding over the whole domain can be recovered from the embeddings over the two subdomains. [sent-112, score-0.854]
34 From Theorem 2.1, it seems that we will need to compute the null space of Φ. [sent-114, score-0.066]
35 In the next section, we will show that this can be done much more cheaply by considering an interface problem which is of much smaller dimension. [sent-115, score-0.066]
36 3 Computing the Null Space of Φ In this section, we formulate the interface problem and show how to solve it to glue the embeddings from the two subdomains to obtain an embedding over the whole domain. [sent-116, score-1.268]
37 To simplify notation, we re-denote by T ∗ the actual embedding over the whole domain and by Tj∗ the subsets of T ∗ corresponding to the subdomains. [sent-117, score-0.426]
38 Here cj is a constant column vector in Rd and Fj is a nonsingular matrix. [sent-121, score-0.138]
39 Denote by T0j the overlapping part of Tj corresponding to I0 = I1 ∩ I2 as in the proof of Theorem 2. [sent-122, score-0.093]
40 We consider the overlapping parts T0j∗ of Tj∗ , for which c1 e^T + F1 T01 = T01∗ = T02∗ = c2 e^T + F2 T02 . [sent-124, score-1.373]
41 Therefore, if we take an orthonormal basis G of the null space of [[e, T01^T], −[e, T02^T]] and partition G = [G1^T, G2^T]^T conformally, then [e, T01^T]G1 = [e, T02^T]G2 . [sent-127, score-0.207]
42 Then, since Φi Ai^T = 0, the well-defined matrix A^T is a basis of N (Φ): ΦA^T = S1 Φ1 S1^T A^T + S2 Φ2 S2^T A^T = S1 Φ1 A1^T + S2 Φ2 A2^T = 0. [sent-130, score-0.11]
43 Therefore, we can use A^T to recover the global embedding T . [sent-131, score-0.187]
44 That is, fix one of the Ti and affinely transform the other; the affine transformation is obtained by fixing one of the T0i and transforming the other. [sent-134, score-0.103]
45 Then we can construct a larger matrix T by T (:, I1 ) = T1 , T (:, I2 ) = c e^T + F T2 . [sent-138, score-0.137]
46 One can also readily verify that [e, T^T] is a basis matrix of N (Φ). [sent-139, score-0.11]
47 For example, for the simultaneous affine transformation, we take G = [G1^T, G2^T]^T to be an orthonormal matrix in R^{2(d+1)×(d+1)} such that ‖[e, T01^T]G1 − [e, T02^T]G2 ‖ = min. [sent-141, score-0.146]
48 It is known that the minimizing G is given by the right singular vector matrix corresponding to the d + 1 smallest singular values of W = [[e, T01^T], −[e, T02^T]], and the residual ‖[e, T01^T]G1 − [e, T02^T]G2 ‖ = σd+2 (W ). [sent-142, score-0.274]
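For illustration, the following Python sketch carries out the simultaneous affine gluing just described; the way A is assembled from G1 and G2, and the final projection that removes the constant column mean, follow our reading of the fragmentary algorithm description, so the details should be taken as assumptions rather than the authors' exact procedure.

    import numpy as np

    def glue_simultaneous(T1, T2, I1, I2, N, d):
        # T1: d x |I1|, T2: d x |I2| subdomain embeddings; I1, I2: 0-based global index arrays.
        I0 = np.intersect1d(I1, I2)                    # overlap indices
        p1 = {g: j for j, g in enumerate(I1)}
        p2 = {g: j for j, g in enumerate(I2)}
        T01 = T1[:, [p1[g] for g in I0]]
        T02 = T2[:, [p2[g] for g in I0]]
        n0 = len(I0)
        # W = [[e, T01^T], -[e, T02^T]]; its d+1 smallest right singular vectors give G
        W = np.hstack([np.c_[np.ones(n0), T01.T], -np.c_[np.ones(n0), T02.T]])
        _, s, Vt = np.linalg.svd(W, full_matrices=True)
        G = Vt[-(d + 1):].T                            # 2(d+1) x (d+1)
        G1, G2 = G[:d + 1], G[d + 1:]
        # A is well defined on the overlap because [e, T01^T] G1 ~= [e, T02^T] G2
        # (the second assignment below overwrites the first on I0; they agree up to the residual)
        A = np.zeros((d + 1, N))
        A[:, I1] = (np.c_[np.ones(len(I1)), T1.T] @ G1).T
        A[:, I2] = (np.c_[np.ones(len(I2)), T2.T] @ G2).T
        # assumption: remove the constant direction by projecting onto a basis U of N(a^T)
        a = A.mean(axis=1, keepdims=True)              # column mean of A
        _, _, Va = np.linalg.svd(a.T, full_matrices=True)
        U = Va[1:].T                                   # (d+1) x d orthogonal basis of N(a^T)
        T = U.T @ A                                    # d x N global embedding (up to an affine map)
        s_asc = np.sort(s)
        residual = s_asc[d + 1] if len(s_asc) > d + 1 else 0.0   # sigma_{d+2}(W) as quoted above
        return T, residual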
49 For the one-sided transformation, [c, F ] can be taken as a solution to the least squares problem min_{c,F} ‖T01 − (c e^T + F T02 )‖ = min_F ‖(T01 − t01 e^T) − F (T02 − t02 e^T)‖, where t0j is the column mean of T0j . [sent-144, score-0.137]
50 Notice that the overlapping parts in the two affinely transformed subsets are not exactly equal to each other in the noisy case. [sent-147, score-0.152]
51 We summarize the discussion above in the following two algorithms for gluing the two subdomain embeddings T1 and T2 . [sent-151, score-0.748]
52 Compute the right singular vector matrix G corresponding to the d + 1 smallest singular values of [[e, T01^T], −[e, T02^T]]. [sent-154, score-0.248]
53 3. Compute the column mean a of A, and an orthogonal basis U of N (a^T). [sent-157, score-0.113]
54 Set the global coordinate matrix T by T (:, I1 \I0 ) = T11 , T (:, I0 ) = α T01 + (1 − α) T̂02 , T (:, I2 \I0 ) = T̂12 . [sent-166, score-0.075]
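A companion sketch (ours, with hypothetical names) of the one-sided variant: fix T1, solve the small least squares problem on the overlap for (c, F), transform T2, and blend the overlap with the weight α as in the last step above.

    import numpy as np

    def glue_one_sided(T1, T2, I1, I2, N, alpha=0.5):
        # Fix T1 and affinely map T2 onto it using the overlap I0 = I1 & I2.
        I0 = np.intersect1d(I1, I2)
        p1 = {g: j for j, g in enumerate(I1)}
        p2 = {g: j for j, g in enumerate(I2)}
        T01 = T1[:, [p1[g] for g in I0]]
        T02 = T2[:, [p2[g] for g in I0]]
        t01 = T01.mean(axis=1, keepdims=True)
        t02 = T02.mean(axis=1, keepdims=True)
        # min_F || (T01 - t01 e^T) - F (T02 - t02 e^T) ||_F via least squares
        F = np.linalg.lstsq((T02 - t02).T, (T01 - t01).T, rcond=None)[0].T
        c = t01 - F @ t02
        T2h = c + F @ T2                               # affinely transformed copy of T2
        T = np.zeros((T1.shape[0], N))
        T[:, I1] = T1
        T[:, I2] = T2h                                 # fills I2 \ I0 (and temporarily I0)
        T[:, I0] = alpha * T01 + (1 - alpha) * T2h[:, [p2[g] for g in I0]]
        return T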
55 4 Error Analysis. As we mentioned before, the computation of Tj , j = 1, 2, using a manifold learning algorithm such as LTSA involves errors. [sent-167, score-0.174]
56 In this section, we assess the impact of those errors on the accuracy of the gluing process. [sent-168, score-0.307]
57 Two issues arise: one is the perturbation analysis of N (Φ∗ ) when the computation of Φ∗ is subject to error. [sent-170, score-0.07]
58 The other issue is the error estimation of V when a basis matrix of V is approximately constructed by affinely transformed local embeddings as described in section 3 (Theorem 4. [sent-173, score-0.419]
59 Clearly, if Φ∗ = w1 Φ1∗ + w2 Φ2∗ and Φ = w1 Φ1 + w2 Φ2 , then ‖Φ − Φ∗ ‖ ≤ w1 ε1 + w2 ε2 ≡ ε, which controls dist(span([e, (T ∗ )^T]), span([e, T^T])). [sent-178, score-0.092]
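As a small numerical aid, not taken from the paper, the distance between two spans can be evaluated as the spectral norm of the difference of the corresponding orthogonal projectors:

    import numpy as np

    def subspace_dist(A, B):
        # dist(span(A), span(B)) = || P_A - P_B ||_2, with P the orthogonal projector.
        # Assumes A and B have full column rank.
        QA, _ = np.linalg.qr(A)
        QB, _ = np.linalg.qr(B)
        return np.linalg.norm(QA @ QA.T - QB @ QB.T, 2)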
60 Let σ be the smallest nonzero eigenvalue of Φ∗ and V the subspace spanned by the eigenvectors of Φ corresponding to the d + 1 smallest eigenvalues. [sent-180, score-0.159]
61 A is the matrix computed by the simultaneous affine transformation (Algorithm I in section 3). Let σi (·) be the i-th smallest singular value of a matrix. [sent-185, score-0.257]
62 5 Partitioning the Domains To apply the domain decomposition methods, we need to partition the given set of data points into several domains, making use of the k-nearest-neighbor graph imposed on the data points. [sent-190, score-0.439]
63 This reduces the problem to a graph partitioning problem, and many techniques such as spectral graph partitioning and METIS [3, 5] can be used. [sent-191, score-0.238]
64 In our experiments, we have used a particularly simple approach: we use the reverse Cuthill-McKee method [4] to order the vertices of the k-NN graph and then partition the vertices into domains (for details see Test 2 in the next section). [sent-192, score-0.187]
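A sketch of this partitioning step in Python, where SciPy's reverse_cuthill_mckee plays the role of Matlab's symrcm; the overlap width of 20 mirrors the choice in Test 2 below, and the function name rcm_partition is ours.

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import reverse_cuthill_mckee

    def rcm_partition(W, K, overlap=20):
        # W: symmetric adjacency matrix of the k-NN graph (N x N, dense or sparse).
        # Returns the RCM permutation and K overlapping index blocks in the permuted order.
        perm = reverse_cuthill_mckee(csr_matrix(W), symmetric_mode=True)
        N = W.shape[0]
        m = N // K
        blocks = []
        for i in range(1, K + 1):
            s = max(0, (i - 1) * m - overlap)          # 0-based analogue of s_i
            e = min(i * m + overlap, N)                # analogue of e_i
            blocks.append(perm[s:e])
        return perm, blocks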
65 Once we have partitioned the whole domain into multiple overlapping subdomains we can use the following two approaches to glue them together. [sent-193, score-1.068]
66 Here we glue the subdomains one by one as follows. [sent-195, score-0.674]
67 Initially set T (1) = T1 and I (1) = I1 , and then glue the patch Tk to T (k−1) to obtain the larger one T (k) for k = 2, . . . , K. [sent-196, score-0.175]
68 Clearly the overlapping set of T (k−1) and Tk is I0^(k) = I (k−1) ∩ Ik . [sent-201, score-0.093]
69 Here at the leaf level, we divide the subdomains into several pairs, say (T2i−1^(0) , T2i^(0) ), i = 1, 2, . . .. [sent-203, score-0.499]
70 Then glue each pair into a larger subdomain Ti^(1) and continue. [sent-207, score-0.3]
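Both strategies are short drivers around any pairwise gluing routine; the sketch below (ours) takes such a routine as an argument, so it can be used with either gluing sketch given earlier.

    def glue_successive(patches, glue_pair):
        # patches: list of (T_k, I_k); glue_pair((T_a, I_a), (T_b, I_b)) -> (T, I)
        acc = patches[0]
        for patch in patches[1:]:
            acc = glue_pair(acc, patch)                # T^(k) from T^(k-1) and T_k
        return acc

    def glue_recursive(patches, glue_pair):
        # pair up neighbouring patches level by level, as in the recursive approach
        level = list(patches)
        while len(level) > 1:
            nxt = []
            for j in range(0, len(level) - 1, 2):
                nxt.append(glue_pair(level[j], level[j + 1]))
            if len(level) % 2 == 1:                    # an odd patch passes up unchanged
                nxt.append(level[-1])
            level = nxt
        return level[0]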
71 6 Numerical Experiments In this section we report numerical experiments for the proposed domain decomposition methods for manifold learning. [sent-209, score-0.456]
72 The efficiency and effectiveness of the methods clearly depend on the accuracy of the computed embeddings for subdomains, the sizes of the subdomains, and the sizes of the overlaps of the subdomains. [sent-210, score-0.319]
73 Our first test data set is sampled from a Swiss-roll as follows: xi = [ti cos(ti ), hi , ti sin(ti )]^T , i = 1, . . . , N, [sent-212, score-0.139]
74 where ti and hi are uniformly randomly chosen in the intervals [3π/2, 9π/2] and [0, 21], respectively. [sent-216, score-0.107]
75 Let τi be the arc length of the corresponding spiral curve [t cos(t), t sin(t)]^T from t0 = 3π/2 to ti . [sent-217, score-0.107]
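For reference, the test data can be generated as follows (our own sketch); the closed form for the arc length uses ∫ sqrt(1 + t^2) dt = (t sqrt(1 + t^2) + asinh t)/2.

    import numpy as np

    def swiss_roll(N, t_lo=1.5 * np.pi, t_hi=4.5 * np.pi, h_hi=21.0, seed=0):
        rng = np.random.default_rng(seed)
        t = rng.uniform(t_lo, t_hi, N)
        h = rng.uniform(0.0, h_hi, N)
        X = np.vstack([t * np.cos(t), h, t * np.sin(t)])   # 3 x N sample matrix
        # arc length of the spiral [t cos t, t sin t] from t_lo to each t_i
        s = lambda u: 0.5 * (u * np.sqrt(1.0 + u ** 2) + np.arcsinh(u))
        tau = s(t) - s(t_lo)
        return X, tau, h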
76 To compare the CPU time of the domain decomposition methods, we simply partition the τ -interval [0, τmax ] into kτ subintervals (ai−1 , ai ] of equal length and also partition the h-interval into kh subintervals (bj−1 , bj ]. [sent-219, score-0.596]
77 Let Dij = (ai−1 , ai ] × (bj−1 , bj ] and Sij (r) be the balls centered at (ai , bj ) with radius r. [sent-220, score-0.178]
78 We set the subdomains as Xij = {xk | (τk , hk ) ∈ Dij ∪ Sij (r)}. [sent-221, score-0.499]
79 Clearly r determines the size of overlapping parts of Xij with Xi+1,j , Xi,j+1 , Xi+1,j+1 . [sent-222, score-0.093]
80 We first compute the K local 2-D embeddings T1 , . . . , TK . [sent-233, score-0.277]
81 Then those local coordinate embeddings Tk are aligned by the successive one-sided affine transformation algorithm, adding the subdomains Tk one by one. [sent-237, score-0.582]
82 Table 1 lists the total CPU time for the successive domain decomposition algorithm, including the time for computing the embeddings {Tk } for the subdomains, for different parameters kτ and kh with the parameter r = 5. [sent-238, score-0.732]
83 In Table 2, we list the CPU time for the recursive gluing approach taking into account the parallel procedure. [sent-239, score-0.343]
84 As a comparison, the CPU time of LTSA applied to the whole data set is 6. [sent-240, score-0.146]
85 Table 1: CPU Time (seconds) of the successive domain decomposition algorithm. [sent-242, score-0.356]
86 Table 2: CPU Time (seconds) of the parallel recursive domain decomposition. [sent-281, score-0.242]
87 The symmetric reverse Cuthill-McKee permutation (symrcm) is an algorithm for ordering the rows and columns of a symmetric sparse matrix [4]. [sent-322, score-0.104]
88 It tends to move the nonzero elements of the sparse matrix toward the main diagonal of the matrix. [sent-323, score-0.075]
89 We apply Matlab's symrcm to the adjacency matrix of the k-nearest-neighbor graph of the data points to reorder them. [sent-324, score-0.22]
90 We then partition the whole set of sample points into K = 16 subsets Xi = X(:, si : ei ) with si = max{1, (i − 1)m − 20}, ei = min{im + 20, N }, and m = N/K = 125. [sent-326, score-0.327]
91 We have shown that within the errors made in computing the local embeddings, LTSA can recover the isometric parametrization up to an affine transformation [11]. [sent-329, score-0.252]
92 We denote by T̃ (k) = c e^T + F T (k) the optimal approximation to T ∗ (:, I (k) ) within affine transformations, ‖T ∗ (:, I (k) ) − T̃ (k) ‖_F = min_{c,F} ‖T ∗ (:, I (k) ) − (c e^T + F T (k) )‖_F . [sent-330, score-0.065]
93 We denote by ηk the average of the relative errors, ηk = (1/|I (k) |) Σ_{i∈I (k)} ‖T ∗ (:, i) − T̃ (k) (:, i)‖_2 / ‖T ∗ (:, i)‖_2 . [sent-331, score-0.09]
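A sketch (ours) of how these quantities can be evaluated as we read the formula above: fit the best affine map from the computed embedding to the true parameters, then average the columnwise relative errors.

    import numpy as np

    def relative_error(T_true, T_est):
        # Best affine fit T_true ~ c e^T + F T_est (least squares), then eta as the
        # mean of || T_true(:,i) - T_tilde(:,i) ||_2 / || T_true(:,i) ||_2.
        m1 = T_true.mean(axis=1, keepdims=True)
        m2 = T_est.mean(axis=1, keepdims=True)
        F = np.linalg.lstsq((T_est - m2).T, (T_true - m1).T, rcond=None)[0].T
        T_tilde = m1 + F @ (T_est - m2)
        num = np.linalg.norm(T_true - T_tilde, axis=0)
        den = np.linalg.norm(T_true, axis=0)
        return np.mean(num / den)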
94 In the left panel of Figure 1 we plot the initial embedding errors for the subdomains (blue bar), the error of LTSA applied to the whole data set (red bar), and the errors ηk of the successive gluing (red line). [sent-332, score-0.146]
95 The successive gluing method gives an embedding with an acceptable accuracy compared with the accuracy obtained by applying LTSA to the whole data set. [sent-333, score-0.61]
96 As shown in the error analysis, the errors in successive gluing will increase when the initial errors for the subdomains increase. [sent-334, score-0.974]
97 To show this more clearly, we also plot the ηk for the recursive gluing method in the right panel of Figure 1. [sent-335, score-0.316]
98 Figure 1: Relative errors for the successive (left) and recursive (right) approaches (horizontal axis k; curves labeled: successive alignment, root 1-4, subdomains, whole domain). [sent-339, score-1.814] [sent-347, score-0.235]
wordName wordTfidf (topN-words)
[('subdomains', 0.499), ('embeddings', 0.277), ('gluing', 0.249), ('tj', 0.239), ('ltsa', 0.199), ('glue', 0.175), ('manifold', 0.174), ('af', 0.171), ('span', 0.154), ('cet', 0.15), ('domain', 0.148), ('embedding', 0.14), ('nely', 0.13), ('subdomain', 0.125), ('whole', 0.111), ('successive', 0.11), ('ti', 0.107), ('gt', 0.099), ('kh', 0.099), ('decomposition', 0.098), ('tk', 0.095), ('overlapping', 0.093), ('dist', 0.092), ('cpu', 0.088), ('matrix', 0.075), ('tit', 0.075), ('transformation', 0.07), ('perturbation', 0.07), ('partition', 0.068), ('recursive', 0.067), ('interface', 0.066), ('null', 0.066), ('bj', 0.065), ('singular', 0.061), ('graph', 0.06), ('zha', 0.059), ('errors', 0.058), ('qt', 0.057), ('theorem', 0.055), ('cj', 0.055), ('fj', 0.053), ('smallest', 0.051), ('partitioning', 0.05), ('recovered', 0.05), ('subintervals', 0.05), ('symrcm', 0.05), ('tjt', 0.05), ('xij', 0.05), ('ai', 0.048), ('aj', 0.048), ('root', 0.047), ('recover', 0.047), ('ij', 0.046), ('ik', 0.044), ('column', 0.043), ('partitioned', 0.042), ('et', 0.042), ('clearly', 0.042), ('sj', 0.042), ('dij', 0.04), ('nonsingular', 0.04), ('parametrization', 0.04), ('orthonormal', 0.038), ('isometric', 0.037), ('py', 0.037), ('numerical', 0.036), ('gi', 0.036), ('semide', 0.036), ('points', 0.035), ('nj', 0.035), ('xk', 0.035), ('orthogonal', 0.035), ('basis', 0.035), ('sij', 0.033), ('min', 0.033), ('transformed', 0.032), ('xi', 0.032), ('denote', 0.032), ('domains', 0.03), ('construct', 0.03), ('ei', 0.029), ('px', 0.029), ('reverse', 0.029), ('eigenvectors', 0.029), ('sample', 0.028), ('qj', 0.028), ('transform', 0.028), ('eigenvalue', 0.028), ('squares', 0.028), ('index', 0.028), ('cos', 0.027), ('methodology', 0.027), ('parallel', 0.027), ('subsets', 0.027), ('residual', 0.026), ('bar', 0.025), ('author', 0.025), ('overlap', 0.025), ('let', 0.025), ('partial', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
Author: Zhenyue Zhang, Hongyuan Zha
Abstract: We propose a fast manifold learning algorithm based on the methodology of domain decomposition. Starting with the set of sample points partitioned into two subdomains, we develop the solution of the interface problem that can glue the embeddings on the two subdomains into an embedding on the whole domain. We provide a detailed analysis to assess the errors produced by the gluing process using matrix perturbation theory. Numerical examples are given to illustrate the efficiency and effectiveness of the proposed methods. 1
2 0.10926712 127 nips-2005-Mixture Modeling by Affinity Propagation
Author: Brendan J. Frey, Delbert Dueck
Abstract: Clustering is a fundamental problem in machine learning and has been approached in many ways. Two general and quite different approaches include iteratively fitting a mixture model (e.g., using EM) and linking together pairs of training cases that have high affinity (e.g., using spectral methods). Pair-wise clustering algorithms need not compute sufficient statistics and avoid poor solutions by directly placing similar examples in the same cluster. However, many applications require that each cluster of data be accurately described by a prototype or model, so affinity-based clustering – and its benefits – cannot be directly realized. We describe a technique called “affinity propagation”, which combines the advantages of both approaches. The method learns a mixture model of the data by recursively propagating affinity messages. We demonstrate affinity propagation on the problems of clustering image patches for image segmentation and learning mixtures of gene expression models from microarray data. We find that affinity propagation obtains better solutions than mixtures of Gaussians, the K-medoids algorithm, spectral clustering and hierarchical clustering, and is both able to find a pre-specified number of clusters and is able to automatically determine the number of clusters. Interestingly, affinity propagation can be viewed as belief propagation in a graphical model that accounts for pairwise training case likelihood functions and the identification of cluster centers. 1
3 0.10644811 138 nips-2005-Non-Local Manifold Parzen Windows
Author: Yoshua Bengio, Hugo Larochelle, Pascal Vincent
Abstract: To escape from the curse of dimensionality, we claim that one can learn non-local functions, in the sense that the value and shape of the learned function at x must be inferred using examples that may be far from x. With this objective, we present a non-local non-parametric density estimator. It builds upon previously proposed Gaussian mixture models with regularized covariance matrices to take into account the local shape of the manifold. It also builds upon recent work on non-local estimators of the tangent plane of a manifold, which are able to generalize in places with little training data, unlike traditional, local, non-parametric models.
4 0.090435997 75 nips-2005-Fixing two weaknesses of the Spectral Method
Author: Kevin Lang
Abstract: We discuss two intrinsic weaknesses of the spectral graph partitioning method, both of which have practical consequences. The first is that spectral embeddings tend to hide the best cuts from the commonly used hyperplane rounding method. Rather than cleaning up the resulting suboptimal cuts with local search, we recommend the adoption of flow-based rounding. The second weakness is that for many “power law” graphs, the spectral method produces cuts that are highly unbalanced, thus decreasing the usefulness of the method for visualization (see figure 4(b)) or as a basis for divide-and-conquer algorithms. These balance problems, which occur even though the spectral method’s quotient-style objective function does encourage balance, can be fixed with a stricter balance constraint that turns the spectral mathematical program into an SDP that can be solved for million-node graphs by a method of Burer and Monteiro. 1 Background Graph partitioning is the NP-hard problem of finding a small graph cut subject to the constraint that neither side of the resulting partitioning of the nodes is “too small”. We will be dealing with several versions: the graph bisection problem, which requires perfect 1 : 1 2 2 balance; the β-balanced cut problem (with β a fraction such as 1 ), which requires at least 3 β : (1 − β) balance; and the quotient cut problem, which requires the small side to be large enough to “pay for” the edges in the cut. The quotient cut metric is c/ min(a, b), where c is the cutsize and a and b are the sizes of the two sides of the cut. All of the well-known variants of the quotient cut metric (e.g. normalized cut [15]) have similar behavior with respect to the issues discussed in this paper. The spectral method for graph partitioning was introduced in 1973 by Fiedler and Donath & Hoffman [6]. In the mid-1980’s Alon & Milman [1] proved that spectral cuts can be at worst quadratically bad; in the mid 1990’s Guattery & Miller [10] proved that this analysis is tight by exhibiting a family of n-node graphs whose spectral bisections cut O(n 2/3 ) edges versus the optimal O(n1/3 ) edges. On the other hand, Spielman & Teng [16] have proved stronger performance guarantees for the special case of spacelike graphs. The spectral method can be derived by relaxing a quadratic integer program which encodes the graph bisection problem (see section 3.1). The solution to this relaxation is the “Fiedler vector”, or second smallest eigenvector of the graph’s discrete Laplacian matrix, whose elements xi can be interpreted as an embedding of the graph on the line. To obtain a (A) Graph with nearly balanced 8-cut (B) Spectral Embedding (C) Notional Flow-based Embedding Figure 1: The spectral embedding hides the best solution from hyperplane rounding. specific cut, one must apply a “rounding method” to this embedding. The hyperplane rounding method chooses one of the n − 1 cuts which separate the nodes whose x i values lie above and below some split value x. ˆ 2 Using flow to find cuts that are hidden from hyperplane rounding Theorists have long known that the spectral method cannot distinguish between deep cuts and long paths, and that this confusion can cause it to cut a graph in the wrong direction thereby producing the spectral method’s worst-case behavior [10]. In this section we will show by example that even when the spectral method is not fooled into cutting in the wrong direction, the resulting embedding can hide the best cuts from the hyperplane rounding method. 
This is a possible explanation for the frequently made empirical observation (see e.g. [12]) that hyperplane roundings of spectral embeddings are noisy and therefore benefit from cleanup with a local search method such as Fiduccia-Matheyses [8]. Consider the graph in figure 1(a), which has a near-bisection cutting 8 edges. For this graph the spectral method produces the embedding shown in figure 1(b), and recommends that we make a vertical cut (across the horizontal dimension which is based on the Fiedler vector). This is correct in a generalized sense, but it is obvious that no hyperplane (or vertical line in this picture) can possibly extract the optimal 8-edge cut. Some insight into why spectral embeddings tend to have this problem can be obtained from the spectral method’s electrical interpretation. In this view the graph is represented by a resistor network [7]. Current flowing in this network causes voltage drops across the resistors, thus determining the nodes’ voltages and hence their positions. When current flows through a long series of resistors, it induces a progressive voltage drop. This is what causes the excessive length of the embeddings of the horizontal girder-like structures which are blocking all vertical hyperplane cuts in figure 1(b). If the embedding method were somehow not based on current, but rather on flow, which does not distinguish between a pipe and a series of pipes, then the long girders could retract into the two sides of the embedding, as suggested by figure 1(c), and the best cut would be revealed. Because theoretical flow-like embedding methods such as [14] are currently not practical, we point out that in cases like figure 1(b), where the spectral method has not chosen an incorrect direction for the cut, one can use an S-T max flow problem with the flow running in the recommended direction (horizontally for this embedding) to extract the good cut even though it is hidden from all hyperplanes. We currently use two different flow-based rounding methods. A method called MQI looks for quotient cuts, and is already described in [13]. Another method, that we shall call Midflow, looks for β-balanced cuts. The input to Midflow is a graph and an ordering of its nodes (obtained e.g. from a spectral embedding or from the projection of any embedding onto a line). We divide the graph’s nodes into 3 sets F, L, and U. The sets F and L respectively contain the first βn and last βn nodes in the ordering, and U contains the remaining 50-50 balance ng s ro un di Hy pe r pl an e neg-pos split quotient cut score (cutsize / size of small side) 0.01 ctor r ve iedle of F 0.004 0.003 0.00268 0.00232 Best hyperplane rounding of Fiedler Vector Best improvement with local search 0.002 0.00138 0.001 60000 80000 Midflow rounding beta = 1/4 100000 120000 0.00145 140000 Midflow rounding of Fiedler Vector beta = 1/3 160000 180000 200000 220000 240000 number of nodes on ’left’ side of cut (out of 324800) Figure 2: A typical example (see section 2.1) where flow-based rounding beats hyperplane rounding, even when the hyperplane cuts are improved with Fiduccia-Matheyses search. Note that for this spacelike graph, the best quotient cuts have reasonably good balance. U = n − 2βn nodes, which are “up for grabs”. We set up an S-T max flow problem with one node for every graph node plus 2 new nodes for the source and sink. For each graph edge there are two arcs, one in each direction, with unit capacity. 
Finally, the nodes in F are pinned to the source and the nodes in L are pinned to sink by infinite capacity arcs. This max-flow problem can be solved by a good implementation of the push-relabel algorithm (such as Goldberg and Cherkassky’s hi pr [4]) in time that empirically is nearly linear with a very good constant factor. Figure 6 shows that solving a MidFlow problem with hi pr can be 1000 times cheaper than finding a spectral embedding with ARPACK. When the goal is finding good β-balanced cuts, MidFlow rounding is strictly more powerful than hyperplane rounding; from a given node ordering hyperplane rounding chooses the best of U + 1 candidate cuts, while MidFlow rounding chooses the best of 2U candidates, including all of those considered by hyperplane rounding. [Similarly, MQI rounding is strictly more powerful than hyperplane rounding for the task of finding good quotient cuts.] 2.1 A concrete example The plot in figure 2 shows a number of cuts in a 324,800 node nearly planar graph derived from a 700x464 pixel downward-looking view of some clouds over some mountains.1 The y-axis of the plot is quotient cut score; smaller values are better. We note in passing that the commonly used split point x = 0 does not yield the best hyperplane cut. Our main ˆ point is that the two cuts generated by MidFlow rounding of the Fiedler vector (with β = 1 3 and β = 1 ) are nearly twice as good as the best hyperplane cut. Even after the best 4 hyperplane cut has been improved by taking the best result of 100 runs of a version of Fiduccia-Matheyses local search, it is still much worse than the cuts obtained by flowbased rounding. 1 The graph’s edges are unweighted but are chosen by a randomized rule which is more likely to include an edge between two neighboring pixels if they have a similar grey value. Good cuts in the graph tend to run along discontinuities in the image, as one would expect. quotient cut score 1 SDP-LB (smaller is better) 0.1 Scatter plot showing cuts in a
5 0.086483166 20 nips-2005-Affine Structure From Sound
Author: Sebastian Thrun
Abstract: We consider the problem of localizing a set of microphones together with a set of external acoustic events (e.g., hand claps), emitted at unknown times and unknown locations. We propose a solution that approximates this problem under a far field approximation defined in the calculus of affine geometry, and that relies on singular value decomposition (SVD) to recover the affine structure of the problem. We then define low-dimensional optimization techniques for embedding the solution into Euclidean geometry, and further techniques for recovering the locations and emission times of the acoustic events. The approach is useful for the calibration of ad-hoc microphone arrays and sensor networks. 1
6 0.081985228 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning
7 0.081689574 172 nips-2005-Selecting Landmark Points for Sparse Manifold Learning
8 0.072811365 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
9 0.06955231 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines
10 0.065139607 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
11 0.064178064 114 nips-2005-Learning Rankings via Convex Hull Separation
12 0.060489055 31 nips-2005-Asymptotics of Gaussian Regularized Least Squares
13 0.060455013 83 nips-2005-Generalization error bounds for classifiers trained with interdependent data
14 0.05793909 147 nips-2005-On the Convergence of Eigenspaces in Kernel Principal Component Analysis
15 0.056112055 71 nips-2005-Fast Krylov Methods for N-Body Learning
16 0.054428019 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
17 0.053795822 60 nips-2005-Dynamic Social Network Analysis using Latent Space Models
18 0.052920729 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification
19 0.052906837 126 nips-2005-Metric Learning by Collapsing Classes
20 0.052800652 178 nips-2005-Soft Clustering on Graphs
topicId topicWeight
[(0, 0.173), (1, 0.081), (2, -0.051), (3, -0.058), (4, -0.111), (5, 0.007), (6, -0.005), (7, -0.041), (8, 0.047), (9, 0.006), (10, -0.001), (11, 0.012), (12, -0.011), (13, 0.005), (14, -0.021), (15, 0.096), (16, -0.075), (17, 0.119), (18, -0.033), (19, 0.048), (20, 0.03), (21, 0.042), (22, 0.045), (23, 0.161), (24, 0.065), (25, 0.033), (26, -0.005), (27, 0.016), (28, -0.035), (29, 0.193), (30, 0.029), (31, 0.122), (32, -0.017), (33, 0.096), (34, 0.033), (35, 0.035), (36, -0.023), (37, 0.091), (38, -0.033), (39, 0.011), (40, -0.161), (41, -0.035), (42, -0.062), (43, 0.095), (44, 0.005), (45, -0.178), (46, 0.09), (47, 0.06), (48, -0.008), (49, -0.165)]
simIndex simValue paperId paperTitle
same-paper 1 0.94701713 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
Author: Zhenyue Zhang, Hongyuan Zha
Abstract: We propose a fast manifold learning algorithm based on the methodology of domain decomposition. Starting with the set of sample points partitioned into two subdomains, we develop the solution of the interface problem that can glue the embeddings on the two subdomains into an embedding on the whole domain. We provide a detailed analysis to assess the errors produced by the gluing process using matrix perturbation theory. Numerical examples are given to illustrate the efficiency and effectiveness of the proposed methods. 1
2 0.54250062 20 nips-2005-Affine Structure From Sound
Author: Sebastian Thrun
Abstract: We consider the problem of localizing a set of microphones together with a set of external acoustic events (e.g., hand claps), emitted at unknown times and unknown locations. We propose a solution that approximates this problem under a far field approximation defined in the calculus of affine geometry, and that relies on singular value decomposition (SVD) to recover the affine structure of the problem. We then define low-dimensional optimization techniques for embedding the solution into Euclidean geometry, and further techniques for recovering the locations and emission times of the acoustic events. The approach is useful for the calibration of ad-hoc microphone arrays and sensor networks. 1
3 0.4917385 172 nips-2005-Selecting Landmark Points for Sparse Manifold Learning
Author: Jorge Silva, Jorge Marques, João Lemos
Abstract: There has been a surge of interest in learning non-linear manifold models to approximate high-dimensional data. Both for computational complexity reasons and for generalization capability, sparsity is a desired feature in such models. This usually means dimensionality reduction, which naturally implies estimating the intrinsic dimension, but it can also mean selecting a subset of the data to use as landmarks, which is especially important because many existing algorithms have quadratic complexity in the number of observations. This paper presents an algorithm for selecting landmarks, based on LASSO regression, which is well known to favor sparse approximations because it uses regularization with an l1 norm. As an added benefit, a continuous manifold parameterization, based on the landmarks, is also found. Experimental results with synthetic and real data illustrate the algorithm. 1
4 0.48773602 138 nips-2005-Non-Local Manifold Parzen Windows
Author: Yoshua Bengio, Hugo Larochelle, Pascal Vincent
Abstract: To escape from the curse of dimensionality, we claim that one can learn non-local functions, in the sense that the value and shape of the learned function at x must be inferred using examples that may be far from x. With this objective, we present a non-local non-parametric density estimator. It builds upon previously proposed Gaussian mixture models with regularized covariance matrices to take into account the local shape of the manifold. It also builds upon recent work on non-local estimators of the tangent plane of a manifold, which are able to generalize in places with little training data, unlike traditional, local, non-parametric models.
5 0.48637089 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
Author: Gilles Blanchard, Masashi Sugiyama, Motoaki Kawanabe, Vladimir Spokoiny, Klaus-Robert Müller
Abstract: We propose a new linear method for dimension reduction to identify nonGaussian components in high dimensional data. Our method, NGCA (non-Gaussian component analysis), uses a very general semi-parametric framework. In contrast to existing projection methods we define what is uninteresting (Gaussian): by projecting out uninterestingness, we can estimate the relevant non-Gaussian subspace. We show that the estimation error of finding the non-Gaussian components tends to zero at a parametric rate. Once NGCA components are identified and extracted, various tasks can be applied in the data analysis process, like data visualization, clustering, denoising or classification. A numerical study demonstrates the usefulness of our method. 1
6 0.47797278 127 nips-2005-Mixture Modeling by Affinity Propagation
7 0.44781047 66 nips-2005-Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization
8 0.41553494 107 nips-2005-Large scale networks fingerprinting and visualization using the k-core decomposition
9 0.40840423 62 nips-2005-Efficient Estimation of OOMs
10 0.38652343 75 nips-2005-Fixing two weaknesses of the Spectral Method
11 0.3816061 31 nips-2005-Asymptotics of Gaussian Regularized Least Squares
12 0.37598652 116 nips-2005-Learning Topology with the Generative Gaussian Graph and the EM Algorithm
13 0.36734781 46 nips-2005-Consensus Propagation
14 0.36385849 189 nips-2005-Tensor Subspace Analysis
15 0.35765499 114 nips-2005-Learning Rankings via Convex Hull Separation
16 0.3464376 86 nips-2005-Generalized Nonnegative Matrix Approximations with Bregman Divergences
17 0.34154832 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines
18 0.33847195 117 nips-2005-Learning from Data of Variable Quality
19 0.33504164 51 nips-2005-Correcting sample selection bias in maximum entropy density estimation
20 0.31340477 71 nips-2005-Fast Krylov Methods for N-Body Learning
topicId topicWeight
[(3, 0.101), (10, 0.057), (27, 0.036), (31, 0.038), (34, 0.094), (41, 0.011), (55, 0.035), (65, 0.032), (69, 0.074), (73, 0.076), (77, 0.022), (88, 0.047), (90, 0.248), (91, 0.033)]
simIndex simValue paperId paperTitle
1 0.89583987 182 nips-2005-Statistical Convergence of Kernel CCA
Author: Kenji Fukumizu, Arthur Gretton, Francis R. Bach
Abstract: While kernel canonical correlation analysis (kernel CCA) has been applied in many problems, the asymptotic convergence of the functions estimated from a finite sample to the true functions has not yet been established. This paper gives a rigorous proof of the statistical convergence of kernel CCA and a related method (NOCCO), which provides a theoretical justification for these methods. The result also gives a sufficient condition on the decay of the regularization coefficient in the methods to ensure convergence. 1
2 0.7986598 168 nips-2005-Rodeo: Sparse Nonparametric Regression in High Dimensions
Author: Larry Wasserman, John D. Lafferty
Abstract: We present a method for nonparametric regression that performs bandwidth selection and variable selection simultaneously. The approach is based on the technique of incrementally decreasing the bandwidth in directions where the gradient of the estimator with respect to bandwidth is large. When the unknown function satisfies a sparsity condition, our approach avoids the curse of dimensionality, achieving the optimal minimax rate of convergence, up to logarithmic factors, as if the relevant variables were known in advance. The method—called rodeo (regularization of derivative expectation operator)—conducts a sequence of hypothesis tests, and is easy to implement. A modified version that replaces hard with soft thresholding effectively solves a sequence of lasso problems. 1
same-paper 3 0.79483062 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
Author: Zhenyue Zhang, Hongyuan Zha
Abstract: We propose a fast manifold learning algorithm based on the methodology of domain decomposition. Starting with the set of sample points partitioned into two subdomains, we develop the solution of the interface problem that can glue the embeddings on the two subdomains into an embedding on the whole domain. We provide a detailed analysis to assess the errors produced by the gluing process using matrix perturbation theory. Numerical examples are given to illustrate the efficiency and effectiveness of the proposed methods. 1
4 0.5897758 177 nips-2005-Size Regularized Cut for Data Clustering
Author: Yixin Chen, Ya Zhang, Xiang Ji
Abstract: We present a novel spectral clustering method that enables users to incorporate prior knowledge of the size of clusters into the clustering process. The cost function, which is named size regularized cut (SRcut), is defined as the sum of the inter-cluster similarity and a regularization term measuring the relative size of two clusters. Finding a partition of the data set to minimize SRcut is proved to be NP-complete. An approximation algorithm is proposed to solve a relaxed version of the optimization problem as an eigenvalue problem. Evaluations over different data sets demonstrate that the method is not sensitive to outliers and performs better than normalized cut. 1
5 0.56662524 112 nips-2005-Learning Minimum Volume Sets
Author: Clayton Scott, Robert Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P -measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P , and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P . Other than these samples, no other information is available regarding P , but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules. 1
6 0.56618661 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
7 0.56285149 102 nips-2005-Kernelized Infomax Clustering
8 0.56275249 84 nips-2005-Generalization in Clustering with Unobserved Features
9 0.55893296 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
10 0.55691695 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines
11 0.55641729 58 nips-2005-Divergences, surrogate loss functions and experimental design
12 0.55086321 199 nips-2005-Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions
13 0.54968733 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
14 0.54647577 178 nips-2005-Soft Clustering on Graphs
15 0.54532874 160 nips-2005-Query by Committee Made Real
16 0.54506171 196 nips-2005-Two view learning: SVM-2K, Theory and Practice
17 0.54373306 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations
18 0.54359418 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations
19 0.5432151 198 nips-2005-Using ``epitomes'' to model genetic diversity: Rational design of HIV vaccine cocktails
20 0.54059565 172 nips-2005-Selecting Landmark Points for Sparse Manifold Learning