
70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering


Source: pdf

Author: Ron Zass, Amnon Shashua

Abstract: In this paper we focus on the issue of normalization of the affinity matrix in spectral clustering. We show that the difference between N-cuts and Ratio-cuts is in the error measure being used (relative-entropy versus L1 norm) in finding the closest doubly-stochastic matrix to the input affinity matrix. We then develop a scheme for finding the optimal, under Frobenius norm, doubly-stochastic approximation using Von-Neumann’s successive projections lemma. The new normalization scheme is simple and efficient and provides superior clustering performance over many of the standardized tests. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Doubly Stochastic Normalization for Spectral Clustering Ron Zass and Amnon Shashua ∗ Abstract In this paper we focus on the issue of normalization of the affinity matrix in spectral clustering. [sent-1, score-0.576]

2 We show that the difference between N-cuts and Ratio-cuts is in the error measure being used (relative-entropy versus L1 norm) in finding the closest doubly-stochastic matrix to the input affinity matrix. [sent-2, score-0.218]

3 We then develop a scheme for finding the optimal, under Frobenius norm, doubly-stochastic approximation using Von-Neumann’s successive projections lemma. [sent-3, score-0.197]

4 The new normalization scheme is simple and efficient and provides superior clustering performance over many of the standardized tests. [sent-4, score-0.72]

5 1 Introduction The problem of partitioning data points into a number of distinct sets, known as the clustering problem, is central in data analysis and machine learning. [sent-5, score-0.231]

6 In this domain there are three principal dimensions that determine a successful clustering: (i) the affinity measure, (ii) the normalization of the affinity matrix, and (iii) the particular clustering algorithm. [sent-7, score-0.575]

7 Common practice indicates that the first two are largely responsible for the performance, whereas the particulars of the clustering process itself have a relatively smaller impact. [sent-8, score-0.209]

8 In this paper we focus on the normalization of the affinity matrix. [sent-9, score-0.397]

9 Ratio-cut (e.g., [1]) and Normalized-cut [7] employ an implicit normalization which corresponds, respectively, to L1 and relative-entropy based approximations of the affinity matrix K by a doubly stochastic matrix. [sent-11, score-0.891]

10 We demonstrate the impact of the various normalization schemes on a large variety of data sets and show that the new normalization algorithm often induces a significant performance boost in standardized tests. [sent-13, score-0.888]

11 Taken together, we introduce a new tuning dimension to clustering algorithms allowing better control of the clustering performance. [sent-14, score-0.356]

12 Let $K_{ij} = \kappa(x_i, x_j)$ be a symmetric positive-semi-definite affinity matrix. (∗ Author affiliation: School of Engineering and Computer Science, Hebrew University of Jerusalem, Jerusalem 91904, Israel.) [sent-23, score-0.087]

13 The problem $\max_{\psi_1,\ldots,\psi_k} \sum_{j=1}^{k} \frac{1}{n_j} \sum_{(r,s)\in\psi_j} K_{r,s}$ (1) is equivalent to minimizing the "kernel K-means" problem: [sent-30, score-0.043]

14 $\min_{c_1,\ldots,c_k,\;\psi_1,\ldots,\psi_k} \sum_{j=1}^{k} \sum_{i\in\psi_j} \|\phi(x_i) - c_j\|^2$, where $\phi(x_i)$ is a mapping associated with the kernel $\kappa(x_i, x_j) = \phi(x_i)^\top \phi(x_j)$ and $c_j = (1/n_j) \sum_{i\in\psi_j} \phi(x_i)$ are the class centers. [sent-36, score-0.141]
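As a quick numerical illustration of this equivalence (a minimal sketch, not the authors' code; the data, kernel width, and partition below are arbitrary), the kernel K-means cost can be evaluated directly from K using the identity $\sum_{i\in\psi_j}\|\phi(x_i)-c_j\|^2 = \sum_{i\in\psi_j} K_{ii} - (1/n_j)\sum_{(r,s)\in\psi_j} K_{rs}$, so maximizing (1) is the same as minimizing the kernel K-means objective:

```python
import numpy as np

def within_cluster_affinity(K, labels):
    # the quantity maximized in eqn. (1): sum_j (1/n_j) sum_{(r,s) in psi_j} K_rs
    total = 0.0
    for j in np.unique(labels):
        idx = np.flatnonzero(labels == j)
        total += K[np.ix_(idx, idx)].sum() / len(idx)
    return total

def kernel_kmeans_cost(K, labels):
    # sum_j sum_{i in psi_j} ||phi(x_i) - c_j||^2, computed from K alone
    total = 0.0
    for j in np.unique(labels):
        idx = np.flatnonzero(labels == j)
        block = K[np.ix_(idx, idx)]
        total += np.trace(block) - block.sum() / len(idx)
    return total

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                         # toy data
K = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))   # RBF affinity, sigma = 1
labels = rng.integers(0, 3, size=30)                 # an arbitrary 3-way partition
# the two objectives differ only by the constant trace(K):
print(np.isclose(kernel_kmeans_cost(K, labels),
                 np.trace(K) - within_cluster_affinity(K, labels)))   # True
```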

15 Eqn. 1 is equivalent to the matrix form: $\max_G \operatorname{tr}(G^\top K G)$ s.t. [sent-38, score-0.057]

16 $G \ge 0,\; GG^\top \mathbf{1} = \mathbf{1},\; G^\top G = I$ (2), where $G$ is the desired assignment matrix with $G_{ij} = 1/\sqrt{n_j}$ if $i \in \psi_j$ and zero otherwise, and $\mathbf{1}$ is a column vector of ones. [sent-39, score-0.1]

17 Note that the feasible set of matrices satisfying the constraints $G \ge 0$, $GG^\top \mathbf{1} = \mathbf{1}$, $G^\top G = I$ is exactly the set of matrices of this form for some partitioning $\psi_1, \ldots, \psi_k$. [sent-40, score-0.053]

18 Note also that the matrix $F = GG^\top$ must be doubly stochastic ($F$ is non-negative, symmetric, and $F\mathbf{1} = \mathbf{1}$). [sent-44, score-0.541]
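To make the constraint set of eqn. (2) concrete, here is a small self-contained check (our own illustration; the labelling is arbitrary) that such a G satisfies $G^\top G = I$ and that $F = GG^\top$ is doubly stochastic:

```python
import numpy as np

def assignment_matrix(labels, k):
    # G_ij = 1/sqrt(n_j) if point i belongs to cluster j, and 0 otherwise (eqn. 2)
    G = np.zeros((len(labels), k))
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        G[idx, j] = 1.0 / np.sqrt(len(idx))
    return G

labels = np.array([0, 0, 1, 2, 2, 2, 1, 0])          # an arbitrary partition of 8 points
G = assignment_matrix(labels, k=3)
F = G @ G.T

print(np.allclose(G.T @ G, np.eye(3)))               # G'G = I
print(np.all(F >= 0), np.allclose(F, F.T))           # non-negative and symmetric
print(np.allclose(F @ np.ones(len(labels)), 1.0))    # row sums are 1: F is doubly stochastic
```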

19 To see the connection with spectral clustering, and N-cuts in particular, relax the non-negativity condition of eqn. 2. [sent-46, score-0.122]

20 Define a two-stage approach: find the closest doubly stochastic matrix $K'$ to $K$, and we are left with a spectral decomposition problem: $\max_G \operatorname{tr}(G^\top K' G)$ s.t. [sent-47, score-0.704]

21 $G^\top G = I$ (3), where $G$ contains the leading $k$ eigenvectors of $K'$. [sent-48, score-0.079]

22 We will refer to the process of transforming $K$ to $K'$ as a normalization step. [sent-49, score-0.397]

23 In N-cuts, the normalization takes the form $K' = D^{-1/2} K D^{-1/2}$, where $D = \operatorname{diag}(K\mathbf{1})$ (a diagonal matrix containing the row sums of K) [9]. [sent-50, score-0.454]

24 In [11] it was shown that repeating the N-cuts normalization, i.e. [sent-51, score-0.042]

25 setting up the iterative step $K^{(t+1)} = D^{-1/2} K^{(t)} D^{-1/2}$ where $D = \operatorname{diag}(K^{(t)}\mathbf{1})$ and $K^{(0)} = K$, converges to a doubly-stochastic matrix (a symmetric version of the well-known "iterative proportional fitting procedure" [8]). [sent-53, score-0.159]
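A minimal sketch of the one-step N-cuts normalization and of the repeated version described above (our own code, not the authors'; the tolerance and iteration cap are arbitrary choices):

```python
import numpy as np

def ncuts_normalize(K):
    # one N-cuts step: K' = D^{-1/2} K D^{-1/2}, with D = diag(K 1)
    s = 1.0 / np.sqrt(K.sum(axis=1))
    return K * np.outer(s, s)

def re_normalize(K, max_iter=1000, tol=1e-10):
    # repeat the N-cuts step until the row sums stop changing; per [11] this
    # converges to a doubly-stochastic matrix
    Kt = K.copy()
    for _ in range(max_iter):
        Kt = ncuts_normalize(Kt)
        if np.max(np.abs(Kt.sum(axis=1) - 1.0)) < tol:
            break
    return Kt

rng = np.random.default_rng(1)
A = rng.random((6, 6)) + 0.1
K = (A + A.T) / 2                     # toy symmetric, strictly positive affinity
Kds = re_normalize(K)
print(Kds.sum(axis=1))                # all rows sum to (approximately) 1
print(np.allclose(Kds, Kds.T))        # symmetry is preserved
```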

26 The conclusion of this brief background is to highlight the motivation for seeking a doubly-stochastic approximation to the input affinity matrix as part of the clustering process. [sent-54, score-0.263]

27 The open issue is under what error measure the approximation should take place. [sent-55, score-0.101]

28 It is not difficult to show that repeating the N-cuts normalization converges to the global optimum under the relative entropy measure (see Appendix). [sent-56, score-0.631]

29 Noting that spectral clustering optimizes the Frobenius norm, it seems less natural to have the normalization step optimize a relative entropy error measure. [sent-57, score-0.987]

30 We will derive in this paper the normalization under the L1 norm and under the Frobenius norm. [sent-58, score-0.523]

31 The purpose of the L1 norm is to show that the resulting scheme is equivalent to Ratio-cut clustering; thus we do not introduce a new clustering scheme, but rather contribute to the unification and a better understanding of the differences between the N-cuts and Ratio-cuts schemes. [sent-59, score-0.654]

32 The Frobenius norm normalization is a new formulation and is based on a simple iterative scheme. [sent-60, score-0.578]

33 The resulting normalization proves quite practical and boosts the clustering performance in many of the standardized tests we conducted. [sent-61, score-0.812]

34 In Ratio-cuts, the partitioning of the data set into two clusters is determined by the eigenvector corresponding to the second smallest eigenvalue of the Laplacian $D - K$, where $D = \operatorname{diag}(K\mathbf{1})$. [sent-64, score-0.137]
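For concreteness, a small sketch of that two-way split (synthetic data; thresholding the eigenvector at zero is the usual heuristic and is our choice here):

```python
import numpy as np

def ratio_cut_bipartition(K):
    # eigenvector of the second smallest eigenvalue of the Laplacian L = D - K,
    # with D = diag(K 1); its sign pattern gives the two-way split
    L = np.diag(K.sum(axis=1)) - K
    _, vecs = np.linalg.eigh(L)           # eigenvalues returned in ascending order
    return (vecs[:, 1] > 0).astype(int)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)),
               rng.normal(3.0, 0.3, (10, 2))])          # two well-separated blobs
K = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))      # RBF affinity, sigma = 1
print(ratio_cut_bipartition(K))          # the first 10 labels differ from the last 10
```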

35 Since $\|K - F\|_1 \ge \|(K - F)\mathbf{1}\|_1$ for any matrix $F$, we must have $r \ge \|(K - F)\mathbf{1}\|_1 = \|D\mathbf{1} - \mathbf{1}\|_1 = \|D - I\|_1$, where $\|A\|_1 = \sum_{ij} \operatorname{abs}(A_{ij})$ is the L1 norm. [sent-68, score-0.103]

36 4 Normalizing under Frobenius Norm Given that spectral clustering optimizes the Frobenius norm, there is a strong argument in favor of finding a Frobenius-norm optimum doubly stochastic approximation to K. [sent-72, score-0.828]

37 However, the special structure of our problem allows the QLP to be solved by a very simple iterative computation, as described next. [sent-74, score-0.055]

38 The closest doubly-stochastic matrix $K'$ under the Frobenius norm is the solution to the following QLP: $K' = \operatorname{argmin}_F \|K - F\|_F^2$ s.t. [sent-75, score-0.271]

39 $F \ge 0,\; F\mathbf{1} = \mathbf{1},\; F = F^\top$, (4) where $\|A\|_F^2 = \sum_{ij} A_{ij}^2$ is the Frobenius norm. [sent-77, score-0.046]
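For reference, eqn. (4) can also be handed to a generic convex solver, in the spirit of the Matlab QP baseline used in Fig. 1a below. The sketch assumes the cvxpy package and, like that baseline, only scales to small n:

```python
import numpy as np
import cvxpy as cp

def closest_doubly_stochastic_qp(K):
    # eqn. (4) solved directly: min ||K - F||_F^2  s.t.  F >= 0, F 1 = 1, F = F'
    n = K.shape[0]
    F = cp.Variable((n, n))
    constraints = [F >= 0, F @ np.ones(n) == np.ones(n), F == F.T]
    cp.Problem(cp.Minimize(cp.sum_squares(F - K)), constraints).solve()
    return F.value

rng = np.random.default_rng(3)
A = rng.random((8, 8))
K = (A + A.T) / 2
F = closest_doubly_stochastic_qp(K)
print(F.sum(axis=1))                      # rows sum to 1 (up to solver tolerance)
print(F.min() >= -1e-6)                   # non-negative up to solver tolerance
```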

40 The non-negativity constraint $F \ge 0$ defines the conic part of the feasible set (6). We will use the Von-Neumann [5] successive projection lemma, stating that the alternating sequence $P_1 P_2 P_1 P_2 \cdots$ [sent-83, score-0.124]

41 $\cdots P_1(K)$ will converge onto the projection of $K$ onto the intersection of the affine and conic subspaces described above¹. [sent-86, score-0.091]

42 Therefore, what remains to show is that the projections P1 and P2 can be solved efficiently (and in closed form). [sent-87, score-0.031]

43 (Footnote 1) Actually, the Von-Neumann lemma applies only to linear subspaces. [sent-92, score-0.036]

44 The extension to convex subspaces involves a ”deflection” component described by Dykstra [3]. [sent-93, score-0.029]

45 However, it is possible to show that for this specific problem the deflection component is redundant and the Von-Neumann lemma still applies. [sent-94, score-0.036]

46 Figure 1: Running times of the normalization algorithms (y-axis: seconds; x-axis: number of data points; panel (a) legend: Projection vs. Matlab QP; panel (b) legend: L1, Frobenius, Relative Entropy). [sent-95, score-0.397]

47 (a) the Frobenius scheme compared to a general Matlab QLP solver, (b) running time of the three normalization schemes. [sent-96, score-0.509]

48 Eq. (7) gives $P_1$ in closed form; the projection $P_2(X)$ can also be described in a simple closed-form manner. [sent-99, score-0.036]
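Below is our own reconstruction of the resulting successive-projection scheme, not the authors' implementation: $P_1$ is the standard closed-form projection onto the affine set $\{F : F\mathbf{1}=\mathbf{1},\, F=F^\top\}$, which for symmetric input can be written as $(I-J)X(I-J)+J$ with $J=\mathbf{1}\mathbf{1}^\top/n$ (we assume this coincides with the paper's eq. 7), and $P_2$ clips negative entries. Alternating the two approximates the Frobenius-optimal doubly-stochastic matrix:

```python
import numpy as np

def project_affine(X):
    # closed-form projection onto the affine set {F : F 1 = 1, F' 1 = 1}; for a
    # symmetric input the result is symmetric.  Equivalently (I-J) X (I-J) + J,
    # where J = 11'/n (our derivation; assumed to match the paper's eq. 7).
    n = X.shape[0]
    J = np.full((n, n), 1.0 / n)
    C = np.eye(n) - J
    return C @ X @ C + J

def project_nonneg(X):
    # projection onto the cone {F : F >= 0} is element-wise clipping
    return np.maximum(X, 0.0)

def frobenius_normalize(K, max_iter=500, tol=1e-8):
    # successive projections P2(P1(...)); stopping rule and cap are our own choices
    X = K.copy()
    for _ in range(max_iter):
        X_prev = X
        X = project_nonneg(project_affine(X))
        if np.max(np.abs(X - X_prev)) < tol:
            break
    return X

rng = np.random.default_rng(4)
A = rng.random((8, 8))
K = (A + A.T) / 2
F = frobenius_normalize(K)
print(F.sum(axis=1))                      # close to 1
print(F.min() >= 0.0, np.allclose(F, F.T))
```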

49 Fig. 1a shows the running time of the algorithm compared to an off-the-shelf QLP Matlab solver over random matrices of increasing size; one can see that the run-time of our algorithm is a fraction of the standard QLP solver's and scales very well with dimension. [sent-117, score-0.144]

50 In fact the standard QLP solver can handle only small problem sizes. [sent-118, score-0.059]

51 In Fig. 1b we plot the running times of all three normalization schemes: the L1 norm (computing the Laplacian), the relative entropy (the iterative $D^{-1/2} K D^{-1/2}$), and the Frobenius scheme presented in this section. [sent-120, score-0.69]

52 The Frobenius scheme is more efficient than the relative-entropy normalization (which is the least efficient among the three). [sent-121, score-0.397]

53 5 Experiments. For clustering into $k \ge 2$ clusters we experimented with the spectral algorithms described in [10] and [6]. [sent-122, score-0.3]

54 The latter uses the N-cuts normalization $D^{-1/2} K D^{-1/2}$ followed by K-means on the embedded coordinates (the leading $k$ eigenvectors of the normalized affinity), while the former uses a certain discretization scheme to turn the $k$ leading eigenvectors into an indicator matrix. [sent-123, score-0.699]
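Putting the pieces together, a sketch of the pipeline as we read it: normalize the affinity, take the leading k eigenvectors, and run K-means on the embedded coordinates. The normalization helpers referenced in the comment are the sketches given earlier; scikit-learn's KMeans is an assumed stand-in for the K-means step, and the optional row normalization is a common extra step in this family of algorithms rather than something stated in the text:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_cluster(K, k, normalize=None, row_normalize=True):
    # normalize: None, or one of the earlier sketches
    # (ncuts_normalize, re_normalize, frobenius_normalize)
    Kn = K if normalize is None else normalize(K)
    _, vecs = np.linalg.eigh(Kn)
    G = vecs[:, -k:]                       # leading k eigenvectors, as in eqn. (3)
    if row_normalize:                      # optional step, common in this family
        G = G / (np.linalg.norm(G, axis=1, keepdims=True) + 1e-12)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(G)

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 0.3, (15, 2)),
               rng.normal(3.0, 0.3, (15, 2))])
K = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))
print(spectral_cluster(K, k=2))           # labels split into the two blobs
```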

55 Both algorithms produced similar results, so we focused on [10] while replacing the normalization with the three schemes presented above. [sent-124, score-0.432]

56 We also included a ”None” field which corresponds to no normalization being applied. [sent-126, score-0.397]

57 Table 1: UCI datasets used, together with some characteristics and the best result achieved using the different methods. [sent-158, score-0.075]

58 Table 2: Cancer datasets used, together with some characteristics and the best result achieved using the different methods. [sent-179, score-0.075]

59 We begin by evaluating the clustering quality obtained under the different normalization methods over a number of well-studied datasets from the UCI repository². [sent-180, score-0.647]

60 The data-sets are listed in Table 1 together with some of their characteristics. [sent-181, score-0.079]

61 The best performance (lowest error rate) is presented in boldface. [sent-182, score-0.067]

62 With the first four datasets we used an RBF kernel $e^{-\|x_i - x_j\|^2/\sigma^2}$ for the affinity matrix, while for the latter two a polynomial kernel $(x_i^\top x_j + 1)^d$ was used. [sent-183, score-0.175]
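For completeness, the two affinity constructions as we read them (σ and d are placeholders that would be calibrated per dataset, as the text describes):

```python
import numpy as np

def rbf_affinity(X, sigma):
    # K_ij = exp(-||x_i - x_j||^2 / sigma^2)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / sigma ** 2)

def poly_affinity(X, d):
    # K_ij = (x_i' x_j + 1)^d
    return (X @ X.T + 1.0) ** d
```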

63 The kernel parameters were calibrated independently for each method and for each dataset. [sent-184, score-0.043]

64 In most cases the best performance was obtained with the Frobenius norm approximation, but as a general rule the type of normalization depends on the data. [sent-185, score-0.523]

65 Also worth noting are instances, such as Wine and SpamBase, where RE or Ncuts actually worsen the performance. [sent-186, score-0.029]

66 In those cases the RE performance is worse than Ncuts, as the entire normalization direction is counter-productive. [sent-187, score-0.397]

67 When RE outperforms None it also outperforms Ncuts (as can be expected since Ncuts is the first step in the iterative scheme of RE). [sent-188, score-0.141]

68 Fig. 2 shows the clustering performance of each dataset under each normalization scheme under varying kernel settings (σ and d values). [sent-190, score-0.141]

69 Generally, the performance of the Frobenius normalization behaves in a smoother manner and is more stable under varying kernel settings than the other normalization schemes. [sent-191, score-0.837]

70 Our next set of experiments was over some well-studied cancer data-sets³. [sent-192, score-0.096]

71 The data-sets are listed in Table 2 together with some of their characteristics. [sent-193, score-0.079]

72 The column ”#PC” refers to the number of principal components used in a PCA pre-processing for the purpose of dimensionality reduction prior to clustering. [sent-194, score-0.023]

73 Note that better results can be achieved when using a more sophisticated preprocessing, but since the focus is on the performances of the clustering algorithms and not on the datasets, we prefer not to use the optimal pre-processing and leave the data noisy. [sent-195, score-0.178]

74 (Figure caption fragment) … similarity measure, for the cancer datasets listed in Table 2. [sent-206, score-0.198]

75 (Legend: L1 in magenta +; Frobenius in blue o; Relative Entropy in black ×; Normalized-Cuts in red.) The Leukemia dataset is a challenging benchmark common in the cancer community, where the task is to distinguish between two types of Leukemia. [sent-207, score-0.176]

76 The original dataset consists of 7129 coordinates probed from 6817 human genes, and we perform PCA to obtain 5 leading principal components prior to clustering using a polynomial kernel. [sent-208, score-0.303]

77 The Lung Cancer (Brigham and Women’s Hospital, Harvard Medical School) dataset is another common benchmark that describes 12,533 genes sampled from 181 tissues. [sent-209, score-0.091]

78 The Prostate dataset consists of 12,600 coordinates representing different genes, where the task is to identify prostate samples as tumor or non-tumor. [sent-211, score-0.244]

79 We use the first five principal components as input for clustering using an RBF kernel. [sent-212, score-0.201]

80 The Prostate Outcome dataset uses the same genes from another set of prostate samples, where the task is to predict the clinical outcome (relapse or non-relapse for at least four years). [sent-213, score-0.31]

81 Fig. 3 shows the clustering performance of each dataset under each normalization scheme under varying kernel settings (σ and d values). [sent-215, score-0.747]

82 6 Summary Normalization of the affinity matrix is a crucial element in the success of spectral clustering. [sent-216, score-0.179]

83 The type of normalization performed by N-cuts is a step towards a doubly-stochastic approximation of the affinity matrix under relative entropy [11]. [sent-217, score-0.592]

84 The scheme involves a succession of simple computations, is very simple to implement, and is computationally efficient; throughout extensive experimentation on standard data-sets we have shown the importance of normalization to the performance of spectral clustering. [sent-219, score-0.605]

85 In the experiments we conducted, the Frobenius normalization had the upper hand in most cases. [sent-220, score-0.397]

86 We have also shown that the relative-entropy normalization is not always the right approach: in some data-sets the performance worsened after the relative-entropy normalization, but it never worsened when the Frobenius normalization was applied. [sent-221, score-0.894]

87 The closest matrix $F$ to $K$ under the relative-entropy error, subject to $F \ge 0$, $F = F^\top$, $F\mathbf{1} = \mathbf{1}$, $F^\top\mathbf{1} = \mathbf{1}$, has the form $F = DKD$ for some (unique) diagonal matrix $D$. [sent-298, score-0.057]

88 Proof: The Lagrangian of the problem is
$$L(F) = \sum_{ij} \Big( f_{ij} \ln \frac{f_{ij}}{k_{ij}} + k_{ij} - f_{ij} \Big) - \sum_i \lambda_i \Big( \sum_j f_{ij} - 1 \Big) - \sum_j \mu_j \Big( \sum_i f_{ij} - 1 \Big).$$
The derivative with respect to $f_{ij}$ is
$$\frac{\partial L}{\partial f_{ij}} = \ln f_{ij} + 1 - \ln k_{ij} - 1 - \lambda_i - \mu_j = 0,$$
from which we obtain $f_{ij} = e^{\lambda_i} e^{\mu_j} k_{ij}$. Let $D_1 = \operatorname{diag}(e^{\lambda_1}, \ldots, e^{\lambda_n})$ [sent-299, score-2.522]

89 and $D_2 = \operatorname{diag}(e^{\mu_1}, \ldots, e^{\mu_n})$; then we have $F = D_1 K D_2$. Since $F = F^\top$ and $K$ is symmetric, we must have $D_1 = D_2$. [sent-305, score-0.047]
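As a quick numerical sanity check of this claim (our own illustration, on a toy positive affinity), one can run the repeated N-cuts scaling, recover D from the diagonal entries, and verify that the limit indeed has the form DKD:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.random((5, 5)) + 0.1
K = (A + A.T) / 2                          # toy symmetric, strictly positive affinity

F = K.copy()
for _ in range(2000):                      # repeated N-cuts step: F <- D^{-1/2} F D^{-1/2}
    s = 1.0 / np.sqrt(F.sum(axis=1))
    F = F * np.outer(s, s)

d = np.sqrt(np.diag(F) / np.diag(K))       # if F = D K D then F_ii = d_i^2 K_ii
print(np.allclose(F, np.outer(d, d) * K))  # True: the limit has the form D K D
print(np.allclose(F.sum(axis=1), 1.0))     # and it is (approximately) doubly stochastic
```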


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('normalization', 0.397), ('doubly', 0.347), ('frobenius', 0.329), ('fij', 0.258), ('nity', 0.247), ('af', 0.218), ('qlp', 0.2), ('clustering', 0.178), ('prostate', 0.174), ('ncuts', 0.152), ('norm', 0.126), ('spectral', 0.122), ('kij', 0.122), ('sigma', 0.111), ('gg', 0.1), ('cancer', 0.096), ('stochastic', 0.09), ('closest', 0.088), ('scheme', 0.086), ('poly', 0.079), ('rbf', 0.079), ('entropy', 0.076), ('spambase', 0.075), ('kd', 0.074), ('lung', 0.065), ('argminf', 0.065), ('laplacian', 0.063), ('standardized', 0.059), ('leukemia', 0.059), ('eigenvector', 0.059), ('solver', 0.059), ('matrix', 0.057), ('xij', 0.055), ('iterative', 0.055), ('partitioning', 0.053), ('listed', 0.053), ('errors', 0.053), ('successive', 0.052), ('bupa', 0.05), ('forbenius', 0.05), ('pima', 0.05), ('spectf', 0.05), ('worsened', 0.05), ('wine', 0.05), ('datasets', 0.049), ('genes', 0.048), ('symmetric', 0.047), ('diag', 0.047), ('eigenvectors', 0.047), ('ij', 0.046), ('outcome', 0.045), ('wdbc', 0.043), ('zass', 0.043), ('measure', 0.043), ('kernel', 0.043), ('dataset', 0.043), ('nj', 0.043), ('repeating', 0.042), ('none', 0.04), ('ection', 0.04), ('xj', 0.04), ('optimum', 0.039), ('magenta', 0.037), ('xi', 0.037), ('projection', 0.036), ('lemma', 0.036), ('uci', 0.036), ('schemes', 0.035), ('degree', 0.034), ('relative', 0.034), ('proposition', 0.033), ('leading', 0.032), ('projections', 0.031), ('former', 0.031), ('lowest', 0.03), ('error', 0.03), ('ln', 0.03), ('re', 0.029), ('subspaces', 0.029), ('desire', 0.029), ('cj', 0.029), ('jerusalem', 0.029), ('noting', 0.029), ('th', 0.028), ('matlab', 0.028), ('approximation', 0.028), ('coordinates', 0.027), ('aij', 0.027), ('together', 0.026), ('lagrangian', 0.026), ('intersection', 0.026), ('running', 0.026), ('smallest', 0.025), ('cuts', 0.025), ('optimizes', 0.024), ('table', 0.024), ('nding', 0.024), ('pc', 0.024), ('principal', 0.023), ('begin', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering

Author: Ron Zass, Amnon Shashua

Abstract: In this paper we focus on the issue of normalization of the affinity matrix in spectral clustering. We show that the difference between N-cuts and Ratio-cuts is in the error measure being used (relative-entropy versus L1 norm) in finding the closest doubly-stochastic matrix to the input affinity matrix. We then develop a scheme for finding the optimal, under Frobenius norm, doubly-stochastic approximation using Von-Neumann’s successive projections lemma. The new normalization scheme is simple and efficient and provides superior clustering performance over many of the standardized tests. 1

2 0.1767651 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning

Author: Hamed Valizadegan, Rong Jin

Abstract: Maximum margin clustering was proposed lately and has shown promising performance in recent studies [1, 2]. It extends the theory of support vector machine to unsupervised learning. Despite its good performance, there are three major problems with maximum margin clustering that question its efficiency for real-world applications. First, it is computationally expensive and difficult to scale to large-scale datasets because the number of parameters in maximum margin clustering is quadratic in the number of examples. Second, it requires data preprocessing to ensure that any clustering boundary will pass through the origins, which makes it unsuitable for clustering unbalanced dataset. Third, it is sensitive to the choice of kernel functions, and requires external procedure to determine the appropriate values for the parameters of kernel functions. In this paper, we propose “generalized maximum margin clustering” framework that addresses the above three problems simultaneously. The new framework generalizes the maximum margin clustering algorithm by allowing any clustering boundaries including those not passing through the origins. It significantly improves the computational efficiency by reducing the number of parameters. Furthermore, the new framework is able to automatically determine the appropriate kernel matrix without any labeled data. Finally, we show a formal connection between maximum margin clustering and spectral clustering. We demonstrate the efficiency of the generalized maximum margin clustering algorithm using both synthetic datasets and real datasets from the UCI repository. 1

3 0.15760008 7 nips-2006-A Local Learning Approach for Clustering

Author: Mingrui Wu, Bernhard Schölkopf

Abstract: We present a local learning approach for clustering. The basic idea is that a good clustering result should have the property that the cluster label of each data point can be well predicted based on its neighboring data and their cluster labels, using current supervised learning methods. An optimization problem is formulated such that its solution has the above property. Relaxation and eigen-decomposition are applied to solve this optimization problem. We also briefly investigate the parameter selection issue and provide a simple parameter selection method for the proposed algorithm. Experimental results are provided to validate the effectiveness of the proposed approach. 1

4 0.15037647 80 nips-2006-Fundamental Limitations of Spectral Clustering

Author: Boaz Nadler, Meirav Galun

Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1

5 0.14998892 39 nips-2006-Balanced Graph Matching

Author: Timothee Cour, Praveen Srinivasan, Jianbo Shi

Abstract: Graph matching is a fundamental problem in Computer Vision and Machine Learning. We present two contributions. First, we give a new spectral relaxation technique for approximate solutions to matching problems, that naturally incorporates one-to-one or one-to-many constraints within the relaxation scheme. The second is a normalization procedure for existing graph matching scoring functions that can dramatically improve the matching accuracy. It is based on a reinterpretation of the graph matching compatibility matrix as a bipartite graph on edges for which we seek a bistochastic normalization. We evaluate our two contributions on a comprehensive test set of random graph matching problems, as well as on image correspondence problem. Our normalization procedure can be used to improve the performance of many existing graph matching algorithms, including spectral matching, graduated assignment and semidefinite programming. 1

6 0.095564157 117 nips-2006-Learning on Graph with Laplacian Regularization

7 0.090417154 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians

8 0.083327971 102 nips-2006-Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm

9 0.082629539 86 nips-2006-Graph-Based Visual Saliency

10 0.080836356 65 nips-2006-Denoising and Dimension Reduction in Feature Space

11 0.076330893 151 nips-2006-On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts

12 0.06933362 79 nips-2006-Fast Iterative Kernel PCA

13 0.062432207 35 nips-2006-Approximate inference using planar graph decomposition

14 0.061662011 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data

15 0.061640173 81 nips-2006-Game Theoretic Algorithms for Protein-DNA binding

16 0.061064206 186 nips-2006-Support Vector Machines on a Budget

17 0.060404006 200 nips-2006-Unsupervised Regression with Applications to Nonlinear System Identification

18 0.057183273 149 nips-2006-Nonnegative Sparse PCA

19 0.056359977 62 nips-2006-Correcting Sample Selection Bias by Unlabeled Data

20 0.053950686 158 nips-2006-PG-means: learning the number of clusters in data


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.187), (1, 0.094), (2, 0.014), (3, 0.216), (4, 0.068), (5, -0.089), (6, 0.04), (7, 0.102), (8, 0.093), (9, -0.003), (10, 0.023), (11, 0.082), (12, -0.026), (13, -0.044), (14, -0.029), (15, 0.02), (16, -0.016), (17, 0.018), (18, 0.073), (19, -0.005), (20, 0.145), (21, -0.007), (22, -0.082), (23, 0.076), (24, -0.025), (25, 0.011), (26, -0.005), (27, -0.016), (28, 0.098), (29, 0.023), (30, -0.035), (31, -0.068), (32, 0.118), (33, 0.014), (34, 0.007), (35, 0.114), (36, -0.046), (37, 0.009), (38, 0.008), (39, 0.112), (40, 0.031), (41, 0.011), (42, -0.075), (43, 0.091), (44, 0.02), (45, -0.011), (46, -0.078), (47, 0.055), (48, 0.058), (49, 0.004)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96676499 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering

Author: Ron Zass, Amnon Shashua

Abstract: In this paper we focus on the issue of normalization of the affinity matrix in spectral clustering. We show that the difference between N-cuts and Ratio-cuts is in the error measure being used (relative-entropy versus L1 norm) in finding the closest doubly-stochastic matrix to the input affinity matrix. We then develop a scheme for finding the optimal, under Frobenius norm, doubly-stochastic approximation using Von-Neumann’s successive projections lemma. The new normalization scheme is simple and efficient and provides superior clustering performance over many of the standardized tests. 1

2 0.72680712 7 nips-2006-A Local Learning Approach for Clustering

Author: Mingrui Wu, Bernhard Schölkopf

Abstract: We present a local learning approach for clustering. The basic idea is that a good clustering result should have the property that the cluster label of each data point can be well predicted based on its neighboring data and their cluster labels, using current supervised learning methods. An optimization problem is formulated such that its solution has the above property. Relaxation and eigen-decomposition are applied to solve this optimization problem. We also briefly investigate the parameter selection issue and provide a simple parameter selection method for the proposed algorithm. Experimental results are provided to validate the effectiveness of the proposed approach. 1

3 0.68351471 80 nips-2006-Fundamental Limitations of Spectral Clustering

Author: Boaz Nadler, Meirav Galun

Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1

4 0.63642991 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning

Author: Hamed Valizadegan, Rong Jin

Abstract: Maximum margin clustering was proposed lately and has shown promising performance in recent studies [1, 2]. It extends the theory of support vector machine to unsupervised learning. Despite its good performance, there are three major problems with maximum margin clustering that question its efficiency for real-world applications. First, it is computationally expensive and difficult to scale to large-scale datasets because the number of parameters in maximum margin clustering is quadratic in the number of examples. Second, it requires data preprocessing to ensure that any clustering boundary will pass through the origins, which makes it unsuitable for clustering unbalanced dataset. Third, it is sensitive to the choice of kernel functions, and requires external procedure to determine the appropriate values for the parameters of kernel functions. In this paper, we propose “generalized maximum margin clustering” framework that addresses the above three problems simultaneously. The new framework generalizes the maximum margin clustering algorithm by allowing any clustering boundaries including those not passing through the origins. It significantly improves the computational efficiency by reducing the number of parameters. Furthermore, the new framework is able to automatically determine the appropriate kernel matrix without any labeled data. Finally, we show a formal connection between maximum margin clustering and spectral clustering. We demonstrate the efficiency of the generalized maximum margin clustering algorithm using both synthetic datasets and real datasets from the UCI repository. 1

5 0.57352358 39 nips-2006-Balanced Graph Matching

Author: Timothee Cour, Praveen Srinivasan, Jianbo Shi

Abstract: Graph matching is a fundamental problem in Computer Vision and Machine Learning. We present two contributions. First, we give a new spectral relaxation technique for approximate solutions to matching problems, that naturally incorporates one-to-one or one-to-many constraints within the relaxation scheme. The second is a normalization procedure for existing graph matching scoring functions that can dramatically improve the matching accuracy. It is based on a reinterpretation of the graph matching compatibility matrix as a bipartite graph on edges for which we seek a bistochastic normalization. We evaluate our two contributions on a comprehensive test set of random graph matching problems, as well as on image correspondence problem. Our normalization procedure can be used to improve the performance of many existing graph matching algorithms, including spectral matching, graduated assignment and semidefinite programming. 1

6 0.54316896 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians

7 0.53875995 123 nips-2006-Learning with Hypergraphs: Clustering, Classification, and Embedding

8 0.45899069 117 nips-2006-Learning on Graph with Laplacian Regularization

9 0.44623709 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming

10 0.40782952 102 nips-2006-Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm

11 0.39873597 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data

12 0.37226576 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms

13 0.37080452 79 nips-2006-Fast Iterative Kernel PCA

14 0.36490735 35 nips-2006-Approximate inference using planar graph decomposition

15 0.36277172 86 nips-2006-Graph-Based Visual Saliency

16 0.3570666 153 nips-2006-Online Clustering of Moving Hyperplanes

17 0.35401186 65 nips-2006-Denoising and Dimension Reduction in Feature Space

18 0.34761289 77 nips-2006-Fast Computation of Graph Kernels

19 0.33980355 140 nips-2006-Multiple Instance Learning for Computer Aided Diagnosis

20 0.3372927 151 nips-2006-On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.095), (3, 0.019), (7, 0.076), (9, 0.094), (20, 0.023), (22, 0.079), (44, 0.058), (57, 0.054), (65, 0.04), (69, 0.038), (72, 0.328)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.8202095 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering

Author: Ron Zass, Amnon Shashua

Abstract: In this paper we focus on the issue of normalization of the affinity matrix in spectral clustering. We show that the difference between N-cuts and Ratio-cuts is in the error measure being used (relative-entropy versus L1 norm) in finding the closest doubly-stochastic matrix to the input affinity matrix. We then develop a scheme for finding the optimal, under Frobenius norm, doubly-stochastic approximation using Von-Neumann’s successive projections lemma. The new normalization scheme is simple and efficient and provides superior clustering performance over many of the standardized tests. 1

2 0.64719355 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines

Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi

Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1

3 0.52030617 80 nips-2006-Fundamental Limitations of Spectral Clustering

Author: Boaz Nadler, Meirav Galun

Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1

4 0.5001232 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning

Author: Hamed Valizadegan, Rong Jin

Abstract: Maximum margin clustering was proposed lately and has shown promising performance in recent studies [1, 2]. It extends the theory of support vector machine to unsupervised learning. Despite its good performance, there are three major problems with maximum margin clustering that question its efficiency for real-world applications. First, it is computationally expensive and difficult to scale to large-scale datasets because the number of parameters in maximum margin clustering is quadratic in the number of examples. Second, it requires data preprocessing to ensure that any clustering boundary will pass through the origins, which makes it unsuitable for clustering unbalanced dataset. Third, it is sensitive to the choice of kernel functions, and requires external procedure to determine the appropriate values for the parameters of kernel functions. In this paper, we propose “generalized maximum margin clustering” framework that addresses the above three problems simultaneously. The new framework generalizes the maximum margin clustering algorithm by allowing any clustering boundaries including those not passing through the origins. It significantly improves the computational efficiency by reducing the number of parameters. Furthermore, the new framework is able to automatically determine the appropriate kernel matrix without any labeled data. Finally, we show a formal connection between maximum margin clustering and spectral clustering. We demonstrate the efficiency of the generalized maximum margin clustering algorithm using both synthetic datasets and real datasets from the UCI repository. 1

5 0.49800059 158 nips-2006-PG-means: learning the number of clusters in data

Author: Yu Feng, Greg Hamerly

Abstract: We present a novel algorithm called PG-means which is able to learn the number of clusters in a classical Gaussian mixture model. Our method is robust and efficient; it uses statistical hypothesis tests on one-dimensional projections of the data and model to determine if the examples are well represented by the model. In so doing, we are applying a statistical test for the entire model at once, not just on a per-cluster basis. We show that our method works well in difficult cases such as non-Gaussian data, overlapping clusters, eccentric clusters, high dimension, and many true clusters. Further, our new method provides a much more stable estimate of the number of clusters than existing methods. 1

6 0.49178591 7 nips-2006-A Local Learning Approach for Clustering

7 0.48997676 167 nips-2006-Recursive ICA

8 0.48949644 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis

9 0.4883396 65 nips-2006-Denoising and Dimension Reduction in Feature Space

10 0.48367342 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds

11 0.48089159 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming

12 0.47892335 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights

13 0.47886884 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation

14 0.47706923 175 nips-2006-Simplifying Mixture Models through Function Approximation

15 0.476735 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model

16 0.47479942 194 nips-2006-Towards a general independent subspace analysis

17 0.47379866 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization

18 0.47368908 117 nips-2006-Learning on Graph with Laplacian Regularization

19 0.47319609 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment

20 0.47284994 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians