nips nips2006 nips2006-80 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Boaz Nadler, Meirav Galun
Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Spectral clustering methods are common graph-based approaches to clustering of data. [sent-5, score-0.668]
2 Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. [sent-6, score-0.889]
3 We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. [sent-8, score-0.597]
4 Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. [sent-9, score-0.473]
5 Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. [sent-10, score-0.402]
6 Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. [sent-11, score-0.783]
7 We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. [sent-12, score-0.788]
8 In contrast, using this coherence measure finds the expected clusters at all scales. [sent-13, score-0.561]
9 1 Introduction Spectral clustering methods are common graph-based approaches to (unsupervised) clustering of data. [sent-15, score-0.668]
10 Given a dataset of n points {xi}, i = 1, …, n, in Rp, these methods first construct a weighted graph G = (V, W), where the n points are the set of nodes V and the weighted edges Wi,j are computed by some local symmetric and non-negative similarity measure. [sent-16, score-0.292]
11 A common choice is a Gaussian kernel with width σ, where ‖·‖ denotes the standard Euclidean metric in Rp: Wi,j = exp(−‖xi − xj‖² / (2σ²)) (1). In this framework, clustering is translated into a graph partitioning problem. [sent-17, score-0.543]
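To make the graph construction concrete, here is a minimal sketch (our own illustration, not the paper's code) that builds the Gaussian affinity matrix of eq. (1) with numpy; the dataset X and the width sigma are placeholders supplied by the caller.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Build the weighted adjacency matrix W of eq. (1).

    X     : (n, p) array of points in R^p
    sigma : global kernel width
    """
    # Pairwise squared Euclidean distances ||x_i - x_j||^2.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # guard against small negative round-off
    # Symmetric, non-negative similarity W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    return np.exp(-sq_dists / (2.0 * sigma ** 2))
```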
12 Two main spectral approaches for graph partitioning have been suggested. [sent-18, score-0.447]
13 The first is to construct a normalized cut (conductance) functional to measure the quality of a partition of the graph nodes V into k clusters [1, 2]. [sent-19, score-0.693]
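The functional itself (referred to as eq. (2) in the sentences below) is not reproduced in this excerpt; for reference, the standard two-way normalized cut (conductance) of a partition V = S ∪ (V \ S) is usually written as follows, though the exact variant used in [1, 2] may differ slightly:

```latex
\mathrm{Ncut}(S, V \setminus S)
  = \frac{\mathrm{cut}(S, V \setminus S)}{\mathrm{vol}(S)}
  + \frac{\mathrm{cut}(S, V \setminus S)}{\mathrm{vol}(V \setminus S)},
\qquad
\mathrm{cut}(S, V \setminus S) = \sum_{i \in S,\; j \notin S} W_{i,j},
\qquad
\mathrm{vol}(S) = \sum_{i \in S} \sum_{j \in V} W_{i,j}.
```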
14 While extensions of this functional to more than two clusters are possible, both works suggest a recursive top-down approach where additional clusters are found by [sent-21, score-0.646]
15 minimizing the same clustering functional on each of the two subgraphs. [sent-26, score-0.411]
16 This approximation leads to clustering according to the eigenvector with second largest eigenvalue of the normalized graph Laplacian, W y = λDy. [sent-30, score-0.614]
17 We note that there are also graph partitioning algorithms based on a non-normalized functional leading to clustering according to the second eigenvector of the standard graph Laplacian matrix D − W, also known as the Fiedler vector [4]. [sent-31, score-0.747]
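A minimal sketch of the two-way spectral bipartition just described (a simplified stand-in, not the authors' implementation): solve the generalized eigenproblem W y = λDy, take the eigenvector with the second largest eigenvalue, and threshold it; numpy and scipy are assumed to be available.

```python
import numpy as np
from scipy.linalg import eigh

def spectral_bipartition(W):
    """Two-way partition from the generalized eigenproblem W y = lambda D y.

    Thresholding the second eigenvector at zero is one common heuristic;
    sweeping over its sorted entries is another.
    """
    d = W.sum(axis=1)
    D = np.diag(d)
    # eigh solves the symmetric-definite generalized problem W y = lambda D y;
    # eigenvalues are returned in ascending order.
    eigvals, eigvecs = eigh(W, D)
    y2 = eigvecs[:, -2]          # eigenvector of the second largest eigenvalue
    return y2 > 0.0              # boolean cluster labels
```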
18 Belkin and Niyogi [8] showed that for data uniformly sampled from a manifold, these eigenvectors approximate the eigenfunctions of the Laplace Beltrami operator, which give an optimal low dimensional embedding under a certain criterion. [sent-34, score-0.31]
19 Optimality of these eigenvectors, including rotations, was derived in [9] for multiclass spectral clustering. [sent-35, score-0.286]
20 Probabilistic interpretations, based on the fact that these eigenvectors correspond to a random walk on the graph were also given by several authors [11]-[15]. [sent-36, score-0.388]
21 Limitations of spectral clustering in the presence of background noise and multiscale data were noted in [10, 16], with suggestions to replace the uniform σ² in eq. (1) by a location-dependent scale σ(x). [sent-37, score-0.759]
22 The aim of this paper is to present fundamental limitations of spectral clustering methods, and propose a novel diffusion based coherence measure to evaluate the internal consistency of individual clusters. [sent-39, score-1.138]
23 First, in Section 2 we show that based on the isotropic local similarity measure (1), the NP-hard normalized cut criterion may not be a suitable global functional for data clustering. [sent-40, score-0.695]
24 Further, in Section 3 we show that spectral clustering suffers from additional limitations, even with a suitable similarity measure. [sent-42, score-0.693]
25 Our theoretical analysis is based on the probabilistic interpretation of spectral clustering as a random walk on the graph and on the intimate connection between the corresponding eigenvalues and eigenvectors and the characteristic relaxation times and processes of this random walk. [sent-43, score-1.41]
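The probabilistic interpretation invoked here is usually phrased through the row-stochastic matrix M = D⁻¹W; the following identities are standard material (our reminder, not quoted from the paper):

```latex
M = D^{-1} W, \qquad D_{i,i} = \sum_j W_{i,j},
\qquad
W v = \lambda D v \;\Longleftrightarrow\; M v = \lambda v .
```

Thus the generalized eigenvectors of the pair (W, D) are exactly the right eigenvectors of the Markov matrix M, and 1 − λj governs how quickly the j-th relaxation mode of the random walk decays towards equilibrium.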
26 We show that similar to Fourier analysis, spectral clustering methods are global in nature. [sent-44, score-0.62]
27 Therefore, even with a location dependent σ(x) as in [10], these methods typically fail to simultaneously identify clusters at different scales. [sent-45, score-0.3]
28 Based on this analysis, we present in Section 4 simple examples where spectral clustering fails. [sent-46, score-0.647]
29 We conclude with Section 5, where we propose a novel diffusion based coherence measure. [sent-47, score-0.323]
30 This quantity measures the coherence of a set of points as all belonging to a single cluster, by comparing the relaxation times on the set and on its suggested partition. [sent-48, score-0.483]
31 As such, it can be used in conjunction with either top-down or bottom-up clustering approaches and may overcome some of their limitations. [sent-50, score-0.395]
32 We show how using this measure correctly clusters the examples of Section 4, where spectral clustering fails. [sent-51, score-1.021]
33 2 Unsuitability of normalized cut functional with local information As reported in the literature, clustering by approximate minimization of the functional (2) performs well in many cases. [sent-52, score-0.904]
34 However, a theoretical question still remains: Under what circumstances is this functional indeed a good measure for the quality of clustering? [sent-53, score-0.52]
35 Recall that the basic goal of clustering is to group together highly similar points while setting apart dissimilar ones. [sent-54, score-0.369]
36 Therefore, the question can be rephrased: is local information sufficient for global clustering? [sent-56, score-0.365]
37 We construct a simple example where local information is insufficient for correct clustering according to the functional (2). [sent-58, score-0.442]
38 Figure 1: A dataset with two clusters and the result of the normalized cut algorithm [2]. [sent-60, score-0.669]
39 Clearly, the two clusters are the Gaussian ball and the rectangular strip Ω. [sent-68, score-0.483]
40 As shown in fig. 1(b), clustering based on the second eigenvector of the normalized graph Laplacian with weights Wi,j given by (1) partitions the points somewhere along the long strip instead of between the strip and the Gaussian ball. [sent-70, score-0.973]
41 Intuitively, the failure of the normalized cut criterion is clear. [sent-72, score-0.385]
42 Since the overlap between the Gaussian ball and the rectangular strip is larger than the width of the strip, a cut that separates them has a higher penalty than a cut somewhere along the thin strip. [sent-73, score-0.924]
43 To show this mathematically, we consider the penalty of the cut due to the numerator in (2) in the limit of a large number of points n → ∞. [sent-74, score-0.368]
44 Therefore, if L ≫ ρ and µ²/ρ = O(1), the horizontal cut between the ball and the strip has a larger normalized penalty than a vertical cut of the strip. [sent-79, score-0.875]
45 Other spectral clustering algorithms that use two eigenvectors, including those that take a local scale into account, also fail to separate the ball from the strip and yield results similar to fig. 1(b). [sent-82, score-0.88]
46 A common approach that avoids this decision problem is to directly find three clusters by using the first three eigenvectors of W v = λDv. [sent-90, score-0.468]
47 Specifically, denote by {λj , v j } the set of eigenvectors of W v = λDv with eigenvalues sorted in decreasing order, and denote by v j (xi ) the i-th entry (corresponding to the point xi ) in the j-th eigenvector v j . [sent-91, score-0.404]
48 Each point xi is then mapped to Ψ(xi) = (v1(xi), …, vk(xi)) ∈ Rk, and simple clustering algorithms are applied to the points Ψ(xi) [8, 9, 12]. [sent-95, score-0.369]
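The embedding-then-cluster step can be sketched as follows (a simplified stand-in for the multi-way algorithms of [8, 9, 12], not the paper's own implementation); numpy, scipy and scikit-learn's k-means are assumed to be available.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_embedding_clustering(W, k):
    """Embed each point by the first k eigenvectors of W v = lambda D v,
    then cluster the embedded points with k-means."""
    D = np.diag(W.sum(axis=1))
    eigvals, eigvecs = eigh(W, D)   # ascending eigenvalues
    Psi = eigvecs[:, -k:]           # the k eigenvectors with largest eigenvalues
    # Some variants (e.g. NJW-style algorithms) row-normalize the embedding.
    Psi = Psi / np.linalg.norm(Psi, axis=1, keepdims=True)
    return KMeans(n_clusters=k, n_init=10).fit_predict(Psi)
```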
49 We now show that spectral clustering that uses the first k eigenvectors for finding k clusters also suffers from fundamental limitations. [sent-97, score-1.119]
50 Therefore, the eigenvalues and eigenvectors {λj , v j } for j > 1, capture the characteristic relaxation times and processes of the random walk on the graph towards equilibrium. [sent-102, score-0.79]
51 Eq. (7) shows that these eigenfunctions and eigenvalues capture the leading characteristic relaxation processes and time scales of the SDE (8). [sent-114, score-0.506]
52 These have been studied extensively in the literature [20], and can give insight into the success and limitations of spectral clustering [13]. [sent-115, score-0.705]
53 For example, if Ω = Rp and the density p(x) consists of k highly separated Gaussian clusters of roughly equal size, then there are exactly k eigenvalues very close to or equal to zero, and their corresponding eigenfunctions are approximately piecewise constant in each of these clusters. [sent-116, score-0.558]
54 Therefore, in this setting spectral clustering with k eigenvectors works very well. [sent-117, score-0.863]
55 To understand the limitations of spectral clustering, we now explicitly analyze situations with clusters at different scales of size and density. [sent-118, score-0.664]
56 The corresponding eigenfunction ψ2 is approximately piecewise constant inside the large well and inside the two smaller wells with a sharp transition near the saddle point xmax . [sent-127, score-0.578]
57 This eigenfunction easily separates cluster 1 from clusters 2 and 3 (see the top center panel of the corresponding figure). [sent-128, score-0.583]
58 A second characteristic time is τ2,3 , the mean first passage time between clusters 2 and 3, also given by a formula similar to (10). [sent-130, score-0.419]
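Formula (10) is not reproduced in this excerpt; as a reminder of the scaling involved (our addition, with constants that may differ from the paper's statement), the classical Kramers approximation for the mean first passage time of a one-dimensional drift-diffusion ẋ = −U′(x) + √2 ẇ over a potential barrier is

```latex
\tau \;\approx\; \frac{2\pi}{\sqrt{U''(x_{\min})\,\lvert U''(x_{\max})\rvert}}
\, \exp\!\bigl( U(x_{\max}) - U(x_{\min}) \bigr),
```

where xmin is the bottom of the starting well and xmax the saddle point separating the wells, so the passage time grows exponentially with the barrier height.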
59 If the potential barrier between these two wells is much smaller than the barrier between wells 1 and 2, then τ2,3 ≪ τ1,2. [sent-131, score-0.29]
60 A third characteristic time is the equilibration time inside cluster 1. [sent-132, score-0.416]
61 To compute it, we consider a diffusion process confined to the interior of cluster 1. [sent-133, score-0.347]
62 This eigenfunction cannot separate clusters 2 and 3. [sent-139, score-0.359]
63 This analysis shows that, when confronted with clusters of different scales corresponding to a multiscale landscape potential, standard spectral clustering, which uses the first k eigenvectors to find k clusters, will fail. [sent-145, score-1.517]
64 The fact that spectral clustering with a single scale σ may fail to correctly cluster multiscale data was already noted in [10, 16]. [sent-147, score-0.993]
65 However, if the large cluster has a high density (comparable to the density of the small clusters), this approach is not able to overcome the limitations of spectral clustering (see the corresponding figure). [sent-153, score-0.712]
66 Moreover, this approach may also fail in the case of uniform density clusters defined solely by geometry (see fig. 4). [sent-155, score-0.408]
67 Specifically, we consider one large cluster with σ1 = 2 centered at x1 = (−6, 0), and two smaller clusters with σ2 = σ3 = 0. [sent-159, score-0.44]
68 As shown in fig. 2, n = 1000 random points from this density clearly show the difference in scales between the large cluster and the smaller ones. [sent-164, score-0.34]
69 The second eigenvector ψ2 is indeed approximately piecewise constant and easily separates the larger cluster from the smaller ones. [sent-166, score-0.393]
70 However, ψ3 and ψ4 are constant on the smaller clusters, capturing the relaxation process in the larger cluster (ψ3 captures relaxation along the y-direction, hence it is not a function of the x-coordinate). [sent-167, score-0.565]
71 In this example the container is so large that the relaxation time inside it is longer than the characteristic time to diffuse between the small disks, hence the NJW algorithm fails to cluster correctly. [sent-186, score-0.701]
72 Note that spectral clustering with the eigenvectors of the standard graph Laplacian has similar limitations, since the Euclidean distance between these eigenvectors is equal to the mean commute time on the graph [11]. [sent-189, score-1.274]
73 5 Clustering with a Relaxation Time Coherence Measure The analysis and examples of Sections 3 and 4 may suggest the use of more than k eigenvectors in spectral clustering. [sent-191, score-0.526]
74 However, clustering with k-means using 5 eigenvectors on the examples of Section 4 produced unsatisfactory results (not shown). [sent-192, score-0.574]
75 Figure 4: Three clusters defined solely by geometry (left panel: original data), and the result of ZP clustering with kNN = 7 (right panel); Example III. [sent-195, score-0.589]
76 Figure 5: Normalized cut and coherence measure segmentation on a synthetic image (panels: original image, coherence measure segmentation, Ncut with 4 clusters). [sent-196, score-0.947]
77 Given the importance of relaxation times on the graph as an indication of clusters, we propose a novel and principled measure for the coherence of a set of points as belonging to a single cluster. [sent-198, score-0.676]
78 Our coherence measure can be used in conjunction with any clustering algorithm. [sent-199, score-0.673]
79 Specifically, let G = (V, W ) be a weighted graph of points and let V = S ∪ (V \ S) be a possible partition (computed by some clustering algorithm). [sent-200, score-0.521]
80 We define τV = 1/(1 − λ2 ) as the characteristic relaxation time of this graph. [sent-203, score-0.289]
81 Similarly, τ1 and τ2 denote the characteristic relaxation times of the two subgraphs corresponding to the partitions S and V \ S. [sent-204, score-0.395]
82 If, however, V consists of two weakly connected clusters defined by S and V \ S, then τ1 and τ2 measure the characteristic relaxation times inside these two clusters while τV measures the overall relaxation time. [sent-206, score-1.201]
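A minimal sketch of how such a relaxation-time comparison can be computed (our own illustration; the paper's precise coherence score and decision threshold are not reproduced in this excerpt, so the ratio test and the value 3.0 below are placeholders):

```python
import numpy as np
from scipy.linalg import eigh

def relaxation_time(W):
    """tau = 1 / (1 - lambda_2) for the walk defined by W v = lambda D v."""
    D = np.diag(W.sum(axis=1))
    eigvals = eigh(W, D, eigvals_only=True)   # ascending; the largest eigenvalue is 1
    return 1.0 / (1.0 - eigvals[-2])

def coherence_check(W, mask, ratio_threshold=3.0):
    """Decide whether the split of V into S and V \\ S given by `mask` is genuine.

    mask            : boolean array, True for the nodes in S
    ratio_threshold : placeholder value, not taken from the paper
    """
    tau_V = relaxation_time(W)
    tau_1 = relaxation_time(W[np.ix_(mask, mask)])
    tau_2 = relaxation_time(W[np.ix_(~mask, ~mask)])
    split = tau_V > ratio_threshold * max(tau_1, tau_2)
    return split, (tau_V, tau_1, tau_2)
```

With the relaxation times quoted in the sentences below, a comparison of this kind reproduces the reported decisions: (1350, 294, 360) and (360, 18, 28) are flagged as genuine splits, while (294, 130, 135) is not.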
83 We note that other works have also considered relaxation times for clustering with different approaches [21, 22]. [sent-214, score-0.585]
84 We now demonstrate the use of this coherence measure with normalized cut clustering on the third example of Section 4. [sent-215, score-1.025]
85 The first normalized cut partition of this data with σ = 1 separates the large container from the two smaller disks. [sent-216, score-0.644]
86 The relaxation times of the full graph and the two subgraphs are (τV , τ1 , τ2 ) = (1350, 294, 360). [sent-217, score-0.381]
87 Normalized cuts partitions the container roughly into two parts with (τV , τ1 , τ2 ) = (294, 130, 135), which according to our coherence measure means that the big container is a single structure that should not be split. [sent-220, score-0.599]
88 Finally, normalized cut on the two small disks correctly separates them giving (τV , τ1 , τ2 ) = (360, 18, 28), which indicates that indeed the two disks should be split. [sent-221, score-0.75]
89 Thus, the combination of our coherence measure with normalized cut not only clusters correctly, but also automatically finds the correct number of clusters, regardless of cluster scale. [sent-223, score-1.095]
90 The segmentation results of normalized cuts [24] and of the coherence measure combined with [23] appear in panels (b) and (c). [sent-228, score-0.561]
91 Figure 6: Normalized cut and coherence measure segmentation on a real image (panels: original image, coherence measure segmentation, Ncut with 6 clusters, Ncut with 20 clusters). [sent-231, score-1.202]
92 With a small number of clusters normalized cut cannot find the small coherent segments in the image, whereas with a large number of clusters, large objects are segmented. [sent-233, score-0.722]
93 Implementing our coherence measure with [23] finds salient clusters at different scales. [sent-234, score-0.561]
94 Laplacian eigenmaps and spectral techniques for embedding and clustering, NIPS Vol. [sent-274, score-0.286]
95 Dupont, The principal component analysis of a graph and its relationships to spectral clustering. [sent-288, score-0.4]
96 A random walks view of spectral segmentation, AI and Statistics, 2001. [sent-293, score-0.315]
97 Kevrekidis, Diffusion maps spectral clustering and eigenfunctions of Fokker-Planck operators, NIPS, 2005. [sent-300, score-0.717]
98 Poland, Amplifying the block matrix structure for spectral clustering, Proceedings of the 14th Annual Machine Learning Conference of Belgium and the Netherlands, pp. [sent-310, score-0.286]
99 Slonim, Data clustering by Markovian relaxation and the information bottleneck method, NIPS, 2000. [sent-335, score-0.524]
100 Jepson, Half-lives of eigenflows for spectral clustering, NIPS, 2002. [sent-339, score-0.286]
wordName wordTfidf (topN-words)
[('clustering', 0.334), ('zp', 0.307), ('spectral', 0.286), ('cut', 0.28), ('clusters', 0.255), ('coherence', 0.227), ('eigenvectors', 0.213), ('relaxation', 0.19), ('cluster', 0.149), ('multiscale', 0.139), ('njw', 0.132), ('strip', 0.13), ('graph', 0.114), ('container', 0.11), ('disks', 0.11), ('segmentation', 0.106), ('normalized', 0.105), ('eigenfunction', 0.104), ('inside', 0.102), ('characteristic', 0.099), ('eigenfunctions', 0.097), ('diffusion', 0.096), ('wells', 0.095), ('limitations', 0.085), ('density', 0.082), ('eigenvalues', 0.082), ('coherent', 0.082), ('measure', 0.079), ('functional', 0.077), ('separates', 0.075), ('equilibration', 0.066), ('ncut', 0.066), ('xmax', 0.066), ('passage', 0.065), ('rp', 0.064), ('walk', 0.061), ('eigenvector', 0.061), ('basri', 0.057), ('brandt', 0.057), ('sde', 0.057), ('ball', 0.054), ('sharon', 0.052), ('fails', 0.051), ('isotropic', 0.05), ('nadler', 0.049), ('xi', 0.048), ('similarity', 0.048), ('partitioning', 0.047), ('subgraphs', 0.046), ('laplacian', 0.046), ('fail', 0.045), ('rectangular', 0.044), ('cuts', 0.044), ('galun', 0.044), ('umax', 0.044), ('umin', 0.044), ('xmin', 0.044), ('iccv', 0.042), ('piecewise', 0.042), ('correctly', 0.04), ('partition', 0.038), ('kannan', 0.038), ('scales', 0.038), ('smaller', 0.036), ('image', 0.035), ('fourier', 0.035), ('vempala', 0.035), ('somewhere', 0.035), ('lafon', 0.035), ('barrier', 0.035), ('confronted', 0.035), ('points', 0.035), ('conjunction', 0.033), ('pg', 0.032), ('local', 0.031), ('times', 0.031), ('nips', 0.031), ('saddle', 0.031), ('pl', 0.031), ('clusterings', 0.031), ('fundamental', 0.031), ('works', 0.03), ('indeed', 0.03), ('potential', 0.029), ('recursive', 0.029), ('dataset', 0.029), ('narrow', 0.029), ('walks', 0.029), ('partitions', 0.029), ('aggregation', 0.028), ('knn', 0.028), ('overcome', 0.028), ('examples', 0.027), ('texture', 0.027), ('numerator', 0.027), ('geometry', 0.026), ('penalty', 0.026), ('israel', 0.026), ('suitable', 0.025), ('belkin', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999934 80 nips-2006-Fundamental Limitations of Spectral Clustering
Author: Boaz Nadler, Meirav Galun
Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1
2 0.31899336 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
Author: Hamed Valizadegan, Rong Jin
Abstract: Maximum margin clustering was proposed lately and has shown promising performance in recent studies [1, 2]. It extends the theory of support vector machine to unsupervised learning. Despite its good performance, there are three major problems with maximum margin clustering that question its efficiency for real-world applications. First, it is computationally expensive and difficult to scale to large-scale datasets because the number of parameters in maximum margin clustering is quadratic in the number of examples. Second, it requires data preprocessing to ensure that any clustering boundary will pass through the origins, which makes it unsuitable for clustering unbalanced dataset. Third, it is sensitive to the choice of kernel functions, and requires external procedure to determine the appropriate values for the parameters of kernel functions. In this paper, we propose “generalized maximum margin clustering” framework that addresses the above three problems simultaneously. The new framework generalizes the maximum margin clustering algorithm by allowing any clustering boundaries including those not passing through the origins. It significantly improves the computational efficiency by reducing the number of parameters. Furthermore, the new framework is able to automatically determine the appropriate kernel matrix without any labeled data. Finally, we show a formal connection between maximum margin clustering and spectral clustering. We demonstrate the efficiency of the generalized maximum margin clustering algorithm using both synthetic datasets and real datasets from the UCI repository. 1
3 0.30034989 151 nips-2006-On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts
Author: Hariharan Narayanan, Mikhail Belkin, Partha Niyogi
Abstract: One of the intuitions underlying many graph-based methods for clustering and semi-supervised learning, is that class or cluster boundaries pass through areas of low probability density. In this paper we provide some formal analysis of that notion for a probability distribution. We introduce a notion of weighted boundary volume, which measures the length of the class/cluster boundary weighted by the density of the underlying probability distribution. We show that sizes of the cuts of certain commonly used data adjacency graphs converge to this continuous weighted volume of the boundary. keywords: Clustering, Semi-Supervised Learning 1
4 0.29265624 7 nips-2006-A Local Learning Approach for Clustering
Author: Mingrui Wu, Bernhard Schölkopf
Abstract: We present a local learning approach for clustering. The basic idea is that a good clustering result should have the property that the cluster label of each data point can be well predicted based on its neighboring data and their cluster labels, using current supervised learning methods. An optimization problem is formulated such that its solution has the above property. Relaxation and eigen-decomposition are applied to solve this optimization problem. We also briefly investigate the parameter selection issue and provide a simple parameter selection method for the proposed algorithm. Experimental results are provided to validate the effectiveness of the proposed approach. 1
5 0.17323683 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians
Author: Jason V. Davis, Inderjit S. Dhillon
Abstract: Gaussian data is pervasive and many learning algorithms (e.g., k-means) model their inputs as a single sample drawn from a multivariate Gaussian. However, in many real-life settings, each input object is best described by multiple samples drawn from a multivariate Gaussian. Such data can arise, for example, in a movie review database where each movie is rated by several users, or in time-series domains such as sensor networks. Here, each input can be naturally described by both a mean vector and covariance matrix which parameterize the Gaussian distribution. In this paper, we consider the problem of clustering such input objects, each represented as a multivariate Gaussian. We formulate the problem using an information theoretic approach and draw several interesting theoretical connections to Bregman divergences and also Bregman matrix divergences. We evaluate our method across several domains, including synthetic data, sensor network data, and a statistical debugging application. 1
6 0.17030445 158 nips-2006-PG-means: learning the number of clusters in data
7 0.16266666 60 nips-2006-Convergence of Laplacian Eigenmaps
8 0.15674938 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data
9 0.15517956 123 nips-2006-Learning with Hypergraphs: Clustering, Classification, and Embedding
10 0.15037647 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering
11 0.12409122 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
12 0.12023971 128 nips-2006-Manifold Denoising
13 0.11735474 102 nips-2006-Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm
14 0.1145426 39 nips-2006-Balanced Graph Matching
15 0.1107703 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
16 0.10332007 65 nips-2006-Denoising and Dimension Reduction in Feature Space
17 0.10137653 117 nips-2006-Learning on Graph with Laplacian Regularization
18 0.096918702 181 nips-2006-Stability of $K$-Means Clustering
19 0.096449934 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
20 0.091350794 175 nips-2006-Simplifying Mixture Models through Function Approximation
topicId topicWeight
[(0, -0.253), (1, 0.159), (2, 0.05), (3, 0.413), (4, 0.12), (5, -0.229), (6, 0.059), (7, 0.142), (8, 0.222), (9, -0.02), (10, 0.138), (11, 0.145), (12, -0.1), (13, 0.154), (14, -0.043), (15, 0.058), (16, 0.031), (17, 0.037), (18, 0.057), (19, -0.041), (20, -0.06), (21, 0.003), (22, -0.025), (23, 0.005), (24, 0.047), (25, 0.037), (26, -0.023), (27, 0.061), (28, 0.049), (29, -0.048), (30, -0.025), (31, -0.004), (32, -0.017), (33, 0.053), (34, -0.073), (35, 0.089), (36, -0.031), (37, 0.011), (38, 0.004), (39, -0.021), (40, 0.072), (41, -0.027), (42, -0.022), (43, 0.044), (44, 0.026), (45, -0.032), (46, 0.015), (47, -0.038), (48, -0.015), (49, -0.041)]
simIndex simValue paperId paperTitle
same-paper 1 0.98354071 80 nips-2006-Fundamental Limitations of Spectral Clustering
Author: Boaz Nadler, Meirav Galun
Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1
2 0.80750656 7 nips-2006-A Local Learning Approach for Clustering
Author: Mingrui Wu, Bernhard Schölkopf
Abstract: We present a local learning approach for clustering. The basic idea is that a good clustering result should have the property that the cluster label of each data point can be well predicted based on its neighboring data and their cluster labels, using current supervised learning methods. An optimization problem is formulated such that its solution has the above property. Relaxation and eigen-decomposition are applied to solve this optimization problem. We also briefly investigate the parameter selection issue and provide a simple parameter selection method for the proposed algorithm. Experimental results are provided to validate the effectiveness of the proposed approach. 1
3 0.74428141 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
Author: Hamed Valizadegan, Rong Jin
Abstract: Maximum margin clustering was proposed lately and has shown promising performance in recent studies [1, 2]. It extends the theory of support vector machine to unsupervised learning. Despite its good performance, there are three major problems with maximum margin clustering that question its efficiency for real-world applications. First, it is computationally expensive and difficult to scale to large-scale datasets because the number of parameters in maximum margin clustering is quadratic in the number of examples. Second, it requires data preprocessing to ensure that any clustering boundary will pass through the origins, which makes it unsuitable for clustering unbalanced dataset. Third, it is sensitive to the choice of kernel functions, and requires external procedure to determine the appropriate values for the parameters of kernel functions. In this paper, we propose “generalized maximum margin clustering” framework that addresses the above three problems simultaneously. The new framework generalizes the maximum margin clustering algorithm by allowing any clustering boundaries including those not passing through the origins. It significantly improves the computational efficiency by reducing the number of parameters. Furthermore, the new framework is able to automatically determine the appropriate kernel matrix without any labeled data. Finally, we show a formal connection between maximum margin clustering and spectral clustering. We demonstrate the efficiency of the generalized maximum margin clustering algorithm using both synthetic datasets and real datasets from the UCI repository. 1
4 0.72068012 151 nips-2006-On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts
Author: Hariharan Narayanan, Mikhail Belkin, Partha Niyogi
Abstract: One of the intuitions underlying many graph-based methods for clustering and semi-supervised learning, is that class or cluster boundaries pass through areas of low probability density. In this paper we provide some formal analysis of that notion for a probability distribution. We introduce a notion of weighted boundary volume, which measures the length of the class/cluster boundary weighted by the density of the underlying probability distribution. We show that sizes of the cuts of certain commonly used data adjacency graphs converge to this continuous weighted volume of the boundary. keywords: Clustering, Semi-Supervised Learning 1
5 0.7151584 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering
Author: Ron Zass, Amnon Shashua
Abstract: In this paper we focus on the issue of normalization of the affinity matrix in spectral clustering. We show that the difference between N-cuts and Ratio-cuts is in the error measure being used (relative-entropy versus L1 norm) in finding the closest doubly-stochastic matrix to the input affinity matrix. We then develop a scheme for finding the optimal, under Frobenius norm, doubly-stochastic approximation using Von-Neumann’s successive projections lemma. The new normalization scheme is simple and efficient and provides superior clustering performance over many of the standardized tests. 1
6 0.71226442 123 nips-2006-Learning with Hypergraphs: Clustering, Classification, and Embedding
7 0.62841117 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians
8 0.52373731 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data
9 0.45541066 158 nips-2006-PG-means: learning the number of clusters in data
10 0.42972746 128 nips-2006-Manifold Denoising
11 0.42697221 60 nips-2006-Convergence of Laplacian Eigenmaps
12 0.39096832 181 nips-2006-Stability of $K$-Means Clustering
13 0.38646469 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
14 0.37723258 117 nips-2006-Learning on Graph with Laplacian Regularization
15 0.36458722 102 nips-2006-Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm
16 0.35340625 79 nips-2006-Fast Iterative Kernel PCA
17 0.34438986 39 nips-2006-Balanced Graph Matching
18 0.33553708 175 nips-2006-Simplifying Mixture Models through Function Approximation
19 0.3288292 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
20 0.31599733 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
topicId topicWeight
[(0, 0.019), (1, 0.074), (3, 0.028), (7, 0.148), (9, 0.101), (12, 0.037), (20, 0.018), (22, 0.039), (30, 0.113), (44, 0.056), (57, 0.062), (65, 0.066), (69, 0.051), (71, 0.014), (72, 0.049), (83, 0.016), (90, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.90537572 80 nips-2006-Fundamental Limitations of Spectral Clustering
Author: Boaz Nadler, Meirav Galun
Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1
2 0.84019327 189 nips-2006-Temporal dynamics of information content carried by neurons in the primary visual cortex
Author: Danko Nikolić, Stefan Haeusler, Wolf Singer, Wolfgang Maass
Abstract: We use multi-electrode recordings from cat primary visual cortex and investigate whether a simple linear classifier can extract information about the presented stimuli. We find that information is extractable and that it even lasts for several hundred milliseconds after the stimulus has been removed. In a fast sequence of stimulus presentation, information about both new and old stimuli is present simultaneously and nonlinear relations between these stimuli can be extracted. These results suggest nonlinear properties of cortical representations. The important implications of these properties for the nonlinear brain theory are discussed.
3 0.80097926 190 nips-2006-The Neurodynamics of Belief Propagation on Binary Markov Random Fields
Author: Thomas Ott, Ruedi Stoop
Abstract: We rigorously establish a close relationship between message passing algorithms and models of neurodynamics by showing that the equations of a continuous Hopfield network can be derived from the equations of belief propagation on a binary Markov random field. As Hopfield networks are equipped with a Lyapunov function, convergence is guaranteed. As a consequence, in the limit of many weak connections per neuron, Hopfield networks exactly implement a continuous-time variant of belief propagation starting from message initialisations that prevent from running into convergence problems. Our results lead to a better understanding of the role of message passing algorithms in real biological neural networks.
4 0.80020094 175 nips-2006-Simplifying Mixture Models through Function Approximation
Author: Kai Zhang, James T. Kwok
Abstract: Finite mixture model is a powerful tool in many statistical learning problems. In this paper, we propose a general, structure-preserving approach to reduce its model complexity, which can bring significant computational benefits in many applications. The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models. By adopting the L2 norm as the distance measure between mixture models, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measure. Moreover, the complexity of our algorithm is only linear in the sample size and dimensionality. Experiments on density estimation and clustering-based image segmentation demonstrate its outstanding performance in terms of both speed and accuracy.
5 0.79644573 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering
Author: Ron Zass, Amnon Shashua
Abstract: In this paper we focus on the issue of normalization of the affinity matrix in spectral clustering. We show that the difference between N-cuts and Ratio-cuts is in the error measure being used (relative-entropy versus L1 norm) in finding the closest doubly-stochastic matrix to the input affinity matrix. We then develop a scheme for finding the optimal, under Frobenius norm, doubly-stochastic approximation using Von-Neumann’s successive projections lemma. The new normalization scheme is simple and efficient and provides superior clustering performance over many of the standardized tests. 1
6 0.79265064 60 nips-2006-Convergence of Laplacian Eigenmaps
7 0.77950829 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
8 0.77874488 158 nips-2006-PG-means: learning the number of clusters in data
9 0.77019471 167 nips-2006-Recursive ICA
10 0.76937521 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
11 0.7667737 119 nips-2006-Learning to Rank with Nonsmooth Cost Functions
12 0.76547337 39 nips-2006-Balanced Graph Matching
13 0.76260942 163 nips-2006-Prediction on a Graph with a Perceptron
14 0.76202029 128 nips-2006-Manifold Denoising
15 0.76170766 7 nips-2006-A Local Learning Approach for Clustering
16 0.7602008 117 nips-2006-Learning on Graph with Laplacian Regularization
17 0.75978988 33 nips-2006-Analysis of Representations for Domain Adaptation
18 0.75963807 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
19 0.75913918 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
20 0.75876009 129 nips-2006-Map-Reduce for Machine Learning on Multicore