nips nips2011 nips2011-186 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sivaraman Balakrishnan, Min Xu, Akshay Krishnamurthy, Aarti Singh
Abstract: Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. We analyze the performance of a spectral algorithm for hierarchical clustering and show that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfectly recovering the hierarchical clusters with high probability. We additionally improve upon previous results for k-way spectral clustering to derive conditions under which spectral clustering makes no mistakes. Further, using minimax analysis, we derive tight upper and lower bounds for the clustering problem and compare the performance of spectral clustering to these information theoretic limits. We also present experiments on simulated and real world data illustrating our results.
Reference: text
sentIndex sentText sentNum sentScore
1 Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. [sent-3, score-0.785]
2 We additionally improve upon previous results for k-way spectral clustering to derive conditions under which spectral clustering makes no mistakes. [sent-5, score-1.502]
3 Further, using minimax analysis, we derive tight upper and lower bounds for the clustering problem and compare the performance of spectral clustering to these information theoretic limits. [sent-6, score-1.225]
4 Two popular forms of clustering are k-way, where an algorithm directly partitions the data into k disjoint sets, and hierarchical, where the algorithm organizes the data into a hierarchy of groups. [sent-9, score-0.443]
5 Popular algorithms for the k-way problem include k-means, spectral clustering, and density-based clustering, while agglomerative methods that merge clusters from the bottom up are popular for the latter problem. [sent-10, score-0.727]
6 Spectral clustering algorithms embed the data points by projection onto a few eigenvectors of (some form of) the graph Laplacian matrix and use this spectral embedding to find a clustering. [sent-11, score-1.069]
7 This technique has been shown to work on various arbitrarily shaped clusters and, in addition to being straightforward to implement, often outperforms traditional clustering algorithms such as the k-means algorithm. [sent-12, score-0.522]
8 Real-world data is inevitably corrupted by noise, and it is of interest to study the robustness of spectral clustering algorithms. [sent-13, score-0.859]
9 Our main contributions are: • We leverage results from perturbation theory in a novel analysis of a spectral algorithm for hierarchical clustering to understand its behavior in the presence of noise. [sent-15, score-1.01]
10 We provide strong guarantees on its correctness; in particular, we show that the amount of noise spectral clustering tolerates can grow rapidly with the size of the smallest cluster we want to resolve. [sent-16, score-1.03]
11 • We sharpen existing results on k-way spectral clustering. [sent-17, score-0.419]
12 In contrast with earlier work, we provide precise error bounds through a careful characterization of a k-means style algorithm run on the spectral embedding of the data. [sent-18, score-0.544]
13 • We also address the issue of optimal noise thresholds via the use of minimax theory. [sent-19, score-0.216]
14 2 Related Work and Definitions: There are several high-level justifications for the success of spectral clustering. [sent-21, score-0.453]
15 There has also been some theoretical work on spectral algorithms for cluster recovery in random graph models. [sent-28, score-0.59]
16 McSherry [9] studies the “cluster-structured” random graph model in which the probability of adding an edge can vary depending on the clusters the edge connects. [sent-29, score-0.245]
17 In this case, we can view the observed adjacency matrix as a random perturbation of a low rank “expected” adjacency matrix which encodes the cluster membership. [sent-31, score-0.515]
18 McSherry shows that one can recover the clusters from a low rank approximation of the observed (noisy) adjacency matrix. [sent-32, score-0.298]
19 More recently, Rohe et al [11] analyze spectral clustering in the stochastic block model (SBM), which is an example of a structured random graph. [sent-35, score-0.869]
20 They consider the high-dimensional scenario where the number of clusters k grows with the number of data points n and show that under certain assumptions the average number of mistakes made by spectral clustering vanishes asymptotically. [sent-36, score-1.056]
21 Our work on hierarchical clustering also has the same high-dimensional flavor since the number of clusters we resolve grows with n. [sent-38, score-0.712]
22 However, in the hierarchical clustering setting, errors made at the bottom level propagate up the tree, and we need to make precise arguments to ensure that the total number of errors remains small. [sent-39, score-0.601]
23 We consider more general noise models and study the relation between errors in clustering and noise variance. [sent-42, score-0.578]
24 [10] study k-way clustering and show that the eigenvectors of the graph Laplacian are stable in 2-norm under small perturbations. [sent-45, score-0.495]
25 [7] study the misclustering rate of spectral clustering under the somewhat unnatural assumption that every coordinate of the Laplacian’s eigenvectors is perturbed by independent and identically distributed noise. [sent-48, score-1.003]
26 In contrast, we specify our noise model as an additive perturbation to the similarity matrix, making no direct assumptions on how this affects the spectrum of the Laplacian. [sent-49, score-0.449]
27 Definitions. The clustering problem can be defined as follows: given an (n × n) similarity matrix on n data points, find a set C of subsets of the points such that points belonging to the same subset have high similarity and points in different subsets have low similarity. [sent-52, score-0.896]
28 A binary hierarchical clustering T is a hierarchical clustering such that for each non-atomic Ck ∈ T, there exist two proper subsets Ci, Cj ∈ T with Ci ∩ Cj = ∅ and Ci ∪ Cj = Ck. [sent-54, score-0.891]
29 For a similarity function defined on the ε-neighborhood graph (Bottom), this data set forms an ideal matrix. [sent-60, score-0.481]
30 For a suitably chosen similarity function, a data set consisting of clusters that lie on arbitrary manifolds with complex shapes can result in this ideal case. [sent-63, score-0.582]
31 As an example, in the two-moons data set in Figure 1(a), the popular technique of constructing a nearest neighbor graph and defining the distance between two points as the length of the longest edge on the shortest path between them results in an ideal similarity matrix. [sent-64, score-0.493]
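The following is a minimal sketch (not the authors' code) of the path-based similarity just described: build a k-nearest-neighbour graph with Euclidean edge weights and define the distance between two points as the length of the longest edge on the shortest path between them. The neighbourhood size k = 10 and the Gaussian conversion from distances to similarities at the end are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def knn_graph(X, k=10):
    """Symmetric k-NN graph with Euclidean edge weights (sparse, no self-loops)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(X)
    rows, cols, vals = [], [], []
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]      # skip the point itself
        rows.extend([i] * k)
        cols.extend(nbrs)
        vals.extend(D[i, nbrs])
    G = csr_matrix((vals, (rows, cols)), shape=(n, n))
    return G.maximum(G.T)                     # symmetrize

def longest_edge_on_shortest_path(G):
    """For each pair (i, j), the largest edge weight on a shortest i-j path."""
    n = G.shape[0]
    dist, pred = shortest_path(G, directed=False, return_predecessors=True)
    A = G.toarray()
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j or np.isinf(dist[i, j]):
                continue
            longest, v = 0.0, j
            while v != i:                     # walk back along the predecessor chain
                u = pred[i, v]
                longest = max(longest, A[u, v])
                v = u
            out[i, j] = longest
    return out

# e.g.: W = np.exp(-longest_edge_on_shortest_path(knn_graph(X)) ** 2)  # similarity
```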
32 Other non-Euclidean similarity metrics (for instance density based similarity metrics [12]) can also allow for non-parametric cluster shapes. [sent-65, score-0.41]
33 For such ideal similarity matrices, we can show that the spectral clustering algorithm will deterministically recover all clusters in the hierarchy (see Theorem 5 in the appendix). [sent-66, score-1.469]
34 However, since this ideal case does not hold in general, we focus on similarity matrices that can be decomposed into an ideal matrix and a high-variance noise term. [sent-67, score-0.865]
35 • A symmetric (n × n) matrix R is a perturbation matrix with parameter σ if (a) E(Rij) = 0, (b) the entries of R are subgaussian, that is E(exp(t Rij)) ≤ exp(σ²t²/2), and (c) for each row i, the entries Ri1, …, Rin are independent. [sent-70, score-0.301]
36 In the k-way case, we consider the following similarity matrix, which is studied by Ng et al. [sent-77, score-0.217]
37 Definition 3 W is a noisy k-Block Diagonal matrix if W ≜ A + R, where R is a perturbation matrix and A is an ideal matrix for the k-way problem. [sent-79, score-0.7]
38 An ideal matrix for the k-way problem has within-cluster similarities bounded below by a positive constant and between-cluster similarities equal to 0. [sent-80, score-0.625]
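As a small illustration of the perturbation-matrix definition and Definition 3 above, the sketch below builds a noisy k-Block Diagonal matrix W = A + R: a symmetric Gaussian noise matrix is one instance of the sub-Gaussian definition, and A is an ideal k-block-diagonal matrix. The within-cluster similarity value 0.7 and the equal cluster sizes are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def perturbation_matrix(n, sigma, rng):
    """Symmetric matrix with zero-mean Gaussian entries of scale sigma."""
    R = rng.normal(0.0, sigma, size=(n, n))
    U = np.triu(R, 1)
    return U + U.T                            # symmetric, zero diagonal

def noisy_k_block_diagonal(n, k, sigma, seed=0):
    """Ideal k-block-diagonal matrix A plus a sub-Gaussian perturbation R."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n))                      # between-cluster similarities are 0
    size = n // k
    for c in range(k):
        A[c * size:(c + 1) * size, c * size:(c + 1) * size] = 0.7
    return A + perturbation_matrix(n, sigma, rng)

W = noisy_k_block_diagonal(n=60, k=3, sigma=0.5)
```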
39 Finally, we define the combinatorial Laplacian matrix, which will be the focus of our spectral algorithm and our subsequent analysis. [sent-81, score-0.464]
40 Definition 4 The combinatorial Laplacian L of a matrix W is defined as L ≜ D − W, where D is a diagonal matrix with Dii ≜ Σ_{j=1}^{n} Wij. [sent-82, score-0.224]
41 We note that other analyses of spectral clustering have studied other Laplacian matrices, particularly the normalized Laplacians defined as Ln ≜ D^{-1}L and Ln ≜ D^{-1/2}LD^{-1/2}. [sent-83, score-0.751]
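A direct transcription of Definition 4, as a minimal sketch:

```python
import numpy as np

def combinatorial_laplacian(W):
    """Definition 4: L = D - W with D diagonal, D_ii = sum_j W_ij."""
    D = np.diag(W.sum(axis=1))
    return D - W
```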
42 However as we show in Appendix E, the normalized Laplacian can mis-cluster points even for an ideal noiseless similarity matrix. [sent-84, score-0.438]
43 Algorithm 1 HS: input a (noisy) n × n similarity matrix W; compute the Laplacian L = D − W; let v2 be the smallest non-constant eigenvector of L; set C1 ← {i : v2(i) ≥ 0} and C2 ← {j : v2(j) < 0}; return C ← {C1, C2} ∪ HS(W_C1) ∪ HS(W_C2). Figure 2: An ideal matrix and a noisy HBM. [sent-85, score-0.731]
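Below is a minimal sketch of Algorithm 1 (HS) as stated above: split by the sign of the second eigenvector of the combinatorial Laplacian and recurse on each half. The stopping size min_size is an assumption added for the sketch (the analysis only guarantees recovery of clusters above a certain size); it is not part of the pseudocode.

```python
import numpy as np

def hs(W, indices=None, min_size=4):
    """Recursive sketch of Algorithm HS on a similarity matrix W."""
    n = W.shape[0]
    if indices is None:
        indices = np.arange(n)
    if n < min_size:
        return []
    L = np.diag(W.sum(axis=1)) - W            # combinatorial Laplacian
    _, eigvecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    v2 = eigvecs[:, 1]                        # eigenvector of the 2nd smallest eigenvalue
    c1 = np.where(v2 >= 0)[0]
    c2 = np.where(v2 < 0)[0]
    if len(c1) == 0 or len(c2) == 0:          # no usable sign split
        return []
    clusters = [indices[c1], indices[c2]]
    clusters += hs(W[np.ix_(c1, c1)], indices[c1], min_size)
    clusters += hs(W[np.ix_(c2, c2)], indices[c2], min_size)
    return clusters
```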
44 Algorithm 2 K-WAY SPECTRAL: input a (noisy) n × n similarity matrix W and the number of clusters k; compute the Laplacian L = D − W; form the (n × k) matrix V with columns v1, …, vk, the eigenvectors corresponding to the k smallest eigenvalues of L; run a k-means style procedure on the rows of V to obtain the k-way partition. [sent-87, score-0.477]
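And a minimal sketch of Algorithm 2 (K-WAY SPECTRAL) as stated above: embed the points using the k smallest eigenvectors of the combinatorial Laplacian and run a k-means step on the rows of the embedding. Using scikit-learn's KMeans for that step is an implementation convenience, not part of the pseudocode.

```python
import numpy as np
from sklearn.cluster import KMeans

def k_way_spectral(W, k, seed=0):
    """Sketch of Algorithm K-WAY SPECTRAL: cluster n points into k groups."""
    L = np.diag(W.sum(axis=1)) - W            # combinatorial Laplacian
    _, eigvecs = np.linalg.eigh(L)
    V = eigvecs[:, :k]                        # n x k spectral embedding
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(V)
```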
45 Both of these algorithms take a similarity matrix W and compute the eigenvectors corresponding to the smallest eigenvalues of the Laplacian of W . [sent-111, score-0.38]
46 The algorithms then run simple procedures to recover the clustering from the spectral embedding of the data points by these eigenvectors. [sent-112, score-0.895]
47 We first state the following general assumptions, which we place on the ideal similarity matrix A: Assumption 1 For all i, j, Aij is strictly positive and bounded above by a universal constant. [sent-116, score-0.462]
48 Assumption 2 (Balanced clusters) There is a constant η ≥ 1 such that at every split of the hierarchy |Cmax|/|Cmin| ≤ η, where |Cmax|, |Cmin| are the sizes of the biggest and smallest clusters respectively. [sent-117, score-0.378]
49 It is important to note that these assumptions are placed only on the ideal matrices. [sent-119, score-0.278]
50 For the ideal matrix, Assumption 3 ensures that at every level of the hierarchy, the gap between the within-cluster similarities and the between-cluster similarities is larger than the range of the between-cluster similarities. [sent-124, score-0.439]
51 Earlier papers [9, 11] assume that the ideal similarities are constant within a block, in which case the assumption is trivially satisfied by the definition of the ideal matrix. [sent-125, score-0.665]
52 If this assumption is violated by the ideal matrix, then the eigenvector entries can decay as fast as O(1/n) (see Appendix E for more details), and our analysis shows that such matrices will no longer be robust to noise. [sent-127, score-0.39]
53 Other analyses of spectral clustering often directly make less interpretable assumptions about the spectrum. [sent-128, score-0.784]
54 [10] assume conditions on the eigengap of the normalized Laplacian and this assumption implicitly creates constraints on the entries of the ideal matrix A that can be hard to make explicit. [sent-130, score-0.35]
55 Intuitively, this quantity measures how close the ideal matrix comes to violating Assumption 3 over a set of clusters S. [sent-132, score-0.505]
56 We, as well as previous works [10, 11], rely on results from perturbation theory to bound the error in the observed eigenvectors in 2-norm. [sent-134, score-0.309]
57 We are now ready to state our main result for hierarchical spectral clustering. [sent-138, score-0.517]
58 At a high level, this result gives conditions on the noise scale factor under which Algorithm HS will recover all clusters s ∈ Sm, where Sm is the set of all clusters of size at least m. [sent-139, score-0.547]
59 Then for all n large enough, with probability at least 1 − 6/n, HS, on input M, will exactly recover all clusters of size at least m. [sent-147, score-0.249]
60 It is impossible to resolve the entire hierarchy, since small clusters can be irrecoverably buried in noise. [sent-149, score-0.246]
61 The amount of noise that algorithm HS can tolerate is directly dependent on the size of the smallest cluster we want to resolve. [sent-150, score-0.349]
62 As a consequence of our proof, we show that to resolve only the first level of the hierarchy, the amount of noise we can tolerate is (pessimistically) polynomial in n. [sent-152, score-0.234]
63 Under this scaling between n and the noise parameter, it can be shown that popular agglomerative algorithms such as single linkage will fail with high probability. [sent-155, score-0.253]
64 We see that in our analysis this quantity is the chief determinant of the noise tolerance of spectral clustering. [sent-163, score-0.527]
65 Some arguments are more subtle since spectral clustering uses the subspace spanned by the k smallest eigenvectors of the Laplacian. [sent-166, score-0.987]
66 We build on the analysis of [10] to provide a coordinate-wise bound on the perturbation of the subspace, and use this to make precise guarantees for Algorithm K-WAY SPECTRAL. [sent-169, score-0.233]
67 Information-Theoretic Limits: Having introduced our analysis for spectral clustering, a pertinent question remains. [sent-174, score-0.751]
68 We establish the minimax rate in the simplest setting of a single binary split and compare it to our own results on spectral clustering. [sent-176, score-0.626]
69 We derive lower bounds on the problem of correctly identifying two clusters under the assumption that the clusters are balanced. [sent-178, score-0.415]
70 Theorem 3 There exists a constant α ∈ (0, 1/8) such that if σ ≥ α√(n / log(n/2)), the probability of failure of any estimator of the clustering remains bounded away from 0 as n → ∞. [sent-184, score-0.362]
71 As a direct consequence of Theorem 1, spectral clustering requires σ ≤ min{ √(n/(C log n)), (n/(C log n))^(1/4) } (for a large enough constant C). [sent-188, score-0.819]
72 Thus, the noise threshold for spectral clustering does not match the lower bound. [sent-189, score-0.859]
73 This algorithm is strongly related to spectral clustering with the combinatorial Laplacian, which solves a relaxation of the balanced minimum cut problem. [sent-192, score-0.865]
74 This theorem and the lower bound together establish the minimax rate. [sent-196, score-0.24]
75 It however, remains an open problem to tighten the analysis of spectral clustering in this paper to match this rate. [sent-197, score-0.785]
76 In the Appendix we modify the analysis of [9] to show that under the added restriction of block constant ideal similarities there is an efficient algorithm that achieves the minimax rate. [sent-198, score-0.524]
77 Once we prove that we can recover the first split correctly, we can then recursively apply the same arguments along with some delicate union bounds to prove that we will recover all large-enough splits of the hierarchy. [sent-201, score-0.214]
78 We first show that the unperturbed v^(2) can clearly distinguish the two outermost clusters, and that λ1, λ2, and λ3 (the first, second, and third smallest eigenvalues of LW respectively) are far away from each other. [sent-207, score-0.275]
79 ||v^(2) − u^(2)||_2 = O( σ√(n log n) / min(λ2, λ3 − λ2) ) = O( σ√(log n / n) ). The most straightforward way of turning this l2-norm bound into a uniform entry-wise l∞ bound is to assume that only one coordinate has a large perturbation and comprises all of the l2-perturbation. [sent-214, score-0.309]
80 Combining this with the fact that |vi^(2)| = Θ(1/√n), and performing a careful comparison of the leading constants, we can conclude that spectral clustering will correctly recover the first split. [sent-217, score-0.934]
81 One potential complication we need to resolve is that the k-Block Diagonal matrix has repeated eigenvalues and more careful subspace perturbation arguments are warranted. [sent-219, score-0.414]
82 Figure 3: Comparison of clustering algorithms with n = 512, m = 9 (c), and on simulated phylogeny data (d). [sent-233, score-0.406]
83 The ideal (noiseless) matrix can be taken to be block-constant since the worst case is when the diagonal blocks are at their lower bound (which we call p) and the off diagonal blocks are at their upper bound (q). [sent-241, score-0.473]
84 Our experiments focus on the effect of noise on spectral clustering in comparison with agglomerative methods such as single, average, and complete linkage. [sent-252, score-0.977]
85 For a range of scale factors and noisy HBMs of varying size, we empirically compute the probability with which spectral clustering recovers the first split of the hierarchy. [sent-255, score-0.891]
86 From the probability of success curves (Figure 3(a)), we can conclude that spectral clustering can tolerate noise that grows with the size of the clusters. [sent-256, score-0.999]
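A sketch in the spirit of this experiment (not the authors' code): estimate the probability that the sign of the second Laplacian eigenvector recovers the first split of a noisy two-block matrix as the noise scale sigma varies. The similarity values 0.2/0.6, the even cluster sizes, and the trial count are illustrative assumptions.

```python
import numpy as np

def first_split_success(n, sigma, trials=50, seed=0):
    """Fraction of trials in which the sign of v2 recovers the true first split (n even)."""
    rng = np.random.default_rng(seed)
    truth = np.repeat([0, 1], n // 2)
    A = np.full((n, n), 0.2)                          # between the two top clusters
    A[:n // 2, :n // 2] = A[n // 2:, n // 2:] = 0.6   # within each top cluster
    hits = 0
    for _ in range(trials):
        R = rng.normal(0.0, sigma, size=(n, n))
        W = A + np.triu(R, 1) + np.triu(R, 1).T       # noisy similarity matrix
        L = np.diag(W.sum(axis=1)) - W
        v2 = np.linalg.eigh(L)[1][:, 1]
        guess = (v2 >= 0).astype(int)
        hits += int((guess == truth).all() or (guess == 1 - truth).all())
    return hits / trials

# e.g.: [first_split_success(128, s) for s in (0.5, 1.0, 2.0, 4.0)]
```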
87 This shows that empirically, at least for the first split, spectral clustering appears to achieve the minimax rate for the problem. [sent-259, score-0.859]
88 Simulations: We compare spectral clustering to several agglomerative methods on two forms of synthetic data: noisy HBMs and simulated phylogenetic data. [sent-261, score-1.079]
89 In Figure 3(c), we compare the performance, as measured by the triplets metric, of four clustering algorithms (HS, single, average, and complete linkage) with n = 512 and m = 9. [sent-265, score-0.379]
90 From these experiments, it is clear that HS consistently outperforms agglomerative methods, with tremendous improvements in the high-noise setting where it recovers a significant amount of the tree structure while agglomerative methods do not. [sent-273, score-0.275]
91 Real-World Data: We apply hierarchical clustering methods to a yeast gene expression data set and one phylogenetic data set from the PFAM database [5]. [sent-275, score-0.577]
92 To evaluate our methods, we use an entropy-based metric defined as follows: given a permutation π and a similarity matrix W, we compute the rate of decay off of the diagonal as s_d ≜ (1/(n − d)) Σ_{i=1}^{n−d} W_{π(i),π(i+d)}, for d ∈ {1, …, n − 1}. [sent-276, score-0.256]
93 A good clustering will have a large amount of the probability mass concentrated at a few of the p̂π(i)'s, thus yielding a high Ê(π). [sent-282, score-0.332]
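The sketch below computes the decay profile s_d exactly as in the formula above; how the normalized profile is turned into the final score Ê(π) is only partially quoted here, so returning the negative Shannon entropy of the normalized profile (larger when the mass is concentrated near the diagonal) is an assumption about that convention.

```python
import numpy as np

def decay_profile(W, perm):
    """s_d = average similarity at off-diagonal lag d under the ordering perm."""
    n = len(perm)
    Wp = W[np.ix_(perm, perm)]                # reorder rows and columns by perm
    return np.array([Wp.diagonal(d).mean() for d in range(1, n)])

def entropy_score(W, perm):
    """Negative Shannon entropy of the normalized decay profile (assumed convention)."""
    s = np.clip(decay_profile(W, perm), 1e-12, None)
    p = s / s.sum()
    return float((p * np.log(p)).sum())       # closer to 0 when mass is concentrated
```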
94 We first compare HS to single linkage on yeast gene expression data from DeRisi et al [4]. [sent-284, score-0.268]
95 We sampled gene subsets of size n = 512, 1024, and 2048 and ran both algorithms on the reduced similarity matrix. [sent-286, score-0.232]
96 These scores quantitatively demonstrate that HS outperforms single linkage and, additionally, we believe the clustering produced by HS (Figure 4(a)) is qualitatively better than that of single linkage. [sent-288, score-0.467]
97 Using alignments of varying length, we generated similarity matrices and computed the entropy metric of the clusterings produced by both HS and Single Linkage. [sent-291, score-0.243]
98 6 Discussion: In this paper we have presented a new analysis of spectral clustering in the presence of noise and established tight information-theoretic upper and lower bounds. [sent-293, score-0.893]
99 As our analysis of spectral clustering does not show that it is minimax-optimal it remains an open problem to further tighten, or establish the tightness of, our analysis, and to find a computationally efficient minimax procedure in the general case when similarities are not block constant. [sent-294, score-1.042]
100 Identifying conditions under which one can guarantee correctness for other forms of spectral clustering is another interesting direction. [sent-295, score-0.816]
wordName wordTfidf (topN-words)
[('spectral', 0.419), ('clustering', 0.332), ('hs', 0.299), ('ideal', 0.245), ('clusters', 0.19), ('laplacian', 0.179), ('perturbation', 0.161), ('similarity', 0.147), ('linkage', 0.135), ('agglomerative', 0.118), ('cluster', 0.116), ('cs', 0.116), ('cj', 0.114), ('noise', 0.108), ('eigenvectors', 0.108), ('minimax', 0.108), ('hierarchical', 0.098), ('similarities', 0.097), ('sm', 0.085), ('noisy', 0.084), ('hierarchy', 0.077), ('hbms', 0.077), ('pfam', 0.077), ('pn', 0.07), ('tolerate', 0.07), ('ci', 0.07), ('matrix', 0.07), ('lw', 0.067), ('rohe', 0.067), ('mcsherry', 0.067), ('lr', 0.063), ('eigenvector', 0.06), ('phylogenetic', 0.059), ('recover', 0.059), ('perturbed', 0.058), ('split', 0.056), ('resolve', 0.056), ('graph', 0.055), ('smallest', 0.055), ('gene', 0.054), ('careful', 0.054), ('derisi', 0.051), ('misclustering', 0.051), ('pectral', 0.051), ('phyclust', 0.051), ('matrices', 0.05), ('adjacency', 0.049), ('theorem', 0.049), ('triplets', 0.047), ('ng', 0.047), ('points', 0.046), ('appendix', 0.046), ('clusterings', 0.046), ('al', 0.045), ('combinatorial', 0.045), ('hbm', 0.045), ('santosh', 0.045), ('outline', 0.044), ('establish', 0.043), ('block', 0.043), ('vi', 0.041), ('von', 0.041), ('phylogeny', 0.041), ('cmin', 0.041), ('rij', 0.041), ('fano', 0.041), ('ll', 0.041), ('arguments', 0.04), ('bound', 0.04), ('tree', 0.039), ('embedding', 0.039), ('cmax', 0.039), ('ulrike', 0.039), ('diagonal', 0.039), ('verify', 0.037), ('balanced', 0.036), ('grows', 0.036), ('assumption', 0.035), ('forms', 0.034), ('success', 0.034), ('tight', 0.034), ('sl', 0.034), ('luxburg', 0.034), ('tighten', 0.034), ('yeast', 0.034), ('log', 0.034), ('proof', 0.033), ('simulated', 0.033), ('fraction', 0.033), ('subspace', 0.033), ('cut', 0.033), ('assumptions', 0.033), ('precise', 0.032), ('subsets', 0.031), ('restriction', 0.031), ('correctness', 0.031), ('analyze', 0.03), ('errors', 0.03), ('away', 0.03), ('bn', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 186 nips-2011-Noise Thresholds for Spectral Clustering
Author: Sivaraman Balakrishnan, Min Xu, Akshay Krishnamurthy, Aarti Singh
Abstract: Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. We analyze the performance of a spectral algorithm for hierarchical clustering and show that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfectly recovering the hierarchical clusters with high probability. We additionally improve upon previous results for k-way spectral clustering to derive conditions under which spectral clustering makes no mistakes. Further, using minimax analysis, we derive tight upper and lower bounds for the clustering problem and compare the performance of spectral clustering to these information theoretic limits. We also present experiments on simulated and real world data illustrating our results. 1
2 0.40592444 54 nips-2011-Co-regularized Multi-view Spectral Clustering
Author: Abhishek Kumar, Piyush Rai, Hal Daume
Abstract: In many clustering problems, we have access to multiple views of the data each of which could be individually used for clustering. Exploiting information from multiple views, one can hope to find a clustering that is more accurate than the ones obtained using the individual views. Often these different views admit same underlying clustering of the data, so we can approach this problem by looking for clusterings that are consistent across the views, i.e., corresponding data points in each view should have same cluster membership. We propose a spectral clustering framework that achieves this goal by co-regularizing the clustering hypotheses, and propose two co-regularization schemes to accomplish this. Experimental comparisons with a number of baselines on two synthetic and three real-world datasets establish the efficacy of our proposed approaches.
3 0.17983538 198 nips-2011-On U-processes and clustering performance
Author: Stéphan J. Clémençcon
Abstract: Many clustering techniques aim at optimizing empirical criteria that are of the form of a U -statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within cluster point scatter over a class of partitions of the feature space. It is the purpose of this paper to define a general statistical framework, relying on the theory of U -processes, for studying the performance of such clustering methods. In this setup, under adequate assumptions on the complexity of the subsets forming the partition candidates, the √ excess of clustering risk is proved to be of the order OP (1/ n). Based on recent results related to the tail behavior of degenerate U -processes, it is also shown how to establish tighter rate bounds. Model selection issues, related to the number of clusters forming the data partition in particular, are also considered. 1
4 0.14321703 263 nips-2011-Sparse Manifold Clustering and Embedding
Author: Ehsan Elhamifar, René Vidal
Abstract: We propose an algorithm called Sparse Manifold Clustering and Embedding (SMCE) for simultaneous clustering and dimensionality reduction of data lying in multiple nonlinear manifolds. Similar to most dimensionality reduction methods, SMCE finds a small neighborhood around each data point and connects each point to its neighbors with appropriate weights. The key difference is that SMCE finds both the neighbors and the weights automatically. This is done by solving a sparse optimization problem, which encourages selecting nearby points that lie in the same manifold and approximately span a low-dimensional affine subspace. The optimal solution encodes information that can be used for clustering and dimensionality reduction using spectral clustering and embedding. Moreover, the size of the optimal neighborhood of a data point, which can be different for different points, provides an estimate of the dimension of the manifold to which the point belongs. Experiments demonstrate that our method can effectively handle multiple manifolds that are very close to each other, manifolds with non-uniform sampling and holes, as well as estimate the intrinsic dimensions of the manifolds. 1 1.1
5 0.13989884 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies
Author: Viren Jain, Srinivas C. Turaga, K Briggman, Moritz N. Helmstaedter, Winfried Denk, H. S. Seung
Abstract: An agglomerative clustering algorithm merges the most similar pair of clusters at every iteration. The function that evaluates similarity is traditionally handdesigned, but there has been recent interest in supervised or semisupervised settings in which ground-truth clustered data is available for training. Here we show how to train a similarity function by regarding it as the action-value function of a reinforcement learning problem. We apply this general method to segment images by clustering superpixels, an application that we call Learning to Agglomerate Superpixel Hierarchies (LASH). When applied to a challenging dataset of brain images from serial electron microscopy, LASH dramatically improved segmentation accuracy when clustering supervoxels generated by state of the boundary detection algorithms. The naive strategy of directly training only supervoxel similarities and applying single linkage clustering produced less improvement. 1
6 0.11801832 172 nips-2011-Minimax Localization of Structural Information in Large Noisy Matrices
7 0.11716174 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation
8 0.1156357 47 nips-2011-Beyond Spectral Clustering - Tight Relaxations of Balanced Graph Cuts
9 0.10955969 213 nips-2011-Phase transition in the family of p-resistances
10 0.10849398 66 nips-2011-Crowdclustering
11 0.1065561 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data
12 0.10108829 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
13 0.0999377 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
14 0.099863179 71 nips-2011-Directed Graph Embedding: an Algorithm based on Continuous Limits of Laplacian-type Operators
15 0.096302956 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
16 0.093389176 239 nips-2011-Robust Lasso with missing and grossly corrupted observations
17 0.093179666 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
18 0.087593131 157 nips-2011-Learning to Search Efficiently in High Dimensions
19 0.087075554 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure
20 0.084575489 248 nips-2011-Semi-supervised Regression via Parallel Field Regularization
topicId topicWeight
[(0, 0.241), (1, 0.027), (2, -0.094), (3, -0.103), (4, -0.068), (5, -0.028), (6, -0.104), (7, -0.013), (8, 0.168), (9, -0.11), (10, 0.014), (11, 0.357), (12, 0.117), (13, -0.256), (14, 0.072), (15, 0.035), (16, -0.075), (17, 0.01), (18, 0.067), (19, -0.063), (20, 0.037), (21, 0.046), (22, -0.156), (23, 0.071), (24, -0.016), (25, -0.041), (26, 0.007), (27, 0.166), (28, 0.133), (29, -0.028), (30, 0.024), (31, 0.084), (32, 0.086), (33, -0.029), (34, 0.002), (35, -0.044), (36, 0.032), (37, 0.026), (38, -0.005), (39, -0.071), (40, -0.012), (41, 0.047), (42, -0.056), (43, -0.086), (44, -0.003), (45, 0.028), (46, -0.009), (47, -0.074), (48, 0.073), (49, 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.98015702 186 nips-2011-Noise Thresholds for Spectral Clustering
Author: Sivaraman Balakrishnan, Min Xu, Akshay Krishnamurthy, Aarti Singh
Abstract: Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. We analyze the performance of a spectral algorithm for hierarchical clustering and show that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfectly recovering the hierarchical clusters with high probability. We additionally improve upon previous results for k-way spectral clustering to derive conditions under which spectral clustering makes no mistakes. Further, using minimax analysis, we derive tight upper and lower bounds for the clustering problem and compare the performance of spectral clustering to these information theoretic limits. We also present experiments on simulated and real world data illustrating our results. 1
2 0.91232669 54 nips-2011-Co-regularized Multi-view Spectral Clustering
Author: Abhishek Kumar, Piyush Rai, Hal Daume
Abstract: In many clustering problems, we have access to multiple views of the data each of which could be individually used for clustering. Exploiting information from multiple views, one can hope to find a clustering that is more accurate than the ones obtained using the individual views. Often these different views admit same underlying clustering of the data, so we can approach this problem by looking for clusterings that are consistent across the views, i.e., corresponding data points in each view should have same cluster membership. We propose a spectral clustering framework that achieves this goal by co-regularizing the clustering hypotheses, and propose two co-regularization schemes to accomplish this. Experimental comparisons with a number of baselines on two synthetic and three real-world datasets establish the efficacy of our proposed approaches.
3 0.75074106 198 nips-2011-On U-processes and clustering performance
Author: Stéphan J. Clémençcon
Abstract: Many clustering techniques aim at optimizing empirical criteria that are of the form of a U -statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within cluster point scatter over a class of partitions of the feature space. It is the purpose of this paper to define a general statistical framework, relying on the theory of U -processes, for studying the performance of such clustering methods. In this setup, under adequate assumptions on the complexity of the subsets forming the partition candidates, the √ excess of clustering risk is proved to be of the order OP (1/ n). Based on recent results related to the tail behavior of degenerate U -processes, it is also shown how to establish tighter rate bounds. Model selection issues, related to the number of clusters forming the data partition in particular, are also considered. 1
4 0.63295376 263 nips-2011-Sparse Manifold Clustering and Embedding
Author: Ehsan Elhamifar, René Vidal
Abstract: We propose an algorithm called Sparse Manifold Clustering and Embedding (SMCE) for simultaneous clustering and dimensionality reduction of data lying in multiple nonlinear manifolds. Similar to most dimensionality reduction methods, SMCE finds a small neighborhood around each data point and connects each point to its neighbors with appropriate weights. The key difference is that SMCE finds both the neighbors and the weights automatically. This is done by solving a sparse optimization problem, which encourages selecting nearby points that lie in the same manifold and approximately span a low-dimensional affine subspace. The optimal solution encodes information that can be used for clustering and dimensionality reduction using spectral clustering and embedding. Moreover, the size of the optimal neighborhood of a data point, which can be different for different points, provides an estimate of the dimension of the manifold to which the point belongs. Experiments demonstrate that our method can effectively handle multiple manifolds that are very close to each other, manifolds with non-uniform sampling and holes, as well as estimate the intrinsic dimensions of the manifolds. 1 1.1
5 0.61889231 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies
Author: Viren Jain, Srinivas C. Turaga, K Briggman, Moritz N. Helmstaedter, Winfried Denk, H. S. Seung
Abstract: An agglomerative clustering algorithm merges the most similar pair of clusters at every iteration. The function that evaluates similarity is traditionally handdesigned, but there has been recent interest in supervised or semisupervised settings in which ground-truth clustered data is available for training. Here we show how to train a similarity function by regarding it as the action-value function of a reinforcement learning problem. We apply this general method to segment images by clustering superpixels, an application that we call Learning to Agglomerate Superpixel Hierarchies (LASH). When applied to a challenging dataset of brain images from serial electron microscopy, LASH dramatically improved segmentation accuracy when clustering supervoxels generated by state of the boundary detection algorithms. The naive strategy of directly training only supervoxel similarities and applying single linkage clustering produced less improvement. 1
6 0.58000636 172 nips-2011-Minimax Localization of Structural Information in Large Noisy Matrices
7 0.55953306 236 nips-2011-Regularized Laplacian Estimation and Fast Eigenvector Approximation
8 0.52310681 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data
9 0.51632041 95 nips-2011-Fast and Accurate k-means For Large Datasets
10 0.5123145 47 nips-2011-Beyond Spectral Clustering - Tight Relaxations of Balanced Graph Cuts
11 0.50611186 52 nips-2011-Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery
12 0.50407618 241 nips-2011-Scalable Training of Mixture Models via Coresets
13 0.49469709 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation
14 0.47541323 164 nips-2011-Manifold Precis: An Annealing Technique for Diverse Sampling of Manifolds
15 0.46700874 71 nips-2011-Directed Graph Embedding: an Algorithm based on Continuous Limits of Laplacian-type Operators
16 0.45030963 120 nips-2011-History distribution matching method for predicting effectiveness of HIV combination therapies
17 0.43241078 66 nips-2011-Crowdclustering
18 0.43042648 213 nips-2011-Phase transition in the family of p-resistances
19 0.42386609 51 nips-2011-Clustered Multi-Task Learning Via Alternating Structure Optimization
20 0.40536705 5 nips-2011-A Denoising View of Matrix Completion
topicId topicWeight
[(0, 0.053), (4, 0.058), (20, 0.044), (26, 0.031), (31, 0.063), (33, 0.04), (43, 0.075), (45, 0.11), (57, 0.035), (65, 0.021), (67, 0.117), (74, 0.101), (83, 0.047), (99, 0.11)]
simIndex simValue paperId paperTitle
same-paper 1 0.89278764 186 nips-2011-Noise Thresholds for Spectral Clustering
Author: Sivaraman Balakrishnan, Min Xu, Akshay Krishnamurthy, Aarti Singh
Abstract: Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. We analyze the performance of a spectral algorithm for hierarchical clustering and show that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfectly recovering the hierarchical clusters with high probability. We additionally improve upon previous results for k-way spectral clustering to derive conditions under which spectral clustering makes no mistakes. Further, using minimax analysis, we derive tight upper and lower bounds for the clustering problem and compare the performance of spectral clustering to these information theoretic limits. We also present experiments on simulated and real world data illustrating our results. 1
2 0.86966383 147 nips-2011-Learning Patient-Specific Cancer Survival Distributions as a Sequence of Dependent Regressors
Author: Hsiu-chin Lin, Vickie Baracos, Russell Greiner, Chun-nam J. Yu
Abstract: An accurate model of patient survival time can help in the treatment and care of cancer patients. The common practice of providing survival time estimates based only on population averages for the site and stage of cancer ignores many important individual differences among patients. In this paper, we propose a local regression method for learning patient-specific survival time distribution based on patient attributes such as blood tests and clinical assessments. When tested on a cohort of more than 2000 cancer patients, our method gives survival time predictions that are much more accurate than popular survival analysis models such as the Cox and Aalen regression models. Our results also show that using patient-specific attributes can reduce the prediction error on survival time by as much as 20% when compared to using cancer site and stage only. 1
3 0.85771072 129 nips-2011-Improving Topic Coherence with Regularized Topic Models
Author: David Newman, Edwin V. Bonilla, Wray Buntine
Abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data. 1
4 0.85425693 227 nips-2011-Pylon Model for Semantic Segmentation
Author: Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman
Abstract: Graph cut optimization is one of the standard workhorses of image segmentation since for binary random field representations of the image, it gives globally optimal results and there are efficient polynomial time implementations. Often, the random field is applied over a flat partitioning of the image into non-intersecting elements, such as pixels or super-pixels. In the paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). As a result of such inference, the image gets partitioned into a set of segments that may come from different layers of the tree. We apply this formulation, which we call the pylon model, to the task of semantic segmentation where the goal is to separate an image into areas belonging to different semantic classes. The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches. 1
5 0.84455854 56 nips-2011-Committing Bandits
Author: Loc X. Bui, Ramesh Johari, Shie Mannor
Abstract: We consider a multi-armed bandit problem where there are two phases. The first phase is an experimentation phase where the decision maker is free to explore multiple options. In the second phase the decision maker has to commit to one of the arms and stick with it. Cost is incurred during both phases with a higher cost during the experimentation phase. We analyze the regret in this setup, and both propose algorithms and provide upper and lower bounds that depend on the ratio of the duration of the experimentation phase to the duration of the commitment phase. Our analysis reveals that if given the choice, it is optimal to experiment Θ(ln T ) steps and then commit, where T is the time horizon.
6 0.81933087 263 nips-2011-Sparse Manifold Clustering and Embedding
7 0.81312466 270 nips-2011-Statistical Performance of Convex Tensor Decomposition
8 0.80901235 276 nips-2011-Structured sparse coding via lateral inhibition
9 0.80855668 102 nips-2011-Generalised Coupled Tensor Factorisation
10 0.8047924 71 nips-2011-Directed Graph Embedding: an Algorithm based on Continuous Limits of Laplacian-type Operators
11 0.80227059 29 nips-2011-Algorithms and hardness results for parallel large margin learning
12 0.80115294 258 nips-2011-Sparse Bayesian Multi-Task Learning
13 0.80084884 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
14 0.79805243 236 nips-2011-Regularized Laplacian Estimation and Fast Eigenvector Approximation
15 0.7967304 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
16 0.79577976 144 nips-2011-Learning Auto-regressive Models from Sequence and Non-sequence Data
17 0.7947346 265 nips-2011-Sparse recovery by thresholded non-negative least squares
18 0.79454541 198 nips-2011-On U-processes and clustering performance
19 0.79386526 66 nips-2011-Crowdclustering
20 0.79203987 303 nips-2011-Video Annotation and Tracking with Active Learning