nips nips2012 nips2012-69 knowledge-graph by maker-knowledge-mining

69 nips-2012-Clustering Sparse Graphs


Source: pdf

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We develop a new algorithm to cluster sparse unweighted graphs – i.e. [sent-6, score-0.526]

2 partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. [sent-8, score-0.575]

3 By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. [sent-9, score-0.51]

4 Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. [sent-11, score-0.779]

5 Our insight is that in the sparse case, these must be penalized differently. [sent-12, score-0.145]

6 We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. [sent-13, score-0.382]

7 1 Introduction This paper proposes a new algorithm for the following task: given a sparse undirected unweighted graph, partition the nodes into disjoint clusters so that the density of edges within clusters is higher than the edges across clusters. [sent-15, score-1.165]

8 In particular, we are interested in settings where even within clusters the edge density is low, and the density across clusters is an additive (or small multiplicative) constant lower. [sent-16, score-0.731]

9 Several large modern datasets and graphs are sparse; examples include the web graph, social graphs of various social networks, etc. [sent-17, score-0.252]

10 More generally, there are several clustering applications where one is given as input a set of similarity relationships, but this set is quite sparse. [sent-19, score-0.308]

11 Unweighted sparse graph clustering corresponds to a special case in which all similarities are either “1” or “0”. [sent-20, score-0.516]

12 Just for intuition, imagine a random graph where every edge has a (potentially different) probability pij (which can be reflective of an underlying clustering structure) of appearing in the graph. [sent-22, score-0.715]

13 Consider now the edge random variable, which is 1 if there is an edge, and 0 else. [sent-23, score-0.141]

14 Then, in the sparse graph setting of small pij → 0, the mean of this variable is pij but its standard deviation is √pij, which can be much larger. [sent-24, score-0.478]
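
As a quick numeric illustration of this mean-versus-deviation gap (a minimal sketch; the probabilities below are illustrative, not values from the paper):

```python
import numpy as np

# A Bernoulli edge variable with small probability p has mean p but
# standard deviation sqrt(p(1 - p)) ~ sqrt(p), so the noise dwarfs the signal.
for p in [0.1, 0.01, 0.001]:
    std = np.sqrt(p * (1 - p))
    print(f"p = {p:<6} mean = {p:.4f}  std = {std:.4f}  std/mean = {std / p:.1f}")
```

As p → 0 the ratio grows like 1/√p, which is exactly why sparsity makes the problem noisier.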

15 Another parameter governing problem difficulty is the size of the clusters; smaller clusters are easier to lose in the noise. [sent-26, score-0.286]

16 Our contribution: We propose a new algorithm for sparse unweighted graph clustering. [sent-27, score-0.324]

17 There are two kinds of differences (i.e. errors) between the given graph and any candidate clustering: missing edges within clusters, and present edges across clusters. [sent-30, score-0.33]

18 Our key realization is that for sparse graph clustering, these two types of error should be penalized differently. [sent-31, score-0.24]

19 Doing so gives us a combinatorial optimization problem; our algorithm is a particular convex relaxation of the same, based on the fact that the cluster matrix is low-rank (we elaborate below). [sent-32, score-0.508]

20 Our main analytical result in this paper is theoretical guarantees on its performance for the classical planted partition model [10], also called the stochastic block-model [1, 22], for random clustered graphs. [sent-33, score-0.692]

21 , [4, 7, 10, 20]), we show that our algorithm outperforms (up to at most log factors) every existing method in this setting (i.e. [sent-36, score-0.102]

22 it recovers the true clustering for a bigger range of sparsity and cluster sizes). [sent-38, score-0.611]

23 Both the level of sparsity and the number and sizes of the clusters are allowed to be functions of n, the total number of nodes. [sent-39, score-0.291]

24 Our simulation study confirms our theoretical finding that the proposed method is effective in clustering sparse graphs and outperforms existing methods. [sent-41, score-0.536]

25 1 Related Work The general field of clustering, or even graph clustering, is too vast for a detailed survey here; we focus on the most related threads, and within those, primarily on work which provides theoretical “cluster recovery” guarantees on the resulting algorithms. [sent-45, score-0.177]

26 Correlation clustering: As mentioned above, every candidate clustering will have two kinds of errors; correlation clustering [2] weighs them equally, thus the objective is to find the clustering which minimizes just the total number of errors. [sent-46, score-1.091]

27 Subsequently, there has been much work on devising alternative approximation algorithms for both the weighted and unweighted cases, and for both agreement and disagreement objectives [12, 13, 3, 9]. [sent-48, score-0.161]

28 Approximations based on LP relaxation [11] and SDP relaxation [25, 19], followed by rounding, have also been developed. [sent-49, score-0.122]

29 We emphasize that while we do convex relaxation as well, we do not do rounding; rather, our convex program itself yields an optimal clustering. [sent-51, score-0.211]

30 Planted partition model / Stochastic block model: This is a natural and classic model for studying graph clustering in the average case, and is also the setting for our performance guarantees. [sent-52, score-0.599]

31 Sparse and low-rank matrix decomposition: It has recently been shown [8, 6] that, under certain conditions, it is possible to recover a low-rank matrix from sparse errors of arbitrary magnitude; this has even been applied to graph clustering [17]. [sent-54, score-0.741]

32 Our algorithm turns out to be a weighted version of sparse and low-rank matrix decomposition, with different elements of the sparse part penalized differently, based on the given input. [sent-55, score-0.272]

33 2 Algorithm Idea: Our algorithm is a convex relaxation of a natural combinatorial objective for the sparse clustering problem. [sent-57, score-0.602]

34 a partition of the nodes) such that in-cluster connectivity is denser than across-cluster connectivity. [sent-61, score-0.12]

35 Said differently, we want a clustering that has a small number of errors, where an error is either (a) an edge between two nodes in different clusters, or (b) a missing edge between two nodes in the same cluster. [sent-62, score-0.794]

36 The correlation clustering setup [2] gives equal weights to the two types of errors. [sent-64, score-0.375]

37 However, for sparse graphs, this will yield clusters with a very small number of nodes. [sent-65, score-0.314]

38 This is because there is sparsity both within clusters and across clusters; grouping nodes in the same cluster will result in a lot of errors of type (b) above, without yielding corresponding gains in errors of type (a) – even when they may actually be in the same cluster. [sent-66, score-0.909]

39 This can be very easily seen: suppose, for example, the “true” clustering has two clusters with equal size, and the in-cluster and across-cluster edge densities are both less than 1/4. [sent-67, score-0.728]

40 Then, when both errors are weighted equally, the clustering which puts every node in a separate cluster will have lower cost than the true clustering. [sent-68, score-0.726]
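
This can be checked with a short expected-cost calculation (a sketch under the stated assumption of two equal-size clusters with both densities below 1/4; the specific sizes and densities are illustrative, and the helper name is ours):

```python
def expected_equal_weight_cost(n, K, p, q, singletons):
    """Expected number of disagreements between the random graph and a candidate
    clustering when both error types are weighted equally."""
    n_clusters = n // K
    in_pairs = n_clusters * K * (K - 1) / 2        # node pairs inside true clusters
    cross_pairs = n * (n - 1) / 2 - in_pairs       # node pairs across true clusters
    if singletons:
        # all-singleton clustering: every present edge is a type-(a) error
        return p * in_pairs + q * cross_pairs
    # true clustering: missing in-cluster edges (type b) + present cross edges (type a)
    return (1 - p) * in_pairs + q * cross_pairs

n, K, p, q = 1000, 500, 0.2, 0.05                  # two clusters, both densities < 1/4
print(expected_equal_weight_cost(n, K, p, q, singletons=False))  # true clustering: 212100.0
print(expected_equal_weight_cost(n, K, p, q, singletons=True))   # all singletons: 62400.0 (cheaper)
```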

41 In particular, sparsity means that we can expect many more errors of type (b) in any solution, and hence we should give this (potentially much) smaller weight than errors of type (a). [sent-70, score-0.275]

42 Our crucial insight is that we can know what kind of error will (potentially) occur on any given edge from the given adjacency matrix itself. [sent-71, score-0.294]

43 In particular, if aij = 1 for some pair i, j, then in any clustering it will either have no error, or an error of type (a); it will never be an error of type (b). [sent-72, score-0.498]

44 Similarly if aij = 0 then it can only be an error of type (b), if at all. [sent-73, score-0.155]

45 Our algorithm is a convex relaxation of the combinatorial problem of finding the minimum cost clustering, with the cost for an error on edge i, j determined based on the value of aij . [sent-74, score-0.458]

46 Perhaps surprisingly, this simple idea yields better results than the extensive literature already in place for planted partitions. [sent-75, score-0.458]

47 We proceed by representing the given adjacency matrix A as the sum of two matrices A = Y + S, where we would like Y to be a cluster matrix, with yij = 1 if and only if i, j are in the same cluster, and 0 otherwise. [sent-76, score-0.475]

48 We now make a cost matrix C ∈ Rn×n based on the insight above; we choose two values cA and cAc and set cij = cA if the corresponding aij = 1, and cij = cAc if aij = 0. [sent-78, score-0.454]
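
A minimal sketch of this cost-matrix construction; numpy is assumed, the helper name is ours, and the particular values of cA and cAc are placeholders (the paper chooses them based on the sparsity level):

```python
import numpy as np

def build_cost_matrix(A, c_A, c_Ac):
    """cij = c_A where aij = 1 (only a type-(a) error is possible there),
    and cij = c_Ac where aij = 0 (only a type-(b) error is possible there)."""
    return np.where(A == 1, c_A, c_Ac)

# Illustrative values only: penalize disagreement on a present edge more heavily
# than on an absent one, which is what the sparse setting calls for.
rng = np.random.default_rng(0)
A = (rng.random((8, 8)) < 0.3).astype(int)
A = np.triu(A, 1); A = A + A.T
np.fill_diagonal(A, 1)                       # aii = 1 convention used in the paper
C = build_cost_matrix(A, c_A=1.0, c_Ac=0.1)
```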

49 min_{Y,S} ‖C ◦ S‖1 s.t. Y + S = A, Y is a cluster matrix. (1) Here C ◦ S denotes the matrix obtained via element-wise product between the two matrices C, S, i.e.

50 Algorithm: Our algorithm involves solving a convex relaxation of this combinatorial objective, by replacing the “Y is a cluster matrix” constraint with (i) constraints 0 ≤ yij ≤ 1 for all elements i, j, and (ii) a nuclear norm penalty ‖Y‖∗ in the objective. [sent-87, score-0.59]

51 The latter encourages Y to be low-rank, and is based on the well-established insight that the cluster matrix (being a block-diagonal collection of 1’s) is low-rank. [sent-88, score-0.354]
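
The low-rank fact is easy to verify directly (a tiny numpy sketch):

```python
import numpy as np

labels = np.repeat(np.arange(5), 200)                    # 5 clusters of 200 nodes each
Y = (labels[:, None] == labels[None, :]).astype(float)   # block-diagonal blocks of 1's
print(np.linalg.matrix_rank(Y))                          # rank equals the number of clusters: 5
```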

52 min_{Y,S} ‖Y‖∗ + ‖C ◦ S‖1 s.t. Y + S = A, (2) 0 ≤ yij ≤ 1, ∀ i, j. (3) Once Y is obtained, check if it is a cluster matrix (say e. [sent-91, score-0.42]

53 via an SVD, which will also reveal cluster membership if it is). [sent-93, score-0.256]
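
A compact sketch of the convex program (2)-(3) together with a simple cluster-matrix check (the text suggests an SVD; the sketch below uses a cruder row-based check instead). It is written with cvxpy and should be read as illustrative pseudocode rather than the authors' implementation; their supplementary material describes a faster solver, and a generic nuclear-norm SDP like this only scales to small n. The function names are ours.

```python
import numpy as np
import cvxpy as cp

def cluster_sparse_graph(A, C):
    """Solve min ||Y||_* + ||C o S||_1  s.t.  Y + S = A, 0 <= Y <= 1 (program (2)-(3))."""
    n = A.shape[0]
    Y = cp.Variable((n, n), symmetric=True)
    S = cp.Variable((n, n), symmetric=True)
    objective = cp.Minimize(cp.normNuc(Y) + cp.sum(cp.abs(cp.multiply(C, S))))
    constraints = [Y + S == A, Y >= 0, Y <= 1]
    cp.Problem(objective, constraints).solve()
    return Y.value

def clusters_if_exact(Y, tol=1e-3):
    """If Y is (numerically) a 0/1 cluster matrix, read the clusters off its rows."""
    Yr = (Y > 0.5).astype(int)
    if not np.allclose(Y, Yr, atol=tol):
        return None                          # not an exact cluster matrix
    labels = -np.ones(Y.shape[0], dtype=int)
    k = 0
    for i in range(Y.shape[0]):
        if labels[i] < 0:                    # rows of a cluster matrix index the cliques
            labels[Yr[i] == 1] = k
            k += 1
    return labels
```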

54 Our theoretical results provide sufficient conditions under which the optimum of the convex program is integral and a clustering, with no rounding required. [sent-95, score-0.293]

55 Section 3 in the supplementary material provides details on fast implementation for large matrices; this is one reason. (Footnote 1: In this paper we will assume the convention that aii = 1 and yii = 1 for all nodes i.) [sent-96, score-0.131]

56 In other words, Y is the adjacency matrix of a graph consisting of disjoint cliques. [sent-97, score-0.29]

57 Comments: Based on the given A and these values, the optimal Y may or may not be a cluster matrix. [sent-102, score-0.256]

58 If Y is a cluster matrix, then clearly it minimizes the combinatorial objective above. [sent-103, score-0.367]

59 Additionally, it is not hard to see (proof in the supplementary material) that its performance is “monotone”, in the sense that adding edges “aligned with” Y cannot result in a different optimum, as summarized in the following lemma. [sent-104, score-0.109]

60 This shows that, in the terminology of [19, 4, 14], our method is robust under a classical semi-random model where an adversary can add edges within clusters and remove edges between clusters. [sent-105, score-0.5]

61 Suppose now we arbitrarily change some edges of A to obtain A, by (a) choosing some edges such that yij = 1 but aij = 0, and making aij = 1, and (b) choosing some edges where yij = 0 but aij = 1, and making aij = 0. [sent-108, score-0.818]

62 Our theoretical guarantees characterize when the optimal Y will be a cluster matrix, and recover the clustering, in a natural classical problem setting called the planted partition model [10]. [sent-110, score-0.944]

63 3 Performance Guarantees In this section we provide analytical performance guarantees for our algorithm under a natural and classical graph clustering setting: (a generalization of) the planted partition model [10]. [sent-112, score-1.138]

64 (Generalized) Planted partition model: Consider a random graph generated as follows: the n nodes are partitioned into r disjoint clusters, which we will refer to as the “true” clusters. [sent-114, score-0.402]

65 For every pair of nodes i, j that belong to the same cluster, edge (i, j) is present in the graph with probability that is at least p̄, while for every pair where the nodes are in different clusters the edge is present with probability at most q̄. [sent-116, score-0.944]

66 We call this model the “generalized” planted partition because we allow for clusters to be of different sizes, and the edge probabilities also to be different (but uniformly bounded as mentioned). [sent-117, score-0.963]
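
A sketch of a generator for the uniform-density special case of this model (equal cluster sizes, within-cluster probability p and cross-cluster probability q), which is the setting used in the experiments later; the helper name is ours:

```python
import numpy as np

def planted_partition(n, K, p, q, seed=None):
    """n nodes in n // K equal clusters of size K; within-cluster edges appear
    with probability p, cross-cluster edges with probability q."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(n // K), K)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p, q)
    upper = np.triu(rng.random((n, n)) < probs, 1)
    A = (upper | upper.T).astype(int)
    np.fill_diagonal(A, 1)                   # aii = 1 convention used in the paper
    return A, labels

A, truth = planted_partition(n=1000, K=200, p=0.1, q=0.02)
```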

67 The objective is to find the partition, given the random graph generated from it. [sent-118, score-0.165]

68 Recall that A is the given adjacency matrix, and let Y ∗ be the matrix corresponding to the true clusters as above – i.e. [sent-119, score-0.354]

69 y∗ij = 1 if and only if i, j are in the same true cluster, and 0 otherwise. [sent-121, score-0.109]

70 Our result below establishes conditions under which our algorithm, specifically the convex program (2)-(3), yields this Y ∗ as the unique optimum (without any further need for rounding etc. [sent-123, score-0.293]

71 Our theorem quantifies the tradeoff between the two quantities governing the hardness of a planted partition problem – the difference in edge densities p−q, and the minimum cluster size K – required for our algorithm to succeed, i.e. [sent-135, score-1.051]

72 This allows us to guarantee recovery for much sparser graphs than all existing results. [sent-146, score-0.201]

73 If all the clusters have equal size K, it is easy to verify that the first eigenvalue of E[A − I] is K(p − q) − p + nq with multiplicity 1, the second eigenvalue is K(p − q) − p with multiplicity n/K − 1, and the third eigenvalue is −p with multiplicity n − n/K [16]. [sent-153, score-0.442]
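
These eigenvalues can be checked numerically (a sketch assuming the reconstruction above: equal clusters of size K, uniform densities p and q, and the aii = 1 convention; the small values of n, K, p, q are illustrative):

```python
import numpy as np

n, K, p, q = 20, 5, 0.3, 0.1
labels = np.repeat(np.arange(n // K), K)
EA = np.where(labels[:, None] == labels[None, :], p, q)
np.fill_diagonal(EA, 1.0)                          # E[aii] = 1
eig = np.sort(np.linalg.eigvalsh(EA - np.eye(n)))[::-1]
print(eig[0],  K * (p - q) - p + n * q)            # largest eigenvalue, multiplicity 1
print(eig[1],  K * (p - q) - p)                    # next, multiplicity n/K - 1
print(eig[-1], -p)                                 # smallest, multiplicity n - n/K
```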

74 This table shows the lower-bound requirements on K and p − q that existing literature needs for exact recovery of the planted partitions/clusters. [sent-167, score-0.517]

75 Note that this table is under the assumption that every cluster is of size K, and the edge densities are uniformly p and q (for within and across clusters respectively). [sent-168, score-0.745]

76 Figure 1: (a) Comparison of our method with Single-Linkage clustering (SLINK), spectral clustering, and low-rank-plus-sparse (L+S) approach. [sent-200, score-0.397]

77 The experiments are conducted on synthetic data with n = 1000 nodes and r = 5 clusters with equal size K = 200. [sent-203, score-0.346]

78 We generate a graph using the planted partition model with n = 1000 nodes, r = 5 clusters with equal size K = 200, and p, q ∈ [0, 1]. [sent-205, score-0.96]

79 For comparison, we consider three alternative methods: (1) Single-Linkage clustering (SLINK) [24], which is a hierarchical clustering method that merges the most similar clusters in each iteration. [sent-212, score-0.86]

80 We use the difference of neighbours, namely ‖Ai· − Aj·‖1, as the distance measure between nodes i and j, and output the result when SLINK finds a clustering with r = 5 clusters. [sent-213, score-0.308]
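
A sketch of this baseline; delegating the hierarchical step to scipy is our assumption about the implementation, not the authors' code:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def slink_on_neighbourhoods(A, r):
    """Single-linkage clustering with d(i, j) = ||A_i. - A_j.||_1, cut at r clusters."""
    Z = linkage(A.astype(float), method="single", metric="cityblock")
    return fcluster(Z, t=r, criterion="maxclust")

labels = slink_on_neighbourhoods(A, r=5)     # A as in the planted-partition sketch above
```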

81 (2) A spectral clustering method [26], where we run SLINK on the top r = 5 singular vectors of A. [sent-214, score-0.397]

82 (3) Low-rank-plus-sparse approach [17, 21], followed by the same rounding scheme. [sent-215, score-0.109]

83 Figure 1(b) shows more detailed results for sparse graphs (p ≤ 0. [sent-221, score-0.154]

84 1), for which SLINK and the trace-norm-plus-unweighted-ℓ1 approach completely fail, while our method significantly outperforms the spectral method, the only alternative method that works in this regime. [sent-223, score-0.205]

85 While at a high level these two steps have been employed in several papers on sparse and low-rank matrix decomposition, our analysis is different because it relies critically on the specific clustering setting we are in. [sent-228, score-0.433]

86 Thus, even though we are looking at a potentially more involved setting with input-dependent weights on the sparse matrix regularizer, our proof is much simpler than several others in this space. [sent-229, score-0.156]

87 Due to the constraints (3) in our convex program, if (Y ∗ + ∆, S ∗ − ∆) is a feasible solution to the convex program (2), then it has to be that ∆ ∈ D, where D ≜ {M ∈ Rn×n | ∀(i, j) ∈ R : −1 ≤ mij ≤ 0; ∀(i, j) ∈ Rc : 1 ≥ mij ≥ 0}. [sent-233, score-0.258]

88 Finally, we define the (now standard) projection operators: PΩ (M ) is the matrix where the (i, j)th entry is mij if (i, j) ∈ Ω, and 0 else. [sent-235, score-0.109]
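
These operators have a one-line sketch each, with Ω represented as a boolean mask over matrix positions (the mask representation and function names are ours):

```python
import numpy as np

def P_omega(M, omega_mask):
    """P_Omega(M): keep the entries of M on the support Omega, zero out the rest."""
    return np.where(omega_mask, M, 0.0)

def P_omega_c(M, omega_mask):
    """P_{Omega^c}(M): the complementary projection, so the two sum back to M."""
    return np.where(omega_mask, 0.0, M)
```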

89 If there exists a matrix W ∈ Rn×n and a positive number ε obeying the following conditions: 1. [sent-239, score-0.092]

90 ⟨PΩc (U0 U0⊤ + W ), ∆⟩ ≥ −(1 − ε) ‖PΩc (C ◦ ∆)‖1, ∀∆ ∈ D, then (Y ∗, S ∗) is the unique optimal solution to the convex program (2). [sent-245, score-0.098]

91 The proof is in the supplementary material; it also involves several steps unique to our clustering setup here. [sent-246, score-0.368]

92 It is straightforward to adapt the proof to the general case with non-uniform edge probabilities. [sent-252, score-0.172]

93 6 Conclusion We presented a convex optimization formulation, essentially a weighted version of low-rank matrix decomposition, to address graph clustering where the graph is sparse. [sent-261, score-0.736]

94 That is, it succeeds under less restrictive conditions than every existing method in this setting. [sent-265, score-0.128]

95 This work is motivated by the analysis of large-scale social networks, where, inherently, even actors (nodes) within the same cluster are more likely than not to have no connection. [sent-267, score-0.298]

96 (e.g., rounding) when the obtained solution is not an exact cluster matrix. [sent-270, score-0.256]

97 Max cut for random graphs with a planted partition. [sent-298, score-0.542]

98 Algorithms for graph partitioning on the planted partition model. [sent-342, score-0.749]

99 Bounding the misclassification error in spectral partitioning in the planted partition model. [sent-372, score-0.7]

100 Slink: an optimally efficient algorithm for the single-link cluster method. [sent-427, score-0.256]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('planted', 0.458), ('clustering', 0.308), ('cluster', 0.256), ('clusters', 0.244), ('slink', 0.231), ('cac', 0.229), ('edge', 0.141), ('graph', 0.138), ('partition', 0.12), ('aij', 0.12), ('unweighted', 0.116), ('sanghavi', 0.11), ('rounding', 0.109), ('yij', 0.109), ('nodes', 0.102), ('giesen', 0.102), ('cate', 0.094), ('pij', 0.09), ('spectral', 0.089), ('certi', 0.088), ('oymak', 0.086), ('ca', 0.086), ('graphs', 0.084), ('combinatorial', 0.084), ('edges', 0.08), ('errors', 0.079), ('singapore', 0.077), ('sparse', 0.07), ('correlation', 0.067), ('pt', 0.065), ('austin', 0.064), ('relaxation', 0.061), ('sparser', 0.058), ('cij', 0.058), ('succeeds', 0.058), ('carson', 0.058), ('condon', 0.058), ('demaine', 0.058), ('hassibi', 0.058), ('mitsche', 0.058), ('stipulations', 0.058), ('adjacency', 0.055), ('matrix', 0.055), ('industrial', 0.054), ('mij', 0.054), ('symposium', 0.053), ('convex', 0.052), ('optimum', 0.049), ('shamir', 0.049), ('sparsity', 0.047), ('arxiv', 0.046), ('program', 0.046), ('weighted', 0.045), ('neighbours', 0.044), ('emanuel', 0.044), ('thumb', 0.044), ('insight', 0.043), ('social', 0.042), ('disjoint', 0.042), ('jerrum', 0.042), ('governing', 0.042), ('nq', 0.042), ('theoretic', 0.042), ('defer', 0.04), ('feige', 0.04), ('analytical', 0.04), ('guarantees', 0.039), ('proposition', 0.038), ('every', 0.038), ('conditions', 0.037), ('multiplicity', 0.036), ('rc', 0.036), ('recover', 0.036), ('type', 0.035), ('density', 0.035), ('kinds', 0.035), ('preprint', 0.035), ('classical', 0.035), ('densities', 0.034), ('semide', 0.034), ('block', 0.033), ('partitioning', 0.033), ('annual', 0.033), ('across', 0.032), ('log', 0.032), ('penalized', 0.032), ('dual', 0.032), ('existing', 0.032), ('succeed', 0.032), ('proof', 0.031), ('tx', 0.03), ('misclassi', 0.029), ('decomposition', 0.029), ('supplementary', 0.029), ('optimality', 0.029), ('mathematics', 0.028), ('nuclear', 0.028), ('eigenvalue', 0.028), ('recovery', 0.027), ('objective', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

2 0.2630178 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

Author: Nan Li, Longin J. Latecki

Abstract: We formulate clustering aggregation as a special instance of Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of simulated annealing method that takes advantage of this special structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and utilizes a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that: 1. our approach to clustering aggregation automatically decides the optimal number of clusters; 2. it does not require any parameter tuning for the underlying clustering algorithms; 3. it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. it is robust against moderate or even bad input clusterings. 1

3 0.25141692 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

Author: Vinay Jethava, Anders Martinsson, Chiranjib Bhattacharyya, Devdatt Dubhashi

Abstract: The Lovász ϑ function of a graph, a fundamental tool in combinatorial optimization and approximation algorithms, is computed by solving a SDP. In this paper we establish that the Lovász ϑ function is equivalent to a kernel learning problem related to one class SVM. This interesting connection opens up many opportunities bridging graph theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM − ϑ graphs, on which the Lovász ϑ function can be approximated well by a one-class SVM. This leads to novel use of SVM techniques for solving algorithmic problems in large graphs e.g. identifying a planted clique of size Θ(√n) in a random graph G(n, 1/2). A classic approach for this problem involves computing the ϑ function, however it is not scalable due to SDP computation. We show that the random graph with a planted clique is an example of SVM − ϑ graph. As a consequence a SVM based approach easily identifies the clique in large graphs and is competitive with the state-of-the-art. We introduce the notion of common orthogonal labelling and show that it can be computed by solving a Multiple Kernel learning problem. It is further shown that such a labelling is extremely useful in identifying a large common dense subgraph in multiple graphs, which is known to be a computationally difficult problem. The proposed algorithm achieves an order of magnitude scalability compared to state of the art methods. 1

4 0.19045928 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

5 0.18525936 126 nips-2012-FastEx: Hash Clustering with Exponential Families

Author: Amr Ahmed, Sujith Ravi, Alex J. Smola, Shravan M. Narayanamurthy

Abstract: Clustering is a key component in any data analysis toolbox. Despite its importance, scalable algorithms often eschew rich statistical models in favor of simpler descriptions such as k-means clustering. In this paper we present a sampler, capable of estimating mixtures of exponential families. At its heart lies a novel proposal distribution using random projections to achieve high throughput in generating proposals, which is crucial for clustering models with large numbers of clusters. 1

6 0.1754355 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

7 0.16658595 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

8 0.15689395 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

9 0.14590345 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

10 0.12388299 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

11 0.11921919 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

12 0.11434478 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

13 0.11203337 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

14 0.10827419 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

15 0.10547903 26 nips-2012-A nonparametric variable clustering model

16 0.10436205 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

17 0.1024924 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

18 0.098369464 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

19 0.09485428 165 nips-2012-Iterative ranking from pair-wise comparisons

20 0.094359003 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.236), (1, 0.101), (2, 0.069), (3, -0.147), (4, -0.053), (5, -0.044), (6, -0.057), (7, -0.153), (8, -0.079), (9, 0.139), (10, 0.16), (11, -0.244), (12, -0.131), (13, -0.039), (14, 0.174), (15, -0.048), (16, -0.11), (17, 0.077), (18, -0.125), (19, -0.109), (20, -0.204), (21, -0.092), (22, 0.004), (23, 0.042), (24, -0.037), (25, 0.041), (26, -0.028), (27, 0.02), (28, 0.024), (29, -0.079), (30, 0.011), (31, 0.009), (32, -0.032), (33, 0.051), (34, -0.03), (35, -0.058), (36, 0.023), (37, 0.045), (38, 0.031), (39, -0.01), (40, -0.008), (41, 0.048), (42, 0.054), (43, 0.08), (44, -0.022), (45, -0.031), (46, -0.041), (47, -0.001), (48, 0.059), (49, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97009951 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

2 0.87059969 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

3 0.85109842 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

Author: Nan Li, Longin J. Latecki

Abstract: We formulate clustering aggregation as a special instance of Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of simulated annealing method that takes advantage of this special structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and utilizes a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that: 1. our approach to clustering aggregation automatically decides the optimal number of clusters; 2. it does not require any parameter tuning for the underlying clustering algorithms; 3. it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. it is robust against moderate or even bad input clusterings. 1

4 0.76962364 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

Author: Argyris Kalogeratos, Aristidis Likas

Abstract: Learning the number of clusters is a key problem in data clustering. We present dip-means, a novel robust incremental method to learn the number of data clusters that can be used as a wrapper around any iterative clustering algorithm of k-means family. In contrast to many popular methods which make assumptions about the underlying cluster distributions, dip-means only assumes a fundamental cluster property: each cluster to admit a unimodal distribution. The proposed algorithm considers each cluster member as an individual ‘viewer’ and applies a univariate statistic hypothesis test for unimodality (dip-test) on the distribution of distances between the viewer and the cluster members. Important advantages are: i) the unimodality test is applied on univariate distance vectors, ii) it can be directly applied with kernel-based methods, since only the pairwise distances are involved in the computations. Experimental results on artificial and real datasets indicate the effectiveness of our method and its superiority over analogous approaches.

5 0.7080915 196 nips-2012-Learning with Partially Absorbing Random Walks

Author: Xiao-ming Wu, Zhenguo Li, Anthony M. So, John Wright, Shih-fu Chang

Abstract: We propose a novel stochastic process that is with probability αi being absorbed at current state i, and with probability 1 − αi follows a random edge out of it. We analyze its properties and show its potential for exploring graph structures. We prove that under proper absorption rates, a random walk starting from a set S of low conductance will be mostly absorbed in S. Moreover, the absorption probabilities vary slowly inside S, while dropping sharply outside, thus implementing the desirable cluster assumption for graph-based learning. Remarkably, the partially absorbing process unifies many popular models arising in a variety of contexts, provides new insights into them, and makes it possible for transferring findings from one paradigm to another. Simulation results demonstrate its promising applications in retrieval and classification.

6 0.6807217 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

7 0.67999208 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

8 0.64918756 26 nips-2012-A nonparametric variable clustering model

9 0.64088613 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

10 0.5990873 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

11 0.59467351 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

12 0.5829019 126 nips-2012-FastEx: Hash Clustering with Exponential Families

13 0.53571099 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

14 0.53128552 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

15 0.52354985 125 nips-2012-Factoring nonnegative matrices with linear programs

16 0.52104133 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes

17 0.51713419 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process

18 0.49637499 155 nips-2012-Human memory search as a random walk in a semantic network

19 0.4911271 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

20 0.48672134 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.093), (21, 0.019), (36, 0.216), (38, 0.164), (42, 0.045), (54, 0.021), (55, 0.024), (74, 0.073), (76, 0.131), (80, 0.072), (92, 0.058)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92470366 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting

Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey

Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Electricit´ de France (EDF). e Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy. 1

2 0.8867268 295 nips-2012-Risk-Aversion in Multi-armed Bandits

Author: Amir Sani, Alessandro Lazaric, Rémi Munos

Abstract: Stochastic multi–armed bandits solve the Exploration–Exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk–aversion where the objective is to compete against the arm with the best risk–return trade–off. This setting proves to be more difficult than the standard multi-arm bandit setting due in part to an exploration risk which introduces a regret associated to the variability of an algorithm. Using variance as a measure of risk, we define two algorithms, investigate their theoretical guarantees, and report preliminary empirical results. 1

3 0.86255413 288 nips-2012-Rational inference of relative preferences

Author: Nisheeth Srivastava, Paul R. Schrater

Abstract: Statistical decision theory axiomatically assumes that the relative desirability of different options that humans perceive is well described by assigning them optionspecific scalar utility functions. However, this assumption is refuted by observed human behavior, including studies wherein preferences have been shown to change systematically simply through variation in the set of choice options presented. In this paper, we show that interpreting desirability as a relative comparison between available options at any particular decision instance results in a rational theory of value-inference that explains heretofore intractable violations of rational choice behavior in human subjects. Complementarily, we also characterize the conditions under which a rational agent selecting optimal options indicated by dynamic value inference in our framework will behave identically to one whose preferences are encoded using a static ordinal utility function. 1

same-paper 4 0.83296692 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

5 0.80737221 61 nips-2012-Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric

Abstract: We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting. This problem has been studied in the literature from two different perspectives: fixed budget and fixed confidence. We propose a unifying approach that leads to a meta-algorithm called unified gap-based exploration (UGapE), with a common structure and similar theoretical analysis for these two settings. We prove a performance bound for the two versions of the algorithm showing that the two problems are characterized by the same notion of complexity. We also show how the UGapE algorithm as well as its theoretical analysis can be extended to take into account the variance of the arms and to multiple bandits. Finally, we evaluate the performance of UGapE and compare it with a number of existing fixed budget and fixed confidence algorithms. 1

6 0.79074132 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

7 0.75512087 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

8 0.752294 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

9 0.73425829 261 nips-2012-Online allocation and homogeneous partitioning for piecewise constant mean-approximation

10 0.73397809 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

11 0.73370463 260 nips-2012-Online Sum-Product Computation Over Trees

12 0.73301148 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

13 0.73230886 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

14 0.73191428 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

15 0.73177516 192 nips-2012-Learning the Dependency Structure of Latent Factors

16 0.73096061 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

17 0.73077387 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

18 0.72988039 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

19 0.72958463 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

20 0.72860914 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models