nips nips2012 nips2012-68 knowledge-graph by maker-knowledge-mining

68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set


Source: pdf

Author: Nan Li, Longin J. Latecki

Abstract: We formulate clustering aggregation as a special instance of the Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters that together partition the dataset. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of the simulated annealing method that takes advantage of this special structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and utilizes a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that: 1. our approach to clustering aggregation automatically decides the optimal number of clusters; 2. it does not require any parameter tuning for the underlying clustering algorithms; 3. it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. it is robust against moderate or even bad input clusterings.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We formulate clustering aggregation as a special instance of the Maximum-Weight Independent Set (MWIS) problem. [sent-3, score-0.695]

2 For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. [sent-4, score-0.996]

3 The edges connect the vertices whose corresponding clusters overlap. [sent-6, score-0.366]

4 Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together. [sent-7, score-0.949]

5 Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. [sent-12, score-1.474]

6 We propose a variant of the simulated annealing method that takes advantage of this special structure. [sent-13, score-0.225]

7 our approach to clustering aggregation automatically decides the optimal number of clusters; 2. [sent-16, score-0.723]

8 it does not require any parameter tuning for the underlying clustering algorithms; 3. [sent-17, score-0.534]

9 it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. [sent-18, score-0.634]

10 The goal is to partition the data objects into a set of groups (clusters) such that objects in the same group are similar, while objects in different groups are dissimilar. [sent-21, score-0.299]

11 In the past two decades, many different clustering algorithms have been developed. [sent-22, score-0.464]

12 However, each of the known clustering algorithms has potential shortcomings. [sent-24, score-0.441]

13 Furthermore, in order to achieve good performance, they require an appropriate number of clusters as the input parameter, which is usually very hard to specify. [sent-26, score-0.315]

14 DBSCAN [8], a density-based clustering algorithm, can detect clusters of arbitrary shapes and sizes. [sent-27, score-0.688]

15 Consensus clustering, also called clustering aggregation or clustering ensemble, refers to a family of methods that try to find a single superior (consensus) clustering from a number of input clusterings obtained by different algorithms with different parameters. [sent-30, score-1.883]

16 The basic motivation of these methods is to combine the advantages of different clustering algorithms and overcome their respective shortcomings. [sent-31, score-0.53]

17 Besides generating stable and robust clusterings, consensus clustering methods can be applied in many other scenarios, such as categorical data clustering, "privacy-preserving" clustering, and so on. [sent-32, score-1.063]

18 [2] formulates clustering ensemble as a combinatorial optimization problem in terms of shared mutual information. [sent-34, score-0.484]

19 That is, the relationship between each pair of data objects is measured based on their cluster labels from the multiple input clusterings, rather than the original features. [sent-35, score-0.242]

20 Then a graph representation is constructed according to these relationships, and finding a single consolidated clustering is reduced to a graph partitioning problem. [sent-36, score-0.571]

21 Similarly, in [1], a number of deterministic approximation algorithms are proposed to find an "aggregated" clustering which agrees as much as possible with the input clusterings. [sent-37, score-0.538]

22 [12] presents three iterative EM-like algorithms for the consensus clustering problem. [sent-41, score-0.645]

23 A common feature of these consensus clustering methods is that they usually do not have access to the original features of the data objects. [sent-42, score-0.622]

24 They utilize the cluster labels in different input clusterings as the new features of each data object to find an optimal clustering. [sent-43, score-0.484]

25 Consequently, the success of these consensus clustering methods heavily relies on the premise that the majority of the input clusterings are reasonably good and consistent, which is often not the case in practice. [sent-44, score-0.96]

26 For example, given a new challenging dataset, it is probable that only a few of the chosen underlying clustering algorithms can generate good clusterings. [sent-45, score-0.554]

27 Many moderate or even bad input clusterings can mislead the final "consensus". [sent-46, score-0.62]

28 Furthermore, even if we choose the appropriate underlying clustering algorithms, in order to obtain good input clusterings, we still have to specify the appropriate input parameters. [sent-47, score-0.679]

29 Therefore, it is desirable to devise new consensus clustering methods that are more robust and do not need the optimal input parameters to be specified. [sent-48, score-0.755]

30 Informally, for each of the clusters in the input clusterings, we evaluate its quality with some internal indices measuring both the cohesion and separation. [sent-50, score-0.476]

31 In this framework, ideally, we can find the optimal "aggregated clustering" even if only a minority of the input clusterings are good enough. [sent-54, score-0.365]

32 Therefore, we only need to specify an appropriate range of the input parameters, rather than the optimal values, for the underlying clustering algorithms. [sent-55, score-0.607]

33 An attributed graph is constructed from the union of the input clusterings. [sent-57, score-0.228]

34 The edges connect the vertices whose corresponding clusters overlap (In practice, we may tolerate a relatively small amount of overlap for robustness). [sent-59, score-0.451]
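
To make the construction concrete, here is a minimal Python sketch (not the authors' code) of building the attributed graph; the helper name build_attributed_graph and the max_jaccard tolerance are illustrative assumptions, with max_jaccard = 0 meaning that any non-empty overlap creates an edge.

```python
import numpy as np
from itertools import combinations

def build_attributed_graph(clusterings, max_jaccard=0.0):
    """Vertices: every cluster from every input clustering, stored as
    (clustering index, cluster index, frozenset of point ids).
    Edges: pairs of clusters whose overlap exceeds the tolerated Jaccard
    threshold; max_jaccard=0 means any non-empty overlap creates an edge."""
    vertices = [(i, j, frozenset(c))
                for i, clusters in enumerate(clusterings)
                for j, c in enumerate(clusters)]
    n = len(vertices)
    A = np.zeros((n, n), dtype=int)          # adjacency matrix of the graph
    for u, v in combinations(range(n), 2):
        cu, cv = vertices[u][2], vertices[v][2]
        inter = len(cu & cv)
        if inter and inter / len(cu | cv) > max_jaccard:
            A[u, v] = A[v, u] = 1
    return vertices, A
```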

35 Then selecting an optimal subset of non-overlapping clusters partitioning the dataset together can be formulated as seeking the MWIS of the attributed graph, which is the heaviest subset of mutually non-adjacent vertices. [sent-60, score-0.602]

36 Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. [sent-62, score-1.474]

37 The only difference is that our work applies the MWIS formulation to a more general problem, clustering aggregation. [sent-69, score-0.441]

38 As we mentioned before, in the context of clustering aggregation, the formulated 2 MWIS problem exhibits a special structure. [sent-72, score-0.554]

39 That is, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. [sent-73, score-0.703]

40 We propose a variant of the simulated annealing method that takes advantage of this special structure. [sent-75, score-0.225]

41 Finally, since the selected clusters may not cover the entire dataset, our approach performs a post-processing step to assign the uncovered data objects to their nearest clusters. [sent-79, score-0.287]
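
A minimal sketch of such a post-processing step, under the assumption that "nearest cluster" means the cluster with the closest centroid (the paper does not spell out the distance used here):

```python
import numpy as np

def assign_uncovered(X, selected_clusters):
    """Assign points not covered by any selected cluster to the cluster with
    the nearest centroid. X is an (n_samples, n_features) array and each
    cluster is a set of row indices into X; clusters are updated in place."""
    centroids = [X[list(c)].mean(axis=0) for c in selected_clusters]
    covered = set().union(*selected_clusters)
    for t in range(len(X)):
        if t not in covered:
            d = [np.linalg.norm(X[t] - mu) for mu in centroids]
            selected_clusters[int(np.argmin(d))].add(t)
    return selected_clusters
```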

42 our approach to clustering aggregation automatically decides the optimal number of clusters; 2. [sent-81, score-0.723]

43 it does not require any parameter tuning for the underlying clustering algorithms; 3. [sent-82, score-0.534]

44 it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. [sent-83, score-0.634]

45 In Sec. 2, we present the formal statement of the clustering aggregation problem and its formulation as a special instance of the MWIS problem. [sent-86, score-0.719]

46 A clustering Ci of D is obtained by applying an exclusive clustering algorithm with a specific set of input parameters on D. [sent-96, score-0.956]

47 With different clustering algorithms and different parameters, we can obtain a set of m different clusterings of D: C1, C2, ..., Cm. [sent-103, score-0.703]

48 For each cluster cij in the union of these m clusterings, we evaluate its quality with an internal index measuring both cohesion and separation. [sent-107, score-0.43]

49 We use the average silhouette coefficient of a cluster as such an internal index in this paper. [sent-108, score-0.289]

50 It is a measure of how similar a data object is to the data objects in its own cluster compared to the data objects in other clusters. [sent-110, score-0.286]

51 The quality of a particular cluster cij can be evaluated with the average of the silhouette coefficients of the data objects belonging to it. [sent-113, score-0.471]

52 $ASC_{c_{ij}} = \frac{\sum_{t \in c_{ij}} S_t}{|c_{ij}|}$ (2), where $S_t$ is the silhouette coefficient of the $t$-th data object in cluster $c_{ij}$ and $|c_{ij}|$ is the cardinality of cluster $c_{ij}$. [sent-114, score-0.677]
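
As an illustration of Eq. (2), the per-cluster average silhouette coefficient can be computed from scikit-learn's per-sample silhouette scores; this sketch is not the implementation used in the paper.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

def average_silhouette_per_cluster(X, labels):
    """Eq. (2): the ASC of a cluster is the mean silhouette coefficient S_t
    of the data objects belonging to it. Returns {cluster label: ASC}."""
    labels = np.asarray(labels)
    S = silhouette_samples(X, labels)   # S_t for every data object
    return {c: float(S[labels == c].mean()) for c in np.unique(labels)}
```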

53 We select an optimal subset of non-overlapping clusters from the union of all the clusterings, which together partition the dataset and have the best overall quality, as the "aggregated clustering". [sent-115, score-0.397]

54 The selection of clusters is formulated as a special instance of the Maximum-Weight Independent Set (MWIS) problem. [sent-116, score-0.321]

55 The selection is constrained by $\forall i \in V: x_i \in \{0,1\}$ and $x^T A x = 0$ (3), where $A$ is the adjacency matrix of the attributed graph. The weight $w_i$ on vertex $i$ is defined as $w_i = ASC_{c_i} \times |c_i|$ (4), where $c_i$ is the cluster represented by vertex $i$, and $ASC_{c_i}$ and $|c_i|$ are its quality measure and cardinality, respectively. [sent-127, score-0.421]
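
A small sketch of the weight of Eq. (4) and the feasibility condition of Eq. (3); it assumes the ASC values and cluster sizes are given in the same vertex order as the adjacency matrix A, and the helper names are hypothetical.

```python
import numpy as np

def vertex_weights(asc, sizes):
    """Eq. (4): w_i = ASC_{c_i} * |c_i|, given per-cluster average silhouette
    coefficients and cluster cardinalities in the same vertex order."""
    return np.asarray(asc, dtype=float) * np.asarray(sizes, dtype=float)

def is_feasible(x, A):
    """Eq. (3): x is a valid selection iff every x_i is 0 or 1 and
    x^T A x = 0, i.e. no two selected clusters overlap."""
    x = np.asarray(x)
    return set(np.unique(x)).issubset({0, 1}) and int(x @ A @ x) == 0
```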

56 The vertex set can be partitioned into subsets {P1, ..., Pm}, where Pi corresponds to the clustering Ci, such that each Pi is also a maximal independent set (MIS), which means it is not a subset of any other independent set. [sent-132, score-0.507]

57 This follows from the fact that each clustering Ci is a partition of the dataset D. [sent-133, score-0.527]
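
The following sketch illustrates this property under the assumption of zero overlap tolerance (any non-empty overlap creates an edge); with a positive tolerance the maximality assertion may fail. The membership array and the helper name are hypothetical.

```python
import numpy as np

def clustering_as_mis(membership, A, idx):
    """membership[v] = index of the input clustering that vertex v came from.
    The indicator vector selecting all clusters of clustering `idx` is an
    independent set, and it is maximal because every other cluster must
    overlap some cluster of this partition."""
    membership = np.asarray(membership)
    x = (membership == idx).astype(int)
    assert int(x @ A @ x) == 0            # independent: a partition never overlaps itself
    addable = (x == 0) & (A @ x == 0)     # vertices that could still be added
    assert not addable.any()              # maximal (holds when max_jaccard = 0)
    return x
```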

58 xt+1, which is a neighbor of xt, is obtained by replacing some lower-weight vertices in xt with higher-weight vertices under the constraint of always being an independent set. [sent-140, score-0.695]

59 Specifically, we first reduce xt by removing a proportion q of lower-weight vertices. [sent-141, score-0.22]

60 Here we remove a proportion, rather than a fixed number, of vertices in order to make the reduction adaptive with respect to the number s of vertices in xt . [sent-142, score-0.486]

61 $WD_i = \frac{w_i}{\sum_{j \in N_i} w_j}$ (6), where $N_i$ is the set of vertices connected with vertex $i$ in $G$. [sent-146, score-0.265]

62 Therefore, the obtained xt is likely to contain vertices with large weights and to leave large room for improvement. [sent-148, score-0.336]

63 Then our algorithm iteratively improves xt by adding compatible vertices one by one. [sent-150, score-0.37]

64 In each iteration, it first identifies all the vertices compatible with the existing ones in current xt , called candidates. [sent-151, score-0.37]

65 Then a "local" measure WD is calculated to evaluate each of these candidates: $WD_i = \frac{w_i}{\sum_{j \in N_i} w_j}$ (7), where $N_i$ is the set of candidate vertices connected with vertex $i$. [sent-152, score-0.314]

66 The candidate with the largest WD value is added to xt. [sent-154, score-0.235]

67 Together, these steps define a randomized "moving" procedure for making a transition from xt to its neighbor xt+1. [sent-173, score-0.395]
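
Putting the preceding steps together, here is a sketch of the neighbor-generation move and of a generic annealing loop around it; the proportion q, the geometric cooling schedule, and the Metropolis acceptance rule are standard illustrative choices and may differ from the authors' exact algorithm.

```python
import numpy as np

def neighbor(x, A, w, q=0.3):
    """One move: reduce the current independent set x (a 0/1 vector) by
    dropping a proportion q of its lower-WD vertices (Eq. 6), then greedily
    add compatible vertices with the largest WD among the remaining
    candidates (Eq. 7), one at a time."""
    x = x.copy()
    selected = np.flatnonzero(x)
    wd_global = w / (A @ w + 1e-12)          # Eq. (6): weight vs. neighbor weights in G
    n_drop = int(np.ceil(q * len(selected)))
    drop = selected[np.argsort(wd_global[selected])[:n_drop]]
    x[drop] = 0
    while True:
        cand = np.flatnonzero((x == 0) & (A @ x == 0))   # vertices compatible with x
        if len(cand) == 0:
            break
        wd_local = w[cand] / (A[np.ix_(cand, cand)] @ w[cand] + 1e-12)  # Eq. (7)
        x[cand[int(np.argmax(wd_local))]] = 1
    return x

def anneal_from_mis(x0, A, w, T0=1.0, cooling=0.95, n_iter=500, seed=0):
    """Start from one MIS and locally search for a heavier independent set,
    accepting worse neighbors with the usual Metropolis probability."""
    rng = np.random.default_rng(seed)
    x_best = x = x0.copy()
    T = T0
    for _ in range(n_iter):
        y = neighbor(x, A, w)
        delta = float(w @ y - w @ x)
        if delta >= 0 or rng.random() < np.exp(delta / T):
            x = y
        if w @ x > w @ x_best:
            x_best = x.copy()
        T *= cooling                         # geometric cooling schedule
    return x_best
```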

68 In these experiments, for the underlying clustering algorithms, including K-means, single linkage, complete linkage and Ward’s clustering, we use the implementations in MATLAB. [sent-187, score-0.642]

69 In the first experiment, we evaluate our approach’s ability to achieve good performance without specifying the optimal input parameters for the underlying clustering algorithms. [sent-203, score-0.659]

70 We choose K-means as the underlying clustering algorithm and vary the parameter K = 5 : 1 : 25, which is the desired number of clusters. [sent-207, score-0.538]

71 Since different runs of K-means starting from random initialization of centroids typically produce different clustering results, we run K-means 5 times for each value of K. [sent-208, score-0.441]
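
Such a pool of input clusterings could be generated as sketched below with scikit-learn's K-means (the paper uses the MATLAB implementation); the helper name is hypothetical.

```python
from sklearn.cluster import KMeans

def kmeans_pool(X, k_range=range(5, 26), runs_per_k=5):
    """Pool of input clusterings mirroring this setup: K = 5:1:25, five runs
    of K-means per value of K, each from a random centroid initialization.
    Each clustering is returned as a list of clusters (sets of row indices)."""
    pool = []
    for k in k_range:
        for run in range(runs_per_k):
            labels = KMeans(n_clusters=k, init="random", n_init=1,
                            random_state=run).fit_predict(X)
            pool.append([set(map(int, (labels == c).nonzero()[0]))
                         for c in sorted(set(labels))])
    return pool
```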

72 As shown in Fig. 1, on each of the four subsets, the aggregated clustering obtained by our approach has the correct number (15) of clusters and near-perfect structure. [sent-216, score-0.761]

73 These results confirm that our approach can automatically decide the optimal number of clusters without any parameter tuning for the underlying clustering algorithms. [sent-218, score-0.803]

74 In the second experiment, we evaluate our approach’s ability to combine the advantages of different underlying clustering algorithms and to cancel out the errors they introduce. [sent-219, score-0.561]

75 Consequently, this dataset is very challenging for any single clustering algorithm. [sent-224, score-0.487]

76 In this experiment, we use four different underlying clustering algorithms implemented in MATLAB: single linkage, complete linkage, Ward’s clustering and K-means. [sent-225, score-0.97]

77 The only difference between them is that when merging pairs of clusters, single linkage is based on the minimum distance, while complete linkage is based on maximum distance. [sent-227, score-0.297]

78 The third one, Ward’s clustering algorithm, is also an agglomerative bottom-up algorithm. [sent-228, score-0.481]

79 In each merging step, it chooses the pair of clusters that minimizes the sum of the squared distances from each point to the mean of the two clusters. [sent-229, score-0.241]

80 For each of the underlying clustering algorithms, we vary the input parameter (the desired number of clusters) as 4:1:10. [sent-231, score-0.828]
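
For reference, the four underlying algorithms can be instantiated with scikit-learn as sketched below (the paper uses the MATLAB implementations); the helper that bundles them for a given k is hypothetical.

```python
from sklearn.cluster import KMeans, AgglomerativeClustering

def four_clusterers(k):
    """The four underlying algorithms, one estimator per algorithm for a
    given desired number of clusters k."""
    return {
        "single-linkage":   AgglomerativeClustering(n_clusters=k, linkage="single"),
        "complete-linkage": AgglomerativeClustering(n_clusters=k, linkage="complete"),
        "ward":             AgglomerativeClustering(n_clusters=k, linkage="ward"),
        "k-means":          KMeans(n_clusters=k, n_init=10, random_state=0),
    }

# Input clusterings for this experiment: k = 4:1:10 for each algorithm, e.g.
# labels = {(name, k): est.fit_predict(X)
#           for k in range(4, 11) for name, est in four_clusterers(k).items()}
```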

81 Note that, unlike [1], we do not use the average linkage clustering algorithm, because by specifying the correct number of clusters, it can generate near-perfect clustering by itself. [sent-233, score-1.045]

82 In practice, however, utilizing good underlying clustering algorithms significantly increases the chance that our approach obtains superior aggregated clusterings. [sent-235, score-0.674]

83 In Fig. 2, we show the clustering results obtained by the four underlying clustering algorithms with the number of clusters set to 7. [sent-240, score-1.186]

84 As we can see, our aggregated clustering is almost perfect, except for the three green data points in the "bridge" between the cyan and green "balls". [sent-243, score-0.545]

85 These results confirm that our approach can effectively combine the advantages of different clustering algorithms and cancel out the errors introduced by them. [sent-244, score-0.53]

86 Also, in contrast to the other consensus clustering algorithms, such as [1], our aggregated clustering is obtained without specifying the optimal input parameters for any of the underlying clustering algorithms. [sent-245, score-1.801]

87 For our approach and all those consensus clustering algorithms, we choose K-means and Ward’s algorithm as the underlying clustering algorithms. [sent-257, score-1.128]

88 The multiple clusterings for each dataset are obtained by varying the desired number of clusters for both K-means and Ward’s algorithm. [sent-258, score-0.533]

89 Specifically, for the test on 8D5K, we set the desired numbers of clusters as 3:1:7. [sent-259, score-0.248]

90 So there are also 10 different input clusterings for each of them. [sent-262, score-0.313]

91 As shown in Fig. 3, the performance of our approach is better than that of the other consensus clustering algorithms. [sent-268, score-0.622]

92 The main reason is that, with a range of different input parameters, most clusterings generated by the underlying clustering algorithms are not good enough. [sent-269, score-0.867]

93 The "consensus" based on these moderate or even bad input clusterings, with far fewer good ones, cannot be good. [sent-270, score-0.443]

94 In contrast, by selecting an optimal subset of the clusters, our approach can still achieve superior performance as long as there are good clusters in the input clusterings. [sent-271, score-0.419]

95 We formulate clustering aggregation as a MWIS problem with a special structure. [sent-274, score-0.672]

96 We propose a novel variant of the simulated annealing method, which takes advantage of the special structure, for solving this special MWIS problem. [sent-276, score-0.271]

97 our approach to clustering aggregation automatically decides the optimal number of clusters; 2. [sent-278, score-0.723]

98 it does not require any parameter tuning for the underlying clustering algorithms; 3. [sent-279, score-0.534]

99 it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. [sent-280, score-0.634]

100 IEEE Transactions on Information Theory 28 (2): 129-137. [8] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu (1996) "A density-based algorithm for discovering clusters in large spatial databases with noise". [sent-306, score-0.216]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mwis', 0.457), ('clustering', 0.441), ('mis', 0.25), ('clusterings', 0.239), ('clusters', 0.216), ('xt', 0.186), ('aggregation', 0.185), ('consensus', 0.181), ('silhouette', 0.154), ('vertices', 0.15), ('linkage', 0.136), ('cij', 0.117), ('ward', 0.111), ('aggregated', 0.104), ('annealing', 0.1), ('cluster', 0.097), ('pi', 0.088), ('cohesion', 0.087), ('attributed', 0.084), ('vertex', 0.083), ('input', 0.074), ('objects', 0.071), ('aij', 0.068), ('dbscan', 0.065), ('pendig', 0.065), ('underlying', 0.065), ('ci', 0.062), ('neighborhood', 0.058), ('moderate', 0.054), ('bad', 0.051), ('partitioning', 0.05), ('candidate', 0.049), ('tth', 0.048), ('iris', 0.047), ('object', 0.047), ('simulated', 0.047), ('dataset', 0.046), ('special', 0.046), ('decides', 0.044), ('ascci', 0.044), ('heaviest', 0.044), ('ensemble', 0.043), ('miss', 0.042), ('balls', 0.041), ('partition', 0.04), ('graph', 0.04), ('agglomerative', 0.04), ('superior', 0.039), ('distinct', 0.039), ('internal', 0.038), ('subset', 0.038), ('coef', 0.037), ('formulated', 0.036), ('jaccard', 0.035), ('heuristic', 0.035), ('radius', 0.034), ('proportion', 0.034), ('compatible', 0.034), ('combine', 0.034), ('tolerate', 0.033), ('advantages', 0.032), ('quality', 0.032), ('desired', 0.032), ('variant', 0.032), ('wi', 0.032), ('mining', 0.031), ('adjacency', 0.031), ('exhibits', 0.031), ('wt', 0.031), ('shapes', 0.031), ('union', 0.03), ('measuring', 0.029), ('bt', 0.029), ('icdm', 0.029), ('maximal', 0.028), ('ni', 0.028), ('tuning', 0.028), ('st', 0.027), ('segmentation', 0.027), ('specifying', 0.027), ('optimal', 0.027), ('automatically', 0.026), ('overlap', 0.026), ('matlab', 0.025), ('fth', 0.025), ('ict', 0.025), ('good', 0.025), ('pm', 0.025), ('merging', 0.025), ('formal', 0.024), ('exploration', 0.024), ('acceptance', 0.024), ('panels', 0.024), ('neighbor', 0.023), ('instance', 0.023), ('groups', 0.023), ('mutually', 0.023), ('accepted', 0.023), ('algorithms', 0.023), ('extensive', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

Author: Nan Li, Longin J. Latecki

Abstract: We formulate clustering aggregation as a special instance of Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of simulated annealing method that takes advantage of this special structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and utilizes a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that: 1. our approach to clustering aggregation automatically decides the optimal number of clusters; 2. it does not require any parameter tuning for the underlying clustering algorithms; 3. it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. it is robust against moderate or even bad input clusterings. 1

2 0.2630178 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

3 0.17888363 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

4 0.16126262 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

Author: Ke Jiang, Brian Kulis, Michael I. Jordan

Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the kmeans and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that features the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis. 1

5 0.16055125 126 nips-2012-FastEx: Hash Clustering with Exponential Families

Author: Amr Ahmed, Sujith Ravi, Alex J. Smola, Shravan M. Narayanamurthy

Abstract: Clustering is a key component in any data analysis toolbox. Despite its importance, scalable algorithms often eschew rich statistical models in favor of simpler descriptions such as k-means clustering. In this paper we present a sampler, capable of estimating mixtures of exponential families. At its heart lies a novel proposal distribution using random projections to achieve high throughput in generating proposals, which is crucial for clustering models with large numbers of clusters. 1

6 0.15525882 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

7 0.13596085 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

8 0.13585158 324 nips-2012-Stochastic Gradient Descent with Only One Projection

9 0.13019441 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

10 0.12131404 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

11 0.11296826 291 nips-2012-Reducing statistical time-series problems to binary classification

12 0.10629049 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

13 0.10303317 26 nips-2012-A nonparametric variable clustering model

14 0.1022199 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

15 0.089479782 131 nips-2012-Feature Clustering for Accelerating Parallel Coordinate Descent

16 0.087470055 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

17 0.079954252 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

18 0.074776314 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

19 0.073665574 260 nips-2012-Online Sum-Product Computation Over Trees

20 0.072564788 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.192), (1, 0.06), (2, 0.02), (3, 0.004), (4, -0.004), (5, -0.12), (6, -0.034), (7, -0.128), (8, 0.016), (9, 0.073), (10, 0.135), (11, -0.28), (12, -0.091), (13, -0.048), (14, 0.162), (15, -0.023), (16, -0.155), (17, 0.101), (18, -0.126), (19, -0.065), (20, -0.203), (21, -0.132), (22, -0.003), (23, -0.015), (24, 0.001), (25, -0.011), (26, -0.011), (27, 0.072), (28, 0.063), (29, -0.045), (30, 0.022), (31, -0.065), (32, 0.033), (33, -0.044), (34, -0.063), (35, -0.043), (36, -0.05), (37, 0.062), (38, 0.015), (39, 0.005), (40, -0.047), (41, 0.025), (42, 0.023), (43, -0.033), (44, -0.022), (45, -0.056), (46, -0.044), (47, -0.061), (48, 0.019), (49, -0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98408639 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

Author: Nan Li, Longin J. Latecki

Abstract: We formulate clustering aggregation as a special instance of Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of simulated annealing method that takes advantage of this special structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and utilizes a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that: 1. our approach to clustering aggregation automatically decides the optimal number of clusters; 2. it does not require any parameter tuning for the underlying clustering algorithms; 3. it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. it is robust against moderate or even bad input clusterings. 1

2 0.82081211 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

3 0.80373186 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

4 0.77682257 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

Author: Argyris Kalogeratos, Aristidis Likas

Abstract: Learning the number of clusters is a key problem in data clustering. We present dip-means, a novel robust incremental method to learn the number of data clusters that can be used as a wrapper around any iterative clustering algorithm of k-means family. In contrast to many popular methods which make assumptions about the underlying cluster distributions, dip-means only assumes a fundamental cluster property: each cluster to admit a unimodal distribution. The proposed algorithm considers each cluster member as an individual ‘viewer’ and applies a univariate statistic hypothesis test for unimodality (dip-test) on the distribution of distances between the viewer and the cluster members. Important advantages are: i) the unimodality test is applied on univariate distance vectors, ii) it can be directly applied with kernel-based methods, since only the pairwise distances are involved in the computations. Experimental results on artificial and real datasets indicate the effectiveness of our method and its superiority over analogous approaches.

5 0.67505795 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

Author: Ehsan Elhamifar, Guillermo Sapiro, René Vidal

Abstract: Given pairwise dissimilarities between data points, we consider the problem of finding a subset of data points, called representatives or exemplars, that can efficiently describe the data collection. We formulate the problem as a row-sparsity regularized trace minimization problem that can be solved efficiently using convex programming. The solution of the proposed optimization program finds the representatives and the probability that each data point is associated with each one of the representatives. We obtain the range of the regularization parameter for which the solution of the proposed optimization program changes from selecting one representative for all data points to selecting all data points as representatives. When data points are distributed around multiple clusters according to the dissimilarities, we show that the data points in each cluster select representatives only from that cluster. Unlike metric-based methods, our algorithm can be applied to dissimilarities that are asymmetric or violate the triangle inequality, i.e., it does not require that the pairwise dissimilarities come from a metric. We demonstrate the effectiveness of the proposed algorithm on synthetic data as well as real-world image and text data. 1

6 0.62941104 196 nips-2012-Learning with Partially Absorbing Random Walks

7 0.59959435 26 nips-2012-A nonparametric variable clustering model

8 0.57995015 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

9 0.57183617 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

10 0.56667006 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process

11 0.55772603 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

12 0.55311722 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

13 0.53927815 291 nips-2012-Reducing statistical time-series problems to binary classification

14 0.53580612 126 nips-2012-FastEx: Hash Clustering with Exponential Families

15 0.50835645 155 nips-2012-Human memory search as a random walk in a semantic network

16 0.50394225 309 nips-2012-Semi-supervised Eigenvectors for Locally-biased Learning

17 0.48295411 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

18 0.45677462 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

19 0.44493052 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

20 0.43964761 157 nips-2012-Identification of Recurrent Patterns in the Activation of Brain Networks


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.076), (17, 0.02), (21, 0.038), (38, 0.121), (42, 0.057), (54, 0.025), (55, 0.014), (74, 0.079), (76, 0.15), (80, 0.073), (92, 0.04), (94, 0.226)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9643296 362 nips-2012-Waveform Driven Plasticity in BiFeO3 Memristive Devices: Model and Implementation

Author: Christian Mayr, Paul Stärke, Johannes Partzsch, Love Cederstroem, Rene Schüffny, Yao Shuai, Nan Du, Heidemarie Schmidt

Abstract: Memristive devices have recently been proposed as efficient implementations of plastic synapses in neuromorphic systems. The plasticity in these memristive devices, i.e. their resistance change, is defined by the applied waveforms. This behavior resembles biological synapses, whose plasticity is also triggered by mechanisms that are determined by local waveforms. However, learning in memristive devices has so far been approached mostly on a pragmatic technological level. The focus seems to be on finding any waveform that achieves spike-timing-dependent plasticity (STDP), without regard to the biological veracity of said waveforms or to further important forms of plasticity. Bridging this gap, we make use of a plasticity model driven by neuron waveforms that explains a large number of experimental observations and adapt it to the characteristics of the recently introduced BiFeO3 memristive material. Based on this approach, we show STDP for the first time for this material, with learning window replication superior to previous memristor-based STDP implementations. We also demonstrate in measurements that it is possible to overlay short and long term plasticity at a memristive device in the form of the well-known triplet plasticity. To the best of our knowledge, this is the first implementations of triplet plasticity on any physical memristive device. 1

2 0.85345876 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease

Author: Jonathan Huang, Daniel Alexander

Abstract: Accurate and detailed models of neurodegenerative disease progression are crucially important for reliable early diagnosis and the determination of effective treatments. We introduce the ALPACA (Alzheimer’s disease Probabilistic Cascades) model, a generative model linking latent Alzheimer’s progression dynamics to observable biomarker data. In contrast with previous works which model disease progression as a fixed event ordering, we explicitly model the variability over such orderings among patients which is more realistic, particularly for highly detailed progression models. We describe efficient learning algorithms for ALPACA and discuss promising experimental results on a real cohort of Alzheimer’s patients from the Alzheimer’s Disease Neuroimaging Initiative. 1

3 0.82209361 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

Author: Shulin Yang, Liefeng Bo, Jue Wang, Linda G. Shapiro

Abstract: Fine-grained recognition refers to a subordinate level of recognition, such as recognizing different species of animals and plants. It differs from recognition of basic categories, such as humans, tables, and computers, in that there are global similarities in shape and structure shared cross different categories, and the differences are in the details of object parts. We suggest that the key to identifying the fine-grained differences lies in finding the right alignment of image regions that contain the same object parts. We propose a template model for the purpose, which captures common shape patterns of object parts, as well as the cooccurrence relation of the shape patterns. Once the image regions are aligned, extracted features are used for classification. Learning of the template model is efficient, and the recognition results we achieve significantly outperform the stateof-the-art algorithms. 1

same-paper 4 0.80532968 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

Author: Nan Li, Longin J. Latecki

Abstract: We formulate clustering aggregation as a special instance of Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of simulated annealing method that takes advantage of this special structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and utilizes a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that: 1. our approach to clustering aggregation automatically decides the optimal number of clusters; 2. it does not require any parameter tuning for the underlying clustering algorithms; 3. it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. it is robust against moderate or even bad input clusterings. 1

5 0.80367041 79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities

Author: Xaq Pitkow

Abstract: This paper shows how sparse, high-dimensional probability distributions could be represented by neurons with exponential compression. The representation is a novel application of compressive sensing to sparse probability distributions rather than to the usual sparse signals. The compressive measurements correspond to expected values of nonlinear functions of the probabilistically distributed variables. When these expected values are estimated by sampling, the quality of the compressed representation is limited only by the quality of sampling. Since the compression preserves the geometric structure of the space of sparse probability distributions, probabilistic computation can be performed in the compressed domain. Interestingly, functions satisfying the requirements of compressive sensing can be implemented as simple perceptrons. If we use perceptrons as a simple model of feedforward computation by neurons, these results show that the mean activity of a relatively small number of neurons can accurately represent a highdimensional joint distribution implicitly, even without accounting for any noise correlations. This comprises a novel hypothesis for how neurons could encode probabilities in the brain. 1

6 0.74259979 364 nips-2012-Weighted Likelihood Policy Search with Model Selection

7 0.71497118 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

8 0.71353263 363 nips-2012-Wavelet based multi-scale shape features on arbitrary surfaces for cortical thickness discrimination

9 0.70621741 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

10 0.70602864 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity

11 0.70350909 63 nips-2012-CPRL -- An Extension of Compressive Sensing to the Phase Retrieval Problem

12 0.70210338 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

13 0.70184356 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

14 0.69982922 69 nips-2012-Clustering Sparse Graphs

15 0.69932348 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

16 0.69920361 294 nips-2012-Repulsive Mixtures

17 0.69877142 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

18 0.69872677 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

19 0.69871324 260 nips-2012-Online Sum-Product Computation Over Trees

20 0.69866049 210 nips-2012-Memorability of Image Regions