nips nips2012 nips2012-99 knowledge-graph by maker-knowledge-mining

99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters


Source: pdf

Author: Argyris Kalogeratos, Aristidis Likas

Abstract: Learning the number of clusters is a key problem in data clustering. We present dip-means, a novel robust incremental method to learn the number of data clusters that can be used as a wrapper around any iterative clustering algorithm of k-means family. In contrast to many popular methods which make assumptions about the underlying cluster distributions, dip-means only assumes a fundamental cluster property: each cluster to admit a unimodal distribution. The proposed algorithm considers each cluster member as an individual ‘viewer’ and applies a univariate statistic hypothesis test for unimodality (dip-test) on the distribution of distances between the viewer and the cluster members. Important advantages are: i) the unimodality test is applied on univariate distance vectors, ii) it can be directly applied with kernel-based methods, since only the pairwise distances are involved in the computations. Experimental results on artificial and real datasets indicate the effectiveness of our method and its superiority over analogous approaches.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Dip-means: an incremental clustering method for estimating the number of clusters Argyris Kalogeratos Department of Computer Science University of Ioannina Ioannina, Greece 45110 akaloger@cs. [sent-1, score-0.279]

2 We present dip-means, a novel robust incremental method to learn the number of data clusters that can be used as a wrapper around any iterative clustering algorithm of k-means family. [sent-6, score-0.31]

3 In contrast to many popular methods which make assumptions about the underlying cluster distributions, dip-means only assumes a fundamental cluster property: each cluster to admit a unimodal distribution. [sent-7, score-0.991]

4 The proposed algorithm considers each cluster member as an individual ‘viewer’ and applies a univariate statistic hypothesis test for unimodality (dip-test) on the distribution of distances between the viewer and the cluster members. [sent-8, score-1.301]

5 Important advantages are: i) the unimodality test is applied on univariate distance vectors, ii) it can be directly applied with kernel-based methods, since only the pairwise distances are involved in the computations. [sent-9, score-0.404]

6 Most clustering methods consider the number of clusters k as a required input, and then they apply an optimization procedure to adjust the parameters of the assumed cluster model. [sent-14, score-0.539]

7 In a top-down (incremental) strategy they start with one cluster and proceed to splitting as long as a certain criterion is satisfied. [sent-18, score-0.396]

8 At each phase, they evaluate the clustering produced with a fixed k and they decide whether to increase the number of clusters as follows: Repeat until no changes occur in the model structure 1. [sent-19, score-0.269]

9 G-means [4] is another extension to k-means that uses a statistical test for the hypothesis that each cluster has been generated from a Gaussian distribution. [sent-27, score-0.368]

10 Since statistical tests become weaker in high dimensions, the algorithm first projects the datapoints of a cluster onto an axis of high variance and then applies the Anderson-Darling statistic with a fixed significance level α. [sent-28, score-0.564]

11 Projected g-means (pg-means) [5] again assumes that the dataset has been generated from a Gaussian mixture, but it tests the overall model at once and not each cluster separately. [sent-30, score-0.377]

12 Other alternatives have also been proposed, such as gap statistic [7], self-tuning spectral clustering [8], data spectroscopic clustering [9], and stability-based model validation [10]-[12], however they are not closely related to the proposed method. [sent-36, score-0.258]

13 Firstly, we propose a statistical test for unimodality, called dip-dist, to be applied to a data subset in order to determine whether it contains a single or multiple cluster structures. [sent-41, score-0.331]

14 Moreover, it can easily be extended to kernel space by using kernel k-means [13] and appropriately modifying the cluster splitting procedure. [sent-46, score-0.48]

15 2 Dip-dist criterion for cluster structure evaluation In cluster analysis, the detection of multiple cluster structures in a dataset requires assumptions about what the clusters we seek look like. [sent-47, score-1.203]

16 Moreover, it is of great value for a method to be able to verify the assumed cluster hypothesis with well designed statistical hypothesis tests that are theoretically sound, in contrast to various alternative ad hoc criteria. [sent-52, score-0.432]

17 We propose the novel dip-dist criterion for evaluating the cluster structure of a dataset, which is based on testing the empirical density distribution of the data for unimodality. [sent-53, score-0.468]

18 The unimodality assumption implies that the empirical density of an acceptable cluster should have a single mode: a region where the density is maximal, with non-increasing density observed when moving away from the mode. [sent-54, score-0.682]

19 There are no other underlying assumptions about the shape of a cluster and the distribution that generated the empirically observed unimodal property. [sent-55, score-0.397]

20 Under this assumption, it is possible to identify clusters generated by various unimodal distributions, such as Gaussian, Student-t, etc. [sent-56, score-0.27]

21 Conveniently, unimodality can be verified using powerful statistical hypothesis tests (especially for one-dimensional data), such as Silverman’s method, which uses fixed-width kernel density estimates [14], or the widely used Hartigan’s dip statistic [15]. [sent-58, score-1.014]

22 Thus, although the data may be of arbitrary dimensionality, it is important to apply unimodality tests on one-dimensional data values. [sent-60, score-0.272]

23 To meet the above requirements we propose the dip-dist criterion for determining unimodality in a set of datapoints using only their pairwise distances (or similarities). [sent-62, score-0.565]

24 More specifically, if we consider an arbitrary datapoint as a viewer and form a vector whose components are the distances of the viewer from all the datapoints, then the distribution of the values in this distance vector could reveal information about the cluster structure. [sent-63, score-0.955]

25 In the case of two distinct clusters, the distribution of distances should exhibit two distinct modes, with each mode containing the distances to the datapoints of each cluster. [sent-65, score-0.36]
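
An illustrative sketch (not from the paper) of the effect described above: for two well-separated 2-d clusters, the distance vector of a boundary point acting as a viewer is clearly bimodal.

```python
import numpy as np

rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2))
cluster_b = rng.normal(loc=[5.0, 0.0], scale=0.5, size=(200, 2))
X = np.vstack([cluster_a, cluster_b])

viewer = X[np.argmin(X[:, 0])]               # a point on the outer boundary of cluster A
dists = np.linalg.norm(X - viewer, axis=1)   # its 1-d distance vector to all datapoints
# A histogram of dists (e.g. np.histogram(dists, bins=30)) shows two separated modes:
# distances within the viewer's own cluster and distances to the other cluster.
```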

26 Consequently, a unimodality test on the distribution of the values of the distance vector would provide an indication of the unimodality of the cluster structure. [sent-66, score-0.821]

27 Intuitively, viewers at the boundaries of the set are expected to form distance vectors whose density modes are more distinct when there is more than one cluster. [sent-68, score-0.394]

28 To tackle the viewer selection problem, we consider all the datapoints of the set as individual viewers and perform the unimodality test on the distance vector of each viewer. [sent-69, score-1.0]

29 If there exist viewers that reject unimodality (called split viewers), we conclude that the examined cluster includes multiple cluster structures. [sent-70, score-1.359]

30 For testing unimodality we use Hartigan’s dip test [15]. [sent-71, score-0.819]

31 Then the dip statistic of a distribution function F is given by: dip(F) = min_{G∈U} ρ(F, G). (1) [sent-75, score-0.591]

32 In other words, the dip statistic computes the minimum, over the class U of unimodal cdfs G, of the maximum deviation observed between the cdf F and G. [sent-76, score-0.72]

33 A nice property of dip is that, if Fn is a sample distribution of n observations from F, then lim_{n→∞} dip(Fn) = dip(F). [sent-77, score-0.534]

34 In [15] it is argued that the class of uniform distributions U is the most appropriate for the null hypothesis, since its dip values are stochastically larger than those of other unimodal distributions, such as those having exponentially decreasing tails. [sent-78, score-0.688]

35 Given a vector of observations f = {fi : fi ∈ R, i = 1, ..., n}, the algorithm for performing the dip test [15] is applied on the respective empirical cdf Fn(t) = (1/n) Σ_{i=1}^n I(fi ≤ t). [sent-79, score-0.685]

36 Fortunately, for a given Fn , the complexity of one dip computation is O(n) [15]. [sent-82, score-0.534]

37 The computation of the p-value for a unimodality test uses bootstrap samples and expresses the probability of dip(Fn) being less than the dip value of a cdf Un^r of n observations sampled from the U[0,1] Uniform distribution: P = #[dip(Fn) ≤ dip(Un^r)] / b, r = 1, ..., b. (2) [sent-83, score-0.826]
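
A minimal sketch of this bootstrap p-value computation; it is not the authors' code, and `dip_statistic` is a hypothetical helper name standing in for any implementation of Hartigan's dip statistic on a one-dimensional sample.

```python
# Sketch of the bootstrap p-value P = #[dip(Fn) <= dip(Un^r)] / b.
# Assumption: dip_statistic(values) is a hypothetical helper returning the
# Hartigan dip of the empirical cdf of a 1-d sample (any dip implementation works).
import numpy as np

def dip_pvalue(values, b=1000, seed=None):
    rng = np.random.default_rng(seed)
    n = len(values)
    observed = dip_statistic(np.asarray(values))
    uniform_dips = np.array([dip_statistic(rng.uniform(0.0, 1.0, n)) for _ in range(b)])
    # Fraction of Uniform[0,1] bootstrap samples whose dip exceeds the observed one;
    # a small p-value means the observed dip is unusually large, i.e. unimodality is rejected.
    return float(np.mean(observed <= uniform_dips))
```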

38 Let a dataset X = {xi : xi ∈ R^d, i = 1, ..., N}; then, in the present context, the dip test can be applied on any subset c, e.g. [sent-88, score-0.622]

39 a data cluster, and more specifically on the ecdf Fn^(xi)(t) = (1/n) Σ_{xj∈c} I(Dist(xi, xj) ≤ t) of the distances between a reference viewer xi of c and the n members of the set. [sent-90, score-0.441]

40 We refer to the viewers that identify multimodality and vote to split the set as split viewers. [sent-91, score-0.801]

41 Compute the dip statistic of each distance vector, for the i = 1, ..., n datapoint viewers, using the sorted matrix Dist. [sent-101, score-0.339]

42 Apply the unimodality test of Eq. 2 using a significance level α and compute the percentage of viewers identifying multimodality. [sent-107, score-0.32]
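
A sketch of the dip-dist criterion under the same assumptions as the previous sketch (the hypothetical `dip_pvalue` helper and a precomputed pairwise distance matrix); it returns the split viewers and their fraction.

```python
import numpy as np

def dip_dist(dist, alpha=0.0, b=1000):
    """dist: n x n pairwise distance matrix of the examined set.
    Each row is the 1-d distance vector of one datapoint acting as a viewer;
    a viewer whose distance vector rejects unimodality (p-value <= alpha) is a split viewer."""
    n = dist.shape[0]
    split_viewers = [i for i in range(n) if dip_pvalue(dist[i], b=b) <= alpha]
    return split_viewers, len(split_viewers) / n
```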

43-45 [Figure 1 panel labels: split percentage (71%), max/min dip markers, and p-values for the best and strongest split viewer as well as the no-split panels.]

46 [(j) density plot] Figure 1: Application of the dip-dist criterion on 2d synthetic data with two structures of 200 datapoints each. [sent-193, score-0.367]

47 (b), (c) The histograms of pairwise distances of the strongest and weakest split viewer. [sent-196, score-0.384]

48 (d) The two structures come closer; the split viewers are reduced, as is the dip value for the split viewer. [sent-197, score-1.331]

49 The histograms in each row indicate the result of the dip test. [sent-203, score-0.534]

50 As the structures come closer, the number of viewers that observe multimodality decreases. [sent-204, score-0.42]

51 The respective density map clearly shows two modes, evidence that justifies why the dip-dist criterion determines multimodality, with 24% of the viewers suggesting the split. [sent-207, score-0.503]

52 More generally, if the percentage of split viewers is greater than a small threshold, e.g. [sent-208, score-0.542]

53 1%, we may decide that the cluster is multimodal. [sent-210, score-0.324]

54 For this purpose, k-means is used, where the cluster models are the cluster centroids. [sent-213, score-0.297]

55 The second, and most important, decides whether a data subset contains multiple cluster structures using the dip-dist criterion presented in Section 2. [sent-214, score-0.391]

56 The dip-means methodology takes as input the dataset X and two parameters for the dip-dist criterion: the significance level α and the percentage threshold vthd of cluster members that should be split viewers in order to decide on a division (Algorithm 1). [sent-216, score-1.083]

57 output: the sets of cluster members C = {cj : j = 1, ..., k} and the models M = {mj : j = 1, ..., k}, with the centroid of each cj set. [sent-218, score-0.375]

58 let: score = unimodalityTest(c, α, vthd) return a score value for cluster c; {C, M} = kmeans(X, k) denote the k-means clustering; {C, M} = kmeans(X, M) denote k-means initialized with model M; {mL, mR} = splitCluster(c) split a cluster c and return two centers mL, mR. [sent-219, score-0.807]

59 k ← kinit; {C, M} ← kmeans(X, k); do while changes in the cluster number occur: for j = 1, ..., k. [sent-220, score-0.373]

60 In each iteration, all k clusters are examined for unimodality, the set of split viewers vj is found, and the respective cluster cj is characterized as a split candidate if |vj|/nj ≥ vthd. [sent-224, score-1.266]

61 In this case, a non-zero score value is assigned to each cluster being a split candidate, while zero score is assigned to clusters that do not have sufficient split viewers. [sent-225, score-0.993]

62 Various alternatives can be employed in order to compute a score for a split candidate based on the percentage of split viewers, or even the size of clusters. [sent-226, score-0.57]

63 In our implementation, the score_j of a split candidate cluster cj is computed as the average value of the dip statistic over its split viewers: score_j = (1/|vj|) Σ_{xi∈vj} dip(Fn^(xi)) if |vj|/nj ≥ vthd, and 0 otherwise. (3) [sent-227, score-1.579]
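
A sketch of Eq. (3); `dip_statistic` is the same hypothetical helper as above, and `dist` is the within-cluster pairwise distance matrix.

```python
import numpy as np

def cluster_score(dist, split_viewers, v_thd=0.01):
    """Average dip statistic over the split viewers if their fraction reaches v_thd, else 0."""
    n = dist.shape[0]
    if not split_viewers or len(split_viewers) / n < v_thd:
        return 0.0
    return float(np.mean([dip_statistic(dist[i]) for i in split_viewers]))
```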

64 In order to avoid the overestimation of the real number of clusters, only the candidate with maximum score is split in each iteration. [sent-228, score-0.297]

65 A cluster is split into two clusters using a 2-means local search starting from a pair of sufficiently diverse centroids mL, mR inside the cluster and involving only the datapoints of that cluster. [sent-229, score-1.173]

66 We use a simple way to set up the initial centroids, {mL, mR} ← {x, m − (x − m)}, where x is a cluster member selected at random and m is the cluster centroid. [sent-230, score-0.653]
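
A sketch of this seeding rule: one center is a random member x, the other its reflection through the cluster centroid, m − (x − m) = 2m − x.

```python
import numpy as np

def split_seeds(cluster_points, seed=None):
    """Return the pair of initial centers {mL, mR} = {x, 2m - x} for the 2-means split."""
    rng = np.random.default_rng(seed)
    m = cluster_points.mean(axis=0)                         # cluster centroid
    x = cluster_points[rng.integers(len(cluster_points))]   # random cluster member
    return x, 2.0 * m - x
```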

67 A computationally more expensive alternative could be the deterministic principal direction divisive partitioning (PDDP) [16] that splits the cluster based on the principal component. [sent-233, score-0.346]

68 More specifically, kernel dip-means uses kernel k-means [13] as the local search technique, which also implies that centroids cannot be computed in kernel space; thus each cluster is now described explicitly by the set of its members cj. [sent-239, score-0.579]

69 In this case, since the transformed data vectors φ(x) are not available, the cluster splitting procedure could be seeded by two arbitrary cluster members. [sent-240, score-0.655]

70 As discussed in Section 2, the distribution of pairwise distances between a reference viewer and the members of a cluster reveals information about the multimodality of data distribution in the original space. [sent-242, score-0.816]

71 This implies that a split of the cluster members based on their distance to a reference viewer constitutes a reasonable split in the original space, as well. [sent-243, score-1.134]

72 We consider as reference split viewer the cluster member with the maximum dip value. [sent-245, score-1.359]
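
The exact seeding of this kernel-space split is truncated in the extracted text below; the sketch here is one possible reading, running a plain 1-d 2-means on the similarities to the reference viewer (the member with the maximum dip), seeded at the two extreme similarity values (an assumption). The resulting two-way partition then initializes kernel 2-means.

```python
import numpy as np

def kernel_split_init(K_cluster, dips, iters=20):
    """K_cluster: within-cluster similarity (kernel) matrix; dips: dip value of each member."""
    ref = int(np.argmax(dips))          # reference split viewer: member with maximum dip
    sims = K_cluster[ref]               # 1-d similarity values to the reference viewer
    centers = np.array([sims.min(), sims.max()], dtype=float)
    for _ in range(iters):              # plain 1-d 2-means on the similarity values
        labels = (np.abs(sims - centers[0]) > np.abs(sims - centers[1])).astype(int)
        for c in (0, 1):
            if np.any(labels == c):
                centers[c] = sims[labels == c].mean()
    return labels                       # two-way partition used to seed kernel 2-means
```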

73 Here, 2-means is seeded using two values located …

74-76 [Figure 2 panel labels recovered from extraction residue: (a) single structure generated by a Student-t distribution (dip-means: ke = 1, no split); (b) single Uniform rectangle structure (dip-means: ke = 1, no split); (c) eight clusters of various density and shape (dip-means: ke = 8); (d) two Uniform ring structures (kernel dip-means: ke = 2); (e) three Uniform ring structures.]

77 Figure 2: Clustering results on 2d synthetic unimodal cluster structures with 200 datapoints each (the centroids are marked with ⊗). [sent-374, score-0.79]

78 (d), (e) Nonlinearly separable ring clusters (kernel-based clustering with an RBF kernel). [sent-378, score-0.279]

79 After convergence, the resulting two-way partition of the datapoints, derived from the partition of the corresponding similarity values to the selected reference split viewer, initializes a local search with kernel 2-means. [sent-380, score-0.321]

80 Table 1: Results for synthetic datasets with fixed k∗ = 20 clusters and 200 datapoints in each cluster. [sent-384, score-0.371]

81-82 [Table 1 column headers: ke, ARI and VI for Case 1 and Case 2 at d = 4, 16, 32; rows list the compared methods, starting with dip-means.]

83 […] is the pg-means method, which uses EM for local search and does not rely on cluster splitting to add a new cluster. [sent-529, score-0.336]

84 The parameters of the dip-dist criterion are set to α = 0 for the significance level of the dip test and b = 1000 for the number of bootstrap samples. [sent-532, score-0.628]

85 We consider as split candidates the clusters having at least vthd =1% split viewers. [sent-533, score-0.74]

86 In Figures 2(a), (b), we provide two indicative examples of single cluster structures. [sent-540, score-0.297]

87 To test the kernel dip-means extension, we created two 2d synthetic datasets containing two and three Uniform ring structures, and we used an RBF kernel to construct the kernel matrix K. [sent-543, score-0.382]

88 As we may observe, dip-means estimates the true number of clusters and finds the optimal grouping of datapoints in both cases, whereas kernel k-means fails in the three ring case. [sent-546, score-0.418]

89 Two cases were considered: 1) Gaussian mixtures of varying eccentricity, and 2) datasets with various cluster structures, i.e. [sent-548, score-0.342]

90 As reported in Table 2, dip-means correctly discovers the number of clusters for the subsets of Pendigits, while providing a reasonable underestimate ke near the optimal value for the full datasets PD10tr and PD10te. [sent-562, score-0.502]

91 Apart from the excessive overfitting of x-means and g-means, pg-means seems to conclude with an overestimated ke. [sent-563, score-0.267]

92 […] and g-means provide more reasonable ke estimations, but these are still overestimations. [sent-632, score-0.285]

93 5 Conclusions: We have presented a novel approach for testing whether multiple cluster structures are present in a set of data objects (e.g. [sent-638, score-0.403]

94 The proposed dip-dist criterion checks for unimodality of the empirical data density distribution, and thus it is much more general than alternatives that test for Gaussianity. [sent-641, score-0.396]

95 Dip-dist uses a statistical hypothesis test, namely Hartigan’s dip test, in order to verify unimodality. [sent-642, score-0.571]

96 If a data object of the set is considered as a viewer, then the dip test can be applied on the one-dimensional distance (or similarity) vector whose components are the distances between the viewer and the members of the same set. [sent-643, score-0.987]

97 By considering all the data objects of the set as individual viewers and by combining the respective results of the test, the presence of multiple cluster structures in the set can be determined. [sent-645, score-0.705]

98 We have also proposed a new incremental clustering algorithm called dip-means, which incorporates the dip-dist criterion in order to decide on cluster splitting. [sent-646, score-0.511]

99 The procedure starts with one cluster, iteratively splits the cluster indicated by dip-dist as the most likely to contain multiple cluster structures, and terminates when no new cluster split is suggested. [sent-647, score-1.151]
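
A high-level sketch of the dip-means wrapper, tying the pieces above together. It assumes the `dip_dist`, `cluster_score` and `split_seeds` helpers sketched earlier on this page, and uses scikit-learn's KMeans as a stand-in for the k-means local search (the paper additionally refines the split locally with 2-means inside the cluster before re-running k-means on all data; this sketch folds that into the global re-run).

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def dip_means(X, alpha=0.0, v_thd=0.01, b=1000, k_init=1):
    km = KMeans(n_clusters=k_init, n_init=10).fit(X)
    k, labels, centers = k_init, km.labels_, km.cluster_centers_
    while True:
        scores = []
        for j in range(k):
            members = X[labels == j]
            dist = cdist(members, members)                        # within-cluster distances
            split_viewers, _ = dip_dist(dist, alpha=alpha, b=b)   # viewers rejecting unimodality
            scores.append(cluster_score(dist, split_viewers, v_thd=v_thd))
        if max(scores) == 0.0:                                    # no split candidate left: stop
            return k, labels
        j = int(np.argmax(scores))                                # split only the best candidate
        mL, mR = split_seeds(X[labels == j])
        centers = np.vstack([np.delete(centers, j, axis=0), mL, mR])
        k += 1
        km = KMeans(n_clusters=k, init=centers, n_init=1).fit(X)  # refine all clusters globally
        labels, centers = km.labels_, km.cluster_centers_
```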

100 A comparative study of several cluster number selection criteria. [sent-671, score-0.297]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dip', 0.534), ('cluster', 0.297), ('viewers', 0.29), ('ke', 0.267), ('viewer', 0.248), ('ari', 0.229), ('unimodality', 0.229), ('split', 0.222), ('datapoints', 0.167), ('clusters', 0.152), ('vthd', 0.122), ('unimodal', 0.1), ('clustering', 0.09), ('distances', 0.081), ('vi', 0.078), ('mr', 0.077), ('kinit', 0.076), ('fn', 0.074), ('multimodality', 0.067), ('tl', 0.064), ('structures', 0.063), ('kernel', 0.062), ('ioannina', 0.061), ('criterion', 0.06), ('members', 0.058), ('tu', 0.057), ('statistic', 0.057), ('ml', 0.054), ('uniform', 0.054), ('density', 0.052), ('kmeans', 0.051), ('score', 0.05), ('datapoint', 0.049), ('cance', 0.043), ('tests', 0.043), ('splitting', 0.039), ('erent', 0.038), ('centroids', 0.038), ('reference', 0.037), ('hypothesis', 0.037), ('ring', 0.037), ('incremental', 0.037), ('dataset', 0.037), ('rectangle', 0.035), ('di', 0.035), ('test', 0.034), ('respective', 0.034), ('means', 0.033), ('distance', 0.032), ('su', 0.032), ('spherical', 0.032), ('mode', 0.031), ('pg', 0.031), ('decides', 0.031), ('bisecting', 0.031), ('ectiveness', 0.031), ('hamerly', 0.031), ('hartigan', 0.031), ('hartigans', 0.031), ('splitcluster', 0.031), ('unimodalitytest', 0.031), ('wrapper', 0.031), ('percentage', 0.03), ('divisive', 0.029), ('cdf', 0.029), ('weakest', 0.029), ('pairwise', 0.028), ('clusterings', 0.028), ('greece', 0.027), ('datasets', 0.027), ('decide', 0.027), ('synthetic', 0.025), ('candidate', 0.025), ('examined', 0.024), ('strongest', 0.024), ('dist', 0.023), ('elliptic', 0.023), ('testing', 0.022), ('pendigits', 0.022), ('seeded', 0.022), ('candidates', 0.022), ('centers', 0.021), ('objects', 0.021), ('member', 0.021), ('alternatives', 0.021), ('subsets', 0.02), ('centroid', 0.02), ('modes', 0.02), ('splits', 0.02), ('appropriately', 0.02), ('frequency', 0.019), ('bic', 0.019), ('fi', 0.018), ('gaussian', 0.018), ('reasonable', 0.018), ('indicated', 0.018), ('discovers', 0.018), ('various', 0.018), ('irvine', 0.018), ('xi', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999928 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

Author: Argyris Kalogeratos, Aristidis Likas

Abstract: Learning the number of clusters is a key problem in data clustering. We present dip-means, a novel robust incremental method to learn the number of data clusters that can be used as a wrapper around any iterative clustering algorithm of k-means family. In contrast to many popular methods which make assumptions about the underlying cluster distributions, dip-means only assumes a fundamental cluster property: each cluster to admit a unimodal distribution. The proposed algorithm considers each cluster member as an individual ‘viewer’ and applies a univariate statistic hypothesis test for unimodality (dip-test) on the distribution of distances between the viewer and the cluster members. Important advantages are: i) the unimodality test is applied on univariate distance vectors, ii) it can be directly applied with kernel-based methods, since only the pairwise distances are involved in the computations. Experimental results on artificial and real datasets indicate the effectiveness of our method and its superiority over analogous approaches.

2 0.15689395 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

3 0.13660081 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

Author: Ke Jiang, Brian Kulis, Michael I. Jordan

Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the kmeans and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that features the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis. 1

4 0.13284796 126 nips-2012-FastEx: Hash Clustering with Exponential Families

Author: Amr Ahmed, Sujith Ravi, Alex J. Smola, Shravan M. Narayanamurthy

Abstract: Clustering is a key component in any data analysis toolbox. Despite its importance, scalable algorithms often eschew rich statistical models in favor of simpler descriptions such as k-means clustering. In this paper we present a sampler, capable of estimating mixtures of exponential families. At its heart lies a novel proposal distribution using random projections to achieve high throughput in generating proposals, which is crucial for clustering models with large numbers of clusters. 1

5 0.13019441 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

Author: Nan Li, Longin J. Latecki

Abstract: We formulate clustering aggregation as a special instance of Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of simulated annealing method that takes advantage of this special structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and utilizes a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that: 1. our approach to clustering aggregation automatically decides the optimal number of clusters; 2. it does not require any parameter tuning for the underlying clustering algorithms; 3. it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. it is robust against moderate or even bad input clusterings. 1

6 0.10824078 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

7 0.088653944 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

8 0.074358985 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

9 0.073676959 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

10 0.067068681 26 nips-2012-A nonparametric variable clustering model

11 0.0664391 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

12 0.065353841 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process

13 0.059800632 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

14 0.058725461 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

15 0.056771331 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

16 0.055261556 291 nips-2012-Reducing statistical time-series problems to binary classification

17 0.054606371 269 nips-2012-Persistent Homology for Learning Densities with Bounded Support

18 0.054105122 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

19 0.054083046 294 nips-2012-Repulsive Mixtures

20 0.052343622 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.14), (1, 0.056), (2, -0.031), (3, -0.059), (4, -0.047), (5, -0.064), (6, -0.018), (7, -0.005), (8, 0.021), (9, 0.035), (10, 0.094), (11, -0.167), (12, -0.01), (13, -0.077), (14, 0.091), (15, -0.085), (16, -0.105), (17, 0.055), (18, -0.071), (19, -0.079), (20, -0.11), (21, -0.069), (22, 0.031), (23, 0.031), (24, -0.019), (25, 0.015), (26, 0.02), (27, 0.005), (28, 0.069), (29, 0.02), (30, -0.001), (31, -0.095), (32, 0.03), (33, -0.009), (34, -0.054), (35, 0.038), (36, -0.035), (37, 0.04), (38, 0.006), (39, -0.018), (40, 0.041), (41, -0.035), (42, 0.016), (43, 0.071), (44, 0.043), (45, -0.014), (46, -0.011), (47, 0.088), (48, 0.054), (49, -0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95946991 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

Author: Argyris Kalogeratos, Aristidis Likas

Abstract: Learning the number of clusters is a key problem in data clustering. We present dip-means, a novel robust incremental method to learn the number of data clusters that can be used as a wrapper around any iterative clustering algorithm of k-means family. In contrast to many popular methods which make assumptions about the underlying cluster distributions, dip-means only assumes a fundamental cluster property: each cluster to admit a unimodal distribution. The proposed algorithm considers each cluster member as an individual ‘viewer’ and applies a univariate statistic hypothesis test for unimodality (dip-test) on the distribution of distances between the viewer and the cluster members. Important advantages are: i) the unimodality test is applied on univariate distance vectors, ii) it can be directly applied with kernel-based methods, since only the pairwise distances are involved in the computations. Experimental results on artificial and real datasets indicate the effectiveness of our method and its superiority over analogous approaches.

2 0.77604401 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

Author: Nan Li, Longin J. Latecki

Abstract: We formulate clustering aggregation as a special instance of Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of simulated annealing method that takes advantage of this special structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and utilizes a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that: 1. our approach to clustering aggregation automatically decides the optimal number of clusters; 2. it does not require any parameter tuning for the underlying clustering algorithms; 3. it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. it is robust against moderate or even bad input clusterings. 1

3 0.74501967 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

4 0.72413069 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

5 0.69768602 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

Author: Ke Jiang, Brian Kulis, Michael I. Jordan

Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the kmeans and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that features the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis. 1

6 0.66354638 126 nips-2012-FastEx: Hash Clustering with Exponential Families

7 0.66069371 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

8 0.6531868 26 nips-2012-A nonparametric variable clustering model

9 0.64612037 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

10 0.55002898 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

11 0.49768624 294 nips-2012-Repulsive Mixtures

12 0.49629325 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

13 0.49511355 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

14 0.48917288 291 nips-2012-Reducing statistical time-series problems to binary classification

15 0.48895603 155 nips-2012-Human memory search as a random walk in a semantic network

16 0.47861135 196 nips-2012-Learning with Partially Absorbing Random Walks

17 0.45156249 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process

18 0.43408847 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

19 0.41307405 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

20 0.39557663 338 nips-2012-The Perturbed Variation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.078), (9, 0.302), (21, 0.027), (38, 0.082), (39, 0.022), (42, 0.02), (54, 0.018), (55, 0.014), (74, 0.082), (76, 0.171), (80, 0.054), (92, 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.81824392 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process

Author: Lloyd Elliott, Yee W. Teh

Abstract: We present a Bayesian nonparametric model for genetic sequence data in which a set of genetic sequences is modelled using a Markov model of partitions. The partitions at consecutive locations in the genome are related by the splitting and merging of their clusters. Our model can be thought of as a discrete analogue of the continuous fragmentation-coagulation process [Teh et al 2011], preserving the important properties of projectivity, exchangeability and reversibility, while being more scalable. We apply this model to the problem of genotype imputation, showing improved computational efficiency while maintaining accuracies comparable to other state-of-the-art genotype imputation methods. 1

same-paper 2 0.76356083 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

Author: Argyris Kalogeratos, Aristidis Likas

Abstract: Learning the number of clusters is a key problem in data clustering. We present dip-means, a novel robust incremental method to learn the number of data clusters that can be used as a wrapper around any iterative clustering algorithm of k-means family. In contrast to many popular methods which make assumptions about the underlying cluster distributions, dip-means only assumes a fundamental cluster property: each cluster to admit a unimodal distribution. The proposed algorithm considers each cluster member as an individual ‘viewer’ and applies a univariate statistic hypothesis test for unimodality (dip-test) on the distribution of distances between the viewer and the cluster members. Important advantages are: i) the unimodality test is applied on univariate distance vectors, ii) it can be directly applied with kernel-based methods, since only the pairwise distances are involved in the computations. Experimental results on artificial and real datasets indicate the effectiveness of our method and its superiority over analogous approaches.

3 0.73187613 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

Author: Percy Liang, Daniel J. Hsu, Sham M. Kakade

Abstract: This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the parameters efficiently? EM suffers from local optima, while recent work using spectral methods [1] cannot be directly applied since the topology of the parse tree varies across sentences. We develop a strategy, unmixing, which deals with this additional complexity for restricted classes of parsing models. 1

4 0.65572959 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

Author: Minjie Xu, Jun Zhu, Bo Zhang

Abstract: We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages. 1

5 0.63164204 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

Author: Abner Guzmán-rivera, Dhruv Batra, Pushmeet Kohli

Abstract: We address the problem of generating multiple hypotheses for structured prediction tasks that involve interaction with users or successive components in a cascaded architecture. Given a set of multiple hypotheses, such components/users typically have the ability to retrieve the best (or approximately the best) solution in this set. The standard approach for handling such a scenario is to first learn a single-output model and then produce M -Best Maximum a Posteriori (MAP) hypotheses from this model. In contrast, we learn to produce multiple outputs by formulating this task as a multiple-output structured-output prediction problem with a loss-function that effectively captures the setup of the problem. We present a max-margin formulation that minimizes an upper-bound on this lossfunction. Experimental results on image segmentation and protein side-chain prediction show that our method outperforms conventional approaches used for this type of scenario and leads to substantial improvements in prediction accuracy. 1

6 0.61277246 75 nips-2012-Collaborative Ranking With 17 Parameters

7 0.58530504 210 nips-2012-Memorability of Image Regions

8 0.58233702 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

9 0.5793404 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

10 0.57921892 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

11 0.57876652 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering

12 0.57684833 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

13 0.57423484 233 nips-2012-Multiresolution Gaussian Processes

14 0.57209516 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

15 0.57200348 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

16 0.57043368 180 nips-2012-Learning Mixtures of Tree Graphical Models

17 0.5703336 294 nips-2012-Repulsive Mixtures

18 0.56997037 188 nips-2012-Learning from Distributions via Support Measure Machines

19 0.56922811 356 nips-2012-Unsupervised Structure Discovery for Semantic Analysis of Audio

20 0.56921935 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning