nips nips2007 nips2007-80 knowledge-graph by maker-knowledge-mining

80 nips-2007-Ensemble Clustering using Semidefinite Programming


Source: pdf

Author: Vikas Singh, Lopamudra Mukherjee, Jiming Peng, Jinhui Xu

Abstract: We consider the ensemble clustering problem where the task is to ‘aggregate’ multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. Using this generalization, we first derive a nonlinear optimization model to maximize the new agreement measure. We then show that our optimization problem can be transformed into a strict 0-1 Semidefinite Program (SDP) via novel convexification techniques which can subsequently be relaxed to a polynomial time solvable SDP. Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. We discuss evaluations on clustering and image segmentation databases. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We consider the ensemble clustering problem where the task is to ‘aggregate’ multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. [sent-8, score-1.661]

2 First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. [sent-10, score-0.295]

3 Using this generalization, we first derive a nonlinear optimization model to maximize the new agreement measure. [sent-11, score-0.091]

4 Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. [sent-13, score-0.215]

5 We discuss evaluations on clustering and image segmentation databases. [sent-14, score-0.489]

6 1 Introduction In the so-called Ensemble Clustering problem, the target is to ‘combine’ multiple clustering solutions or partitions of a set into a single consolidated clustering that maximizes the information shared (or ‘agreement’) among all available clustering solutions. [sent-15, score-1.067]

7 The need for this form of clustering arises in many applications, especially real world scenarios with a high degree of uncertainty such as image segmentation with poor signal to noise ratio and computer assisted disease diagnosis. [sent-16, score-0.48]

8 It is quite common that a single clustering algorithm may not yield satisfactory results, while multiple algorithms may individually make imperfect choices, assigning some elements to wrong clusters. [sent-17, score-0.347]

9 Usually, by considering the results of several different clustering algorithms together, one may be able to mitigate degeneracies in individual solutions and consequently obtain better solutions. [sent-18, score-0.391]

10-11 Given the data elements D = {d1 , d2 , . . . , dn }, a set of clustering solutions C = {C1 , C2 , . . . , Cm } obtained from m different clustering algorithms is called a cluster ensemble. [sent-23, score-0.387] [sent-26, score-0.556]

12 The Ensemble Clustering problem requires one to use the individual solutions in C to partition D into k clusters such that information shared (‘agreement’) among the solutions of C is maximized. [sent-28, score-0.3]

13 1 Previous works The Ensemble Clustering problem was recently introduced by Strehl and Ghosh [3]; in [4], a related notion of correlation clustering was independently proposed by Bansal, Blum, and Chawla. [sent-30, score-0.305]

14 Formulations primarily differ in how the objective of shared information maximization or agreement is chosen; we review some of the popular techniques next. [sent-32, score-0.164]

15 The Instance Based Graph Formulation (IBGF) [2, 5] first constructs a fully connected graph G = (V, W ) for the ensemble C = (C1 , . . . , Cm ). [sent-33, score-0.331]

16 The edge weight wij for the pair (di , dj ) is defined as the number of algorithms in C that assign the nodes di and dj to the same cluster. [sent-40, score-0.34]

17 Then, standard graph partitioning techniques are used to obtain a final clustering solution. [sent-43, score-0.347]
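To make the IBGF construction concrete, here is a minimal sketch of building the vote matrix W, assuming each clustering solution is given as a length-n list of integer labels; the function name and data layout are ours, not from the paper.

```python
import numpy as np

def ibgf_vote_matrix(labelings):
    """W[a, b] = number of clustering solutions that place elements a and b
    in the same cluster (the IBGF edge weight)."""
    labelings = [np.asarray(lab) for lab in labelings]
    n = labelings[0].shape[0]
    W = np.zeros((n, n), dtype=int)
    for lab in labelings:
        W += (lab[:, None] == lab[None, :]).astype(int)
    return W

# Toy ensemble: m = 3 solutions over n = 5 elements.
ensemble = [[0, 0, 1, 1, 1],
            [0, 0, 0, 1, 1],
            [1, 1, 0, 0, 0]]
print(ibgf_vote_matrix(ensemble))
```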

18-19 In Cluster Based Graph Formulation (CBGF), a given cluster ensemble is represented as C̄ = {C̄11 , . . . , C̄m·k }, where C̄ij denotes the ith cluster of the jth algorithm in C. [sent-44, score-0.54] [sent-50, score-0.251]

20 Like IBGF, this approach also constructs a graph, G = (V, W ), to model the correspondence (or ‘similarity’) relationship among the mk clusters, where the similarity matrix W reflects the Jaccard similarity measure between clusters C̄i and C̄j . [sent-51, score-0.25]

21 The graph is then partitioned so that the clusters of the same group are similar to one another. [sent-52, score-0.188]
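A sketch of the CBGF similarity graph follows, again assuming label-vector input; it builds one node per (algorithm, cluster) pair and weights edges by Jaccard similarity. This is only an illustration of the construction described above, not the paper's code.

```python
import numpy as np

def cbgf_similarity(labelings):
    """Return the Jaccard similarity matrix over all clusters of all
    algorithms (one row/column per cluster of every ensemble member)."""
    clusters = []
    for lab in labelings:
        lab = np.asarray(lab)
        for c in np.unique(lab):
            clusters.append(set(np.flatnonzero(lab == c)))
    q = len(clusters)
    W = np.zeros((q, q))
    for a in range(q):
        for b in range(q):
            union = clusters[a] | clusters[b]
            W[a, b] = len(clusters[a] & clusters[b]) / len(union) if union else 0.0
    return W
```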

22 The connection of this example to the ensemble clustering problem is clear. [sent-67, score-0.594]

23 Existing algorithms represent the similarity between elements in D as a scalar value assigned to the edge joining their corresponding nodes in the graph. [sent-68, score-0.148]

24 This weight is essentially a ‘vote’ reflecting the number of algorithms that assigned those two elements to the same cluster. [sent-69, score-0.088]

25 When expanding the group to include more elements, one is not sure if a common feature exists under which the larger group is similar. [sent-72, score-0.092]

26 This can be seen by looking at clusters as items and the Jaccard similarity measure as the vote (weight) on the edges. [sent-75, score-0.204]

27 3 Main Ideas To model the intuition above, we generalize the similarity metric to maximize similarity or ‘agreement’ by an appropriate encoding of the solutions obtained from individual clustering algorithms. [sent-76, score-0.54]

28 The ensemble clustering problem thus reduces to a form of string clustering problem where our objective is to assign similar strings to the same cluster. [sent-78, score-1.054]

29 Let m be the number of clustering algorithms with each solution having no more than k clusters. [sent-81, score-0.338]

30 For every data element dl ∈ D, Al ∈ {0, 1}^{m×k} is a matrix whose elements are defined by Al (i, j) = 1 if dl is assigned to cluster i by Cj , and 0 otherwise (1). It is easy to see that the summation of every row of Al equals 1. [sent-83, score-0.478]
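The encoding of (1) can be computed directly from the label vectors; the sketch below stores A_l for every element as an m × k 0-1 matrix whose row j marks the cluster chosen by algorithm C_j, so every row sums to 1 (the exact row/column convention is our assumption for this illustration).

```python
import numpy as np

def a_strings(labelings, k):
    """A[l] is the A-string of element d_l: A[l][j, c] = 1 iff algorithm j
    assigns d_l to cluster c."""
    m, n = len(labelings), len(labelings[0])
    A = np.zeros((n, m, k), dtype=int)
    for j, lab in enumerate(labelings):
        for l, c in enumerate(lab):
            A[l, j, c] = 1
    return A
```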

31 Our goal is to cluster the elements D = {d1 , d2 , . . . , dn }. [sent-85, score-0.293]

32 We now consider how to compute the clusters based on the similarity (or dissimilarity) of strings. [sent-89, score-0.16]

33 Though their results shed considerable light on the hardness of string clustering with the selected distance measures, those techniques cannot be directly applied to the problem at hand because the objective here is fairly different from the one in [11]. [sent-92, score-0.399]

34 Fortunately, our analysis reveals that a simpler objective is sufficient to capture the essence of similarity maximization in clusters using certain special properties of the A-strings. [sent-93, score-0.203]

35 Our approach is partly inspired by classical k-means clustering, where each data point is assigned to the cluster whose center is closest. [sent-94, score-0.853]

36 Imagine an ideal input instance for the ensemble clustering problem (all clustering algorithms behave similarly) – one with only k unique members among n A-strings. [sent-95, score-0.958]

37 The representative for each cluster will then be exactly like its members, is a valid A-string, and can be viewed as a center in a geometric sense. [sent-97, score-0.294]

38 Naturally, the centers of the clusters will vary from their members. [sent-99, score-0.142]

39 This variation can be thought of as noise or disagreement within the clusters; our objective is to find a set of clusters (and centers) such that the noise is minimized and we move very close to the ideal case. [sent-100, score-0.143]

40 Consider an example where a cluster i in this optimal solution contains items (d1 , d2 , . [sent-102, score-0.328]

41 A certain algorithm Cj in the input ensemble clusters items (d1 , d2 , d3 , d4 ) in cluster s and (d5 , d6 , d7 ) in cluster p. [sent-106, score-0.935]

42 How would Cj behave if evaluating the center of cluster i as a data item? [sent-107, score-0.294]

43 The probability it assigns the center to cluster s is 4/7 and the probability it assigns the center to cluster p is 3/7. [sent-108, score-0.588]

44 It can be verified that this choice minimizes the dissent of all items in cluster i to the center. [sent-110, score-0.371]

45 The A-string for the center of cluster i will have a “1” at position (j, s). [sent-111, score-0.294]

46 Hence, the dissent within a cluster can be quantified simply by averaging the matrices of elements that belong to the cluster and computing the difference to the center. [sent-113, score-0.62]
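The center construction and the dissent measure described above can be written down directly; the following sketch (our own helper, with the same row convention as before) averages the members' A-strings, places the center's 1 at the majority position in each row, and sums the absolute differences.

```python
import numpy as np

def center_and_dissent(A_members):
    """A_members: array of shape (size, m, k) holding the A-strings of the
    elements currently grouped together.  Returns the 0-1 center A-string
    and the dissent (entrywise |average - center| summed)."""
    avg = A_members.mean(axis=0)              # fraction of members per (algorithm, cluster)
    center = np.zeros_like(avg)
    center[np.arange(avg.shape[0]), avg.argmax(axis=1)] = 1.0   # majority vote per row
    return center, np.abs(avg - center).sum()
```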

47 Our goal is to find such an assignment and group the A-strings so that the sum of the absolute differences of the averages of clusters to their centers (dissent) is minimized. [sent-114, score-0.188]

48 In the subsequent sections, we will introduce our optimization framework for ensemble clustering based on these ideas. [sent-115, score-0.594]

49 4 Integer Program for Model 1 We start with a discussion of an Integer Program (IP, for short) formulation for ensemble clustering. [sent-116, score-0.32]

50-51 For convenience, we denote the final clustering solution by C* = {C*_1 , . . . , C*_k }, and Cij denotes cluster i produced by algorithm j. [sent-117, score-0.338] [sent-120, score-0.251]

52 X_{li'} = 1 if d_l ∈ C*_{i'} ; 0 otherwise (2), and s_{iji'} = 1 if C_{ij} = arg max_{i''=1,...,k} |C*_{i'} ∩ C_{i''j}| ; 0 otherwise (3). [sent-122, score-0.3]

53 We mention that the above definition implies that for a fixed index i', its center s_{iji'} also provides an indicator of the cluster most similar to C*_{i'} in the set of clusters produced by the clustering algorithm C_j. [sent-125, score-0.903]

54 min Σ_{i'=1}^{k} Σ_{i=1}^{k} Σ_{j=1}^{m} | s_{iji'} − (Σ_{l=1}^{n} A_{lij} X_{li'}) / (Σ_{l=1}^{n} X_{li'}) | (4), s.t. [sent-127, score-0.247]

55 Σ_{i'=1}^{k} X_{li'} = 1 ∀l ∈ [1, n], Σ_{l=1}^{n} X_{li'} ≥ 1 ∀i' ∈ [1, k] (5), and Σ_{i=1}^{k} s_{iji'} = 1 ∀j ∈ [1, m], i' ∈ [1, k], X_{li'} ∈ {0, 1}, s_{iji'} ∈ {0, 1} (6). [sent-129, score-0.494]

56 (4) minimizes the sum of the differences between s_{iji'} (the center for cluster C*_{i'}) and the average of the A_{lij} bits of the data elements d_l assigned to cluster C*_{i'}. [sent-130, score-0.478]

57 Recall that s_{iji'} will be 1 if C_{ij} is the most similar cluster to C*_{i'} among all the clusters produced by algorithm C_j. [sent-131, score-0.628]

58 Hence, if s_{iji'} = 0 and (Σ_{l=1}^{n} A_{lij} X_{li'}) / (Σ_{l=1}^{n} X_{li'}) ≠ 0, the value | s_{iji'} − (Σ_{l=1}^{n} A_{lij} X_{li'}) / (Σ_{l=1}^{n} X_{li'}) | represents the percentage of data elements in C*_{i'} that do not consent with the majority of the other elements in the group w.r.t. the clustering produced by C_j. [sent-132, score-0.722]

59 In other words, we are trying to minimize the dissent and maximize the consent simultaneously. [sent-136, score-0.114]

60 The remaining constraints are relatively simple – (5) enforces the condition that a data element should belong to precisely one cluster in the final solution and that every cluster must have size at least 1; (6) ensures that s_{iji'} is an appropriate A-string for every cluster center. [sent-137, score-1.066]
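For a small instance, objective (4) can be evaluated (and even brute-forced) directly, which helps check intuition before moving to the SDP; the helpers below are hypothetical illustrations that fix s_{iji'} to its optimal value for a given assignment, as justified in the text above.

```python
import itertools
import numpy as np

def model1_objective(A, assignment, k):
    """A has shape (n, m, k); assignment[l] is the consensus cluster of d_l.
    Returns the value of (4) with s chosen optimally (majority position)."""
    n, m, _ = A.shape
    total = 0.0
    for ip in range(k):
        members = [l for l in range(n) if assignment[l] == ip]
        if not members:
            return np.inf                        # violates constraint (5)
        avg = A[members].mean(axis=0)            # the c values for this consensus cluster
        s = np.zeros_like(avg)
        s[np.arange(m), avg.argmax(axis=1)] = 1.0
        total += np.abs(s - avg).sum()
    return total

def brute_force_model1(A, k):
    """Exhaustive search over assignments; only feasible for tiny n."""
    n = A.shape[0]
    return min(itertools.product(range(k), repeat=n),
               key=lambda asg: model1_objective(A, asg, k))
```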

61 5 0-1 Semidefinite Program for Model 1 The formulation given by (4)-(6) is a mixed integer program (MIP, for short) with a nonlinear objective function in (4). [sent-138, score-0.112]

62 By introducing artificial variables, we rewrite (4) as min Σ_{i'=1}^{k} Σ_{j=1}^{m} Σ_{i=1}^{k} t_{iji'} (7), with s_{iji'} − c_{iji'} ≤ t_{iji'} and c_{iji'} − s_{iji'} ≤ t_{iji'} ∀i, i', j (8), where the term c_{iji'} represents the second term in (4), defined by c_{iji'} = (Σ_{l=1}^{n} A_{lij} X_{li'}) / (Σ_{l=1}^{n} X_{li'}) ∀i, i', j (9). [sent-142, score-1.809]

63 Since both A_{lij} and X_{li'} are binary, (9) can be rewritten as c_{iji'} = (Σ_{l=1}^{n} A_{lij}^2 X_{li'}^2) / (Σ_{l=1}^{n} X_{li'}^2) (10); let us introduce a vector variable y_{i'} ∈ R^n with y_{i'}^{(l)} = X_{li'} / (Σ_{l=1}^{n} X_{li'}^2)^{1/2}, and let A_{ij} ∈ R^n denote the vector with lth entry A_{lij}, ∀i, i', j. [sent-143, score-0.286]

64 This allows us to represent (10) as c_{iji'} = tr(B_{ij} Z_{i'}), Z_{i'}^2 = Z_{i'}, Z_{i'} ⪰ 0 (12), where B_{ij} = diag(A_{ij}) is a diagonal matrix with (B_{ij})_{ll} = A_l(i, j); the second and third properties follow from Z_{i'} = y_{i'} y_{i'}^T being a positive semidefinite matrix. [sent-146, score-0.286]
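The step from the fraction (10) to the trace form (12) can be checked numerically; the snippet below uses randomly drawn 0-1 vectors (the variable names and the unit normalization of y are our reading of the construction).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
X_col = (rng.random(n) < 0.5).astype(float)   # 0-1 membership in one consensus cluster
X_col[0] = 1.0                                # ensure the cluster is nonempty
A_col = (rng.random(n) < 0.5).astype(float)   # the bits A_l(i, j) for one fixed (i, j)

c_fraction = (A_col @ X_col) / X_col.sum()    # c as in (9)/(10)
y = X_col / np.linalg.norm(X_col)             # unit norm, so y_l^2 = X_l / sum(X)
Z = np.outer(y, y)                            # Z = y y^T: symmetric, idempotent, PSD
c_trace = np.trace(np.diag(A_col) @ Z)        # tr(B Z) with B = diag(A)
assert np.isclose(c_fraction, c_trace)
```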

65 The resulting 0-1 SDP includes the constraints S_{i'} e_k = e_m ∀i' ∈ [1, k], Z^2 = Z, Z^T = Z, Z ≥ 0 (17), where Q_{i'}(i, j) = c_{iji'} = tr(B_{ij} Z_{i'}) and e_n ∈ R^n is a vector of all 1s. [sent-154, score-0.327]

66 6 Integer Program and 0-1 Semidefinite Program for Model 2 Recall the definition of the variables c_{iji'}, which can be interpreted as the size of the overlap between the cluster C*_{i'} in the final solution and C_{ij}, and is proportional to the cardinality of C*_{i'}. [sent-159, score-0.57]

67 Let us also define vector variables q_{ji'} whose ith element is s_{iji'} − c_{iji'}. [sent-164, score-0.599]

68 The main difficulty in the previous formulation stems from the fact that c_{iji'} is a fractional function w.r.t. the assignment variables X_{li'}. [sent-166, score-0.317]

69 Fortunately, we note that since the entries c_{iji'} are fractional and satisfy Σ_{i=1}^{k} c_{iji'} = 1 for any fixed j, i', their sum of squares is maximized when the largest entry is as high as possible. [sent-169, score-0.572]

70 Therefore, the second term of (21) can be written as Σ_{i'=1}^{k} Σ_{j=1}^{m} Σ_{i=1}^{k} tr(X_{i'}^T A_{ij} A_{ij}^T X_{i'} (X_{i'}^T X_{i'})^{-1}) = Σ_{i'=1}^{k} Σ_{j=1}^{m} Σ_{i=1}^{k} tr(A_{ij} A_{ij}^T Z_{i'}) = Σ_{j=1}^{m} Σ_{i=1}^{k} tr(A_{ij} A_{ij}^T Z) = Σ_{j=1}^{m} tr(B_j Z) = tr(BZ). [sent-184, score-0.09]
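As a rough illustration of what the relaxed program looks like in practice, the sketch below maximizes tr(BZ) over a positive semidefinite Z after dropping the 0-1 constraint Z² = Z. The linear constraints used (entrywise nonnegativity, unit row sums, trace k) hold for the exact Z built from a partition, but the paper's actual relaxed SDP (solved with YALMIP/SeDuMi) may include further constraints; a rounding step is still needed to recover a partition, and cvxpy is used here only for convenience.

```python
import cvxpy as cp
import numpy as np

def relaxed_tr_bz(B, k):
    """Sketch of a relaxed SDP: maximize tr(B Z) subject to Z PSD,
    Z >= 0 entrywise, Z 1 = 1 and tr(Z) = k.  Returns the relaxed Z."""
    n = B.shape[0]
    Z = cp.Variable((n, n), PSD=True)
    constraints = [Z >= 0, cp.sum(Z, axis=1) == 1, cp.trace(Z) == k]
    prob = cp.Problem(cp.Maximize(cp.trace(B @ Z)), constraints)
    prob.solve()
    return Z.value   # fractional; round (e.g., cluster its rows) to get a partition
```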

71 7 Experimental Results Our experiments included evaluations on several classification datasets, segmentation databases and simulations. [sent-196, score-0.154]

72 To create the ensemble, we used a set of [4, 10] clustering schemes (by varying the clustering criterion and/or algorithm) from the CLUTO clustering toolkit. [sent-203, score-0.915]

73 The multiple solutions comprised the input ensemble; our model was then used to determine an agreement-maximizing solution. [sent-204, score-0.145]

74 The ground-truth data was used at this stage to evaluate the accuracy of the ensemble (and of the individual schemes). [sent-205, score-0.321]
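One common way to score a clustering against ground truth (equivalently, to compute the fraction of mislabelled cases) is to first match predicted clusters to classes with the Hungarian method; the paper's exact evaluation protocol is not spelled out in these excerpts, so treat the sketch below as a plausible stand-in.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mislabelled_fraction(pred, truth):
    """Fraction of elements not explained by the best one-to-one matching
    between predicted clusters and ground-truth classes."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    k = int(max(pred.max(), truth.max())) + 1
    overlap = np.zeros((k, k), dtype=int)
    for p, t in zip(pred, truth):
        overlap[p, t] += 1
    rows, cols = linear_sum_assignment(-overlap)   # maximize matched overlap
    return 1.0 - overlap[rows, cols].sum() / len(pred)
```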

75 For each case, we can see that the ensemble clustering solution is at least as good as the best clustering algorithm. [sent-207, score-0.932]

76 Observe, however, that while such results are expected for this and many other datasets (see [3]), the consensus solution may not always be superior to the ‘best’ clustering solution. [sent-208, score-0.399]

77 An ensemble solution is useful because we do not know a priori which algorithm will perform best (especially if ground truth is unavailable). [sent-211, score-0.322]

78-80 [Figure residue: tick values and the repeated axis labels ‘Number of clustering algorithms in ensemble (m)’ (x-axis) and ‘Mislabelled cases in each algorithm’ (y-axis) from panels (a)–(c).] Figure 1: Synergy. [sent-216, score-0.594] [sent-221, score-0.594] [sent-227, score-0.594]

81 The fraction of mislabeled cases ([0, 1]) in a consensus solution (∗) is compared to the number of mislabelled cases (∆) in individual clustering algorithms. [sent-228, score-0.507]

82 We illustrate the ensemble effect for the Iris dataset in (a), the Soybean dataset in (b), and the Wine dataset in (c). [sent-229, score-0.289]

83 Even sophisticated segmentation algorithms may yield ‘different’ results on the same image; when multiple segmentations are available, it seems reasonable to ‘combine’ them to reduce degeneracies. [sent-231, score-0.219]

84 Our experimental results indicate that in many cases, we can obtain a better overall segmentation that captures (more) details in the images more accurately with fewer outlying clusters. [sent-232, score-0.107]

85 For instance, (a) and (d) do not segment the eyes; (b) does well in segmenting the shirt collar region but can only recognize one of the eyes and (c) creates an additional cut across the forehead. [sent-237, score-0.111]

86 The ensemble (extreme right) is able to segment these details (eyes, shirt collar and cap) nicely by combining (a)–(d). [sent-238, score-0.36]

87 Figure 2 (panels (a)–(d) and the ensemble, far right): A segmentation ensemble on an image from the Berkeley Segmentation dataset. [sent-240, score-0.715]

88 (a)–(d) show the individual segmentations overlaid on the input image; the right-most image shows the segmentation generated from ensemble clustering. [sent-241, score-0.514]

89 The final set of our experiments was performed on 500 runs of artificially generated cluster ensembles. [sent-242, score-0.251]

90 We first constructed an initial segmentation of the set; this was then repeatedly permuted (up to 15%), yielding a set of clustering solutions. [sent-243, score-0.305]
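The synthetic-ensemble protocol described here is easy to reproduce in outline: start from a base labeling and randomly reassign up to a given fraction of the elements for each ensemble member. The sketch below is our reconstruction of that idea, not the authors' generator.

```python
import numpy as np

def perturbed_ensemble(base_labels, m, k, max_flip=0.15, seed=0):
    """Create m clustering solutions by randomly reassigning up to
    `max_flip` of the elements of `base_labels` to other clusters."""
    rng = np.random.default_rng(seed)
    base = np.asarray(base_labels)
    n = base.shape[0]
    ensemble = []
    for _ in range(m):
        lab = base.copy()
        flips = rng.choice(n, size=rng.integers(0, int(max_flip * n) + 1),
                           replace=False)
        lab[flips] = rng.integers(0, k, size=flips.shape[0])
        ensemble.append(lab)
    return ensemble
```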

91 Finally, Figure 1(b) shows a comparison of the relaxed SDP Model 2 with [3] on the objective function optimized in [3] (using the best among two techniques). [sent-255, score-0.106]

92 The results show that even empirically our objective functions model similarity rather well, and that Normalized Mutual Information may be implicitly optimized within this framework. [sent-257, score-0.103]
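Normalized Mutual Information between the consensus labeling and either the ground truth or another solution can be computed with standard tooling; the call below is a hedged convenience example, not part of the paper's evaluation code.

```python
from sklearn.metrics import normalized_mutual_info_score

# NMI between a consensus labeling and a reference labeling, both given as
# integer label lists of equal length.
nmi = normalized_mutual_info_score([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
print(round(nmi, 3))
```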

93 8 Conclusions We have proposed a new algorithm for ensemble clustering based on an SDP formulation. [sent-264, score-0.594]

94 Among the important contributions of this paper is, we believe, the observation that the notion of agreement in an ensemble can be captured better using string encoding rather than a voting strategy. [sent-264, score-0.493]

95 We discuss extensive experimental evaluations on simulations and real datasets; in addition, we illustrate the application of the algorithm to segmentation ensembles. [sent-267, score-0.154]

96 We feel that the latter application is of independent research interest; to the best of our knowledge, this is the first algorithmic treatment of generating segmentation ensembles for improving accuracy. [sent-268, score-0.157]

97-98 [Figure residue: histogram tick values with axis labels ‘Number of instances’ (y-axis) and ‘Ratios of solutions of objective functions of the two algorithms’ (x-axis), with ‘Better’ and ‘Worse’ regions marked.] [sent-269, score-0.097] [sent-276, score-0.097]

99 In (c), comparisons (difference in normalized values) between our solution and the best among IBGF and CBGF based on the Normalized Mutual Information (NMI) objective function used in [3]. [sent-288, score-0.106]

100 Random projection for high dimensional data clustering: A cluster ensemble approach. [sent-361, score-0.574]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('xli', 0.495), ('clustering', 0.305), ('ensemble', 0.289), ('ciji', 0.286), ('cluster', 0.251), ('siji', 0.247), ('sdp', 0.179), ('zi', 0.163), ('alij', 0.152), ('segmentation', 0.107), ('ci', 0.101), ('clusters', 0.1), ('agreement', 0.091), ('aij', 0.085), ('tr', 0.078), ('dissent', 0.076), ('ibgf', 0.076), ('mislabelled', 0.076), ('cj', 0.076), ('consensus', 0.061), ('similarity', 0.06), ('cij', 0.058), ('buffalo', 0.057), ('cbgf', 0.057), ('charikar', 0.057), ('dick', 0.057), ('tiji', 0.057), ('bij', 0.056), ('segmentations', 0.056), ('semide', 0.055), ('solutions', 0.054), ('dl', 0.053), ('string', 0.051), ('al', 0.051), ('ensembles', 0.05), ('ip', 0.048), ('evaluations', 0.047), ('group', 0.046), ('assigned', 0.046), ('sedumi', 0.045), ('hamming', 0.045), ('items', 0.044), ('objective', 0.043), ('center', 0.043), ('harry', 0.042), ('centers', 0.042), ('graph', 0.042), ('elements', 0.042), ('en', 0.041), ('eyes', 0.04), ('program', 0.038), ('ailon', 0.038), ('assisted', 0.038), ('astring', 0.038), ('bansal', 0.038), ('bz', 0.038), ('collar', 0.038), ('consent', 0.038), ('consolidated', 0.038), ('dinner', 0.038), ('gasieniec', 0.038), ('iris', 0.038), ('jaccard', 0.038), ('jinhui', 0.038), ('wine', 0.038), ('xi', 0.036), ('microarray', 0.035), ('projection', 0.034), ('qji', 0.033), ('fern', 0.033), ('peng', 0.033), ('shirt', 0.033), ('yalmip', 0.033), ('solution', 0.033), ('relaxed', 0.033), ('element', 0.033), ('voting', 0.033), ('si', 0.032), ('strings', 0.032), ('individual', 0.032), ('formulation', 0.031), ('tom', 0.031), ('lij', 0.03), ('soybean', 0.03), ('lth', 0.03), ('convexi', 0.03), ('shared', 0.03), ('image', 0.03), ('item', 0.03), ('dj', 0.03), ('ll', 0.03), ('ij', 0.03), ('among', 0.03), ('encoding', 0.029), ('members', 0.029), ('qi', 0.029), ('cm', 0.029), ('assign', 0.029), ('dn', 0.028), ('biostatistics', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 80 nips-2007-Ensemble Clustering using Semidefinite Programming

Author: Vikas Singh, Lopamudra Mukherjee, Jiming Peng, Jinhui Xu

Abstract: We consider the ensemble clustering problem where the task is to ‘aggregate’ multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. Using this generalization, we first derive a nonlinear optimization model to maximize the new agreement measure. We then show that our optimization problem can be transformed into a strict 0-1 Semidefinite Program (SDP) via novel convexification techniques which can subsequently be relaxed to a polynomial time solvable SDP. Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. We discuss evaluations on clustering and image segmentation databases. 1

2 0.20472735 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering

Author: Francis R. Bach, Zaïd Harchaoui

Abstract: We present a novel linear clustering framework (D IFFRAC) which relies on a linear discriminative cost function and a convex relaxation of a combinatorial optimization problem. The large convex optimization problem is solved through a sequence of lower dimensional singular value decompositions. This framework has several attractive properties: (1) although apparently similar to K-means, it exhibits superior clustering performance than K-means, in particular in terms of robustness to noise. (2) It can be readily extended to non linear clustering if the discriminative cost function is based on positive definite kernels, and can then be seen as an alternative to spectral clustering. (3) Prior information on the partition is easily incorporated, leading to state-of-the-art performance for semi-supervised learning, for clustering or classification. We present empirical evaluations of our algorithms on synthetic and real medium-scale datasets.

3 0.18548432 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs

Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu

Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1

4 0.17347904 61 nips-2007-Convex Clustering with Exemplar-Based Models

Author: Danial Lashkari, Polina Golland

Abstract: Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a different approach to approximate mixture fitting for clustering. We introduce an exemplar-based likelihood function that approximates the exact likelihood. This formulation leads to a convex minimization problem and an efficient algorithm with guaranteed convergence to the globally optimal solution. The resulting clustering can be thought of as a probabilistic mapping of the data points to the set of exemplars that minimizes the average distance and the information-theoretic cost of mapping. We present experimental results illustrating the performance of our algorithm and its comparison with the conventional approach to mixture model clustering. 1

5 0.15549903 46 nips-2007-Cluster Stability for Finite Samples

Author: Ohad Shamir, Naftali Tishby

Abstract: Over the past few years, the notion of stability in data clustering has received growing attention as a cluster validation criterion in a sample-based framework. However, recent work has shown that as the sample size increases, any clustering model will usually become asymptotically stable. This led to the conclusion that stability is lacking as a theoretical and practical tool. The discrepancy between this conclusion and the success of stability in practice has remained an open question, which we attempt to address. Our theoretical approach is that stability, as used by cluster validation algorithms, is similar in certain respects to measures of generalization in a model-selection framework. In such cases, the model chosen governs the convergence rate of generalization bounds. By arguing that these rates are more important than the sample size, we are led to the prediction that stability-based cluster validation algorithms should not degrade with increasing sample size, despite the asymptotic universal stability. This prediction is substantiated by a theoretical analysis as well as some empirical results. We conclude that stability remains a meaningful cluster validation criterion over finite samples. 1

6 0.10200348 58 nips-2007-Consistent Minimization of Clustering Objective Functions

7 0.089678876 70 nips-2007-Discriminative K-means for Clustering

8 0.081127442 143 nips-2007-Object Recognition by Scene Alignment

9 0.071615428 183 nips-2007-Spatial Latent Dirichlet Allocation

10 0.069368377 84 nips-2007-Expectation Maximization and Posterior Constraints

11 0.068090484 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

12 0.066638559 166 nips-2007-Regularized Boost for Semi-Supervised Learning

13 0.061354522 32 nips-2007-Bayesian Co-Training

14 0.060707189 203 nips-2007-The rat as particle filter

15 0.058347773 92 nips-2007-Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations

16 0.054890338 105 nips-2007-Infinite State Bayes-Nets for Structured Domains

17 0.05323948 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents

18 0.051281102 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels

19 0.050317563 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process

20 0.049280725 63 nips-2007-Convex Relaxations of Latent Variable Training


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.189), (1, 0.057), (2, -0.12), (3, 0.021), (4, -0.019), (5, -0.128), (6, 0.053), (7, 0.089), (8, -0.042), (9, 0.028), (10, -0.07), (11, 0.009), (12, 0.09), (13, -0.235), (14, -0.119), (15, 0.295), (16, 0.019), (17, -0.091), (18, 0.052), (19, 0.11), (20, -0.032), (21, 0.023), (22, -0.138), (23, 0.0), (24, 0.004), (25, 0.044), (26, 0.076), (27, -0.07), (28, 0.016), (29, 0.001), (30, 0.148), (31, 0.019), (32, -0.016), (33, -0.005), (34, 0.069), (35, -0.023), (36, 0.095), (37, -0.024), (38, 0.038), (39, 0.037), (40, 0.059), (41, 0.037), (42, -0.094), (43, -0.004), (44, 0.049), (45, 0.004), (46, -0.109), (47, -0.026), (48, -0.057), (49, 0.055)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96959561 80 nips-2007-Ensemble Clustering using Semidefinite Programming

Author: Vikas Singh, Lopamudra Mukherjee, Jiming Peng, Jinhui Xu

Abstract: We consider the ensemble clustering problem where the task is to ‘aggregate’ multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. Using this generalization, we first derive a nonlinear optimization model to maximize the new agreement measure. We then show that our optimization problem can be transformed into a strict 0-1 Semidefinite Program (SDP) via novel convexification techniques which can subsequently be relaxed to a polynomial time solvable SDP. Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. We discuss evaluations on clustering and image segmentation databases. 1

2 0.77451563 61 nips-2007-Convex Clustering with Exemplar-Based Models

Author: Danial Lashkari, Polina Golland

Abstract: Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a different approach to approximate mixture fitting for clustering. We introduce an exemplar-based likelihood function that approximates the exact likelihood. This formulation leads to a convex minimization problem and an efficient algorithm with guaranteed convergence to the globally optimal solution. The resulting clustering can be thought of as a probabilistic mapping of the data points to the set of exemplars that minimizes the average distance and the information-theoretic cost of mapping. We present experimental results illustrating the performance of our algorithm and its comparison with the conventional approach to mixture model clustering. 1

3 0.74416476 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering

Author: Francis R. Bach, Zaïd Harchaoui

Abstract: We present a novel linear clustering framework (D IFFRAC) which relies on a linear discriminative cost function and a convex relaxation of a combinatorial optimization problem. The large convex optimization problem is solved through a sequence of lower dimensional singular value decompositions. This framework has several attractive properties: (1) although apparently similar to K-means, it exhibits superior clustering performance than K-means, in particular in terms of robustness to noise. (2) It can be readily extended to non linear clustering if the discriminative cost function is based on positive definite kernels, and can then be seen as an alternative to spectral clustering. (3) Prior information on the partition is easily incorporated, leading to state-of-the-art performance for semi-supervised learning, for clustering or classification. We present empirical evaluations of our algorithms on synthetic and real medium-scale datasets.

4 0.73578733 46 nips-2007-Cluster Stability for Finite Samples

Author: Ohad Shamir, Naftali Tishby

Abstract: Over the past few years, the notion of stability in data clustering has received growing attention as a cluster validation criterion in a sample-based framework. However, recent work has shown that as the sample size increases, any clustering model will usually become asymptotically stable. This led to the conclusion that stability is lacking as a theoretical and practical tool. The discrepancy between this conclusion and the success of stability in practice has remained an open question, which we attempt to address. Our theoretical approach is that stability, as used by cluster validation algorithms, is similar in certain respects to measures of generalization in a model-selection framework. In such cases, the model chosen governs the convergence rate of generalization bounds. By arguing that these rates are more important than the sample size, we are led to the prediction that stability-based cluster validation algorithms should not degrade with increasing sample size, despite the asymptotic universal stability. This prediction is substantiated by a theoretical analysis as well as some empirical results. We conclude that stability remains a meaningful cluster validation criterion over finite samples. 1

5 0.68578398 70 nips-2007-Discriminative K-means for Clustering

Author: Jieping Ye, Zheng Zhao, Mingrui Wu

Abstract: We present a theoretical study on the discriminative clustering framework, recently proposed for simultaneous subspace selection via linear discriminant analysis (LDA) and clustering. Empirical results have shown its favorable performance in comparison with several other popular clustering algorithms. However, the inherent relationship between subspace selection and clustering in this framework is not well understood, due to the iterative nature of the algorithm. We show in this paper that this iterative subspace selection and clustering is equivalent to kernel K-means with a specific kernel Gram matrix. This provides significant and new insights into the nature of this subspace selection procedure. Based on this equivalence relationship, we propose the Discriminative K-means (DisKmeans) algorithm for simultaneous LDA subspace selection and clustering, as well as an automatic parameter estimation procedure. We also present the nonlinear extension of DisKmeans using kernels. We show that the learning of the kernel matrix over a convex set of pre-specified kernel matrices can be incorporated into the clustering formulation. The connection between DisKmeans and several other clustering algorithms is also analyzed. The presented theories and algorithms are evaluated through experiments on a collection of benchmark data sets. 1

6 0.53769404 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs

7 0.50933951 58 nips-2007-Consistent Minimization of Clustering Objective Functions

8 0.36934868 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents

9 0.34780893 49 nips-2007-Colored Maximum Variance Unfolding

10 0.33986685 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

11 0.33116272 4 nips-2007-A Constraint Generation Approach to Learning Stable Linear Dynamical Systems

12 0.31146288 143 nips-2007-Object Recognition by Scene Alignment

13 0.29843217 84 nips-2007-Expectation Maximization and Posterior Constraints

14 0.29470477 109 nips-2007-Kernels on Attributed Pointsets with Applications

15 0.290555 63 nips-2007-Convex Relaxations of Latent Variable Training

16 0.28389785 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process

17 0.28171527 12 nips-2007-A Spectral Regularization Framework for Multi-Task Structure Learning

18 0.27392676 144 nips-2007-On Ranking in Survival Analysis: Bounds on the Concordance Index

19 0.27211344 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels

20 0.27022347 27 nips-2007-Anytime Induction of Cost-sensitive Trees


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.037), (13, 0.028), (16, 0.011), (19, 0.015), (21, 0.056), (31, 0.377), (34, 0.027), (35, 0.033), (47, 0.073), (49, 0.017), (83, 0.145), (85, 0.032), (87, 0.022), (90, 0.054)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90949333 162 nips-2007-Random Sampling of States in Dynamic Programming

Author: Chris Atkeson, Benjamin Stephens

Abstract: We combine three threads of research on approximate dynamic programming: sparse random sampling of states, value function and policy approximation using local models, and using local trajectory optimizers to globally optimize a policy and associated value function. Our focus is on finding steady state policies for deterministic time invariant discrete time control problems with continuous states and actions often found in robotics. In this paper we show that we can now solve problems we couldn’t solve previously. 1

2 0.86065012 144 nips-2007-On Ranking in Survival Analysis: Bounds on the Concordance Index

Author: Harald Steck, Balaji Krishnapuram, Cary Dehing-oberije, Philippe Lambin, Vikas C. Raykar

Abstract: In this paper, we show that classical survival analysis involving censored data can naturally be cast as a ranking problem. The concordance index (CI), which quantifies the quality of rankings, is the standard performance measure for model assessment in survival analysis. In contrast, the standard approach to learning the popular proportional hazard (PH) model is based on Cox’s partial likelihood. We devise two bounds on CI–one of which emerges directly from the properties of PH models–and optimize them directly. Our experimental results suggest that all three methods perform about equally well, with our new approach giving slightly better results. We also explain why a method designed to maximize the Cox’s partial likelihood also ends up (approximately) maximizing the CI. 1

3 0.85310733 214 nips-2007-Variational inference for Markov jump processes

Author: Manfred Opper, Guido Sanguinetti

Abstract: Markov jump processes play an important role in a large number of application domains. However, realistic systems are analytically intractable and they have traditionally been analysed using simulation based techniques, which do not provide a framework for statistical inference. We propose a mean field approximation to perform posterior inference and parameter estimation. The approximation allows a practical solution to the inference problem, while still retaining a good degree of accuracy. We illustrate our approach on two biologically motivated systems.

same-paper 4 0.80471832 80 nips-2007-Ensemble Clustering using Semidefinite Programming

Author: Vikas Singh, Lopamudra Mukherjee, Jiming Peng, Jinhui Xu

Abstract: We consider the ensemble clustering problem where the task is to ‘aggregate’ multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. Using this generalization, we first derive a nonlinear optimization model to maximize the new agreement measure. We then show that our optimization problem can be transformed into a strict 0-1 Semidefinite Program (SDP) via novel convexification techniques which can subsequently be relaxed to a polynomial time solvable SDP. Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. We discuss evaluations on clustering and image segmentation databases. 1

5 0.57587695 163 nips-2007-Receding Horizon Differential Dynamic Programming

Author: Yuval Tassa, Tom Erez, William D. Smart

Abstract: The control of high-dimensional, continuous, non-linear dynamical systems is a key problem in reinforcement learning and control. Local, trajectory-based methods, using techniques such as Differential Dynamic Programming (DDP), are not directly subject to the curse of dimensionality, but generate only local controllers. In this paper,we introduce Receding Horizon DDP (RH-DDP), an extension to the classic DDP algorithm, which allows us to construct stable and robust controllers based on a library of local-control trajectories. We demonstrate the effectiveness of our approach on a series of high-dimensional problems using a simulated multi-link swimming robot. These experiments show that our approach effectively circumvents dimensionality issues, and is capable of dealing with problems of (at least) 24 state and 9 action dimensions. 1

6 0.54910463 168 nips-2007-Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods

7 0.54083598 174 nips-2007-Selecting Observations against Adversarial Objectives

8 0.52435577 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

9 0.52048391 105 nips-2007-Infinite State Bayes-Nets for Structured Domains

10 0.5184474 141 nips-2007-New Outer Bounds on the Marginal Polytope

11 0.51788479 16 nips-2007-A learning framework for nearest neighbor search

12 0.51764166 58 nips-2007-Consistent Minimization of Clustering Objective Functions

13 0.51339531 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC

14 0.51287329 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

15 0.51119965 151 nips-2007-Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs

16 0.51048565 185 nips-2007-Stable Dual Dynamic Programming

17 0.50825065 52 nips-2007-Competition Adds Complexity

18 0.50726748 63 nips-2007-Convex Relaxations of Latent Variable Training

19 0.50695026 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs

20 0.50469214 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations