nips nips2007 nips2007-46 knowledge-graph by maker-knowledge-mining

46 nips-2007-Cluster Stability for Finite Samples


Source: pdf

Author: Ohad Shamir, Naftali Tishby

Abstract: Over the past few years, the notion of stability in data clustering has received growing attention as a cluster validation criterion in a sample-based framework. However, recent work has shown that as the sample size increases, any clustering model will usually become asymptotically stable. This led to the conclusion that stability is lacking as a theoretical and practical tool. The discrepancy between this conclusion and the success of stability in practice has remained an open question, which we attempt to address. Our theoretical approach is that stability, as used by cluster validation algorithms, is similar in certain respects to measures of generalization in a model-selection framework. In such cases, the model chosen governs the convergence rate of generalization bounds. By arguing that these rates are more important than the sample size, we are led to the prediction that stability-based cluster validation algorithms should not degrade with increasing sample size, despite the asymptotic universal stability. This prediction is substantiated by a theoretical analysis as well as some empirical results. We conclude that stability remains a meaningful cluster validation criterion over finite samples. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Over the past few years, the notion of stability in data clustering has received growing attention as a cluster validation criterion in a sample-based framework. [sent-4, score-1.108]

2 However, recent work has shown that as the sample size increases, any clustering model will usually become asymptotically stable. [sent-5, score-0.505]

3 This led to the conclusion that stability is lacking as a theoretical and practical tool. [sent-6, score-0.544]

4 The discrepancy between this conclusion and the success of stability in practice has remained an open question, which we attempt to address. [sent-7, score-0.544]

5 Our theoretical approach is that stability, as used by cluster validation algorithms, is similar in certain respects to measures of generalization in a model-selection framework. [sent-8, score-0.31]

6 By arguing that these rates are more important than the sample size, we are led to the prediction that stability-based cluster validation algorithms should not degrade with increasing sample size, despite the asymptotic universal stability. [sent-10, score-0.561]

7 We conclude that stability remains a meaningful cluster validation criterion over finite samples. [sent-12, score-0.796]

8 In this paper, we focus on sample-based clustering, where it is assumed that the data to be clustered are actually a sample from some underlying distribution. [sent-15, score-0.223]

9 A major problem in such a setting is assessing cluster validity. [sent-16, score-0.167]

10 In other words, we might wish to know whether the clustering we have found actually corresponds to a meaningful clustering of the underlying distribution, and is not just an artifact of the sampling process. [sent-17, score-0.695]

11 This problem relates to the issue of model selection, such as determining the number of clusters in the data or tuning parameters of the clustering algorithm. [sent-18, score-0.388]

12 In the past few years, cluster stability has received growing attention as a criterion for addressing this problem. [sent-19, score-0.735]

13 Informally, this criterion states that if the clustering algorithm is repeatedly applied over independent samples, resulting in ’similar’ clusterings, then these clusterings are statistically significant. [sent-20, score-0.411]

14 Based on this idea, several cluster validity methods have been proposed (see [9] and references therein), and were shown to be relatively successful for various data sets in practice. [sent-21, score-0.17]

15 However, in recent work, it was proven that under mild conditions, stability is asymptotically fully determined by the behavior of the objective function which the clustering algorithm attempts to optimize. [sent-22, score-0.948]

16 In particular, the existence of a unique optimal solution for some model choice implies stability as the sample size increases to infinity. [sent-23, score-0.696]

17 From this, it was concluded that stability is not a well-suited tool for model selection in clustering. [sent-25, score-0.63]

18 This left open, however, the question of why stability is observed to be useful in practice. [sent-26, score-0.544]

19 In this paper, we attempt to explain why stability measures should have much wider relevance than what might be concluded from these results. [sent-27, score-0.699]

20 Our underlying approach is to view stability as a measure of generalization, in a learning-theoretic sense. [sent-28, score-0.576]

21 In other words, stability should ’work’ because stable models generalize better, and models which generalize better should fit the underlying distribution better. [sent-30, score-0.612]

22 The novelty in this paper lies mainly in the predictions that are drawn from it for clustering stability. [sent-32, score-0.319]

23 The viewpoint above places emphasis on the nature of stability for finite samples. [sent-33, score-0.544]

24 Since generalization is meaningless when the sample is infinite, it should come as no surprise that stability displays similar behavior. [sent-34, score-0.719]

25 On finite samples, the generalization uncertainty is virtually always strictly positive, with different model choices leading to different convergence rates towards zero for increasing sample size. [sent-35, score-0.293]

26 Based on the link between stability and generalization, we predict that on realistic data, all risk-minimizing models asymptotically become stable, but the rates of convergence to this ultimate stability differ. [sent-36, score-1.22]

27 In other words, an appropriate scaling of the stability measures will make them independent of the actual sample size used. [sent-37, score-0.757]

28 Using this intuition, we characterize and prove a mild set of conditions, applicable in principle to a wide class of clustering settings, which ensure the relevance of cluster stability for arbitrarily large sample sizes. [sent-38, score-1.228]

29 We then prove that the stability measure used in previous work to show negative asymptotic results on stability actually allows us to discern the ’correct’ model, regardless of how large the sample is, for a certain simple setting. [sent-39, score-0.771]

30 A clustering C_D for some D ⊆ X is a function from D × D to {0, 1}, defining an equivalence relation on D with a finite number of equivalence classes (namely, C_D(x_i, x_j) = 1 if x_i and x_j belong to the same cluster, and 0 otherwise). [sent-46, score-0.319]
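
For concreteness, this pairwise {0, 1} representation of a clustering over a finite set can be built directly from cluster labels; a minimal sketch in Python (the function name is ours, not the paper's):

    import numpy as np

    def pairwise_clustering(labels):
        # C[i, j] = 1 iff points i and j carry the same cluster label,
        # i.e. the equivalence relation C_D restricted to the sampled points.
        labels = np.asarray(labels)
        return (labels[:, None] == labels[None, :]).astype(int)

    # Example: pairwise_clustering([0, 0, 1]) -> [[1, 1, 0], [1, 1, 0], [0, 0, 1]]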

31 For a clustering C_X of the instance space, and a finite sample S, let C_X|_S denote the functional restriction of C_X to S × S. [sent-47, score-0.442]

32 A clustering algorithm A is a function from any finite sample S ⊆ X to some clustering C_X of the instance space¹. [sent-48, score-0.761]

33 Following [2], we define the stability of a clustering algorithm A on finite samples of size m as: stab(A, D, m) = E_{S1,S2}[d_D(A(S1), A(S2))],  (1)  where S1 and S2 are samples of size m, drawn i.i.d. from D. [sent-51, score-1.041]
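
Eq. (1) can be approximated by straightforward Monte Carlo, given a way to sample from D, the algorithm A, and some clustering-distance approximating d_D; a hedged sketch, with function names and the number of repetitions chosen by us:

    import numpy as np

    def estimate_stab(sample_from_D, cluster_alg, clustering_distance, m,
                      n_repeats=100, seed=0):
        # Monte Carlo estimate of stab(A, D, m) = E_{S1,S2}[ d_D(A(S1), A(S2)) ],
        # where S1, S2 are independent i.i.d. samples of size m from D.
        rng = np.random.default_rng(seed)
        values = []
        for _ in range(n_repeats):
            S1 = sample_from_D(m, rng)
            S2 = sample_from_D(m, rng)
            # clustering_distance is expected to approximate d_D, e.g. by
            # drawing fresh instance pairs from D as in Eq. (2) below.
            values.append(clustering_distance(cluster_alg(S1), cluster_alg(S2)))
        return float(np.mean(values))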

34 Let ℓ denote a loss function from any clustering C_S of a finite set S ⊆ X to [0, 1]. [sent-54, score-0.319]

35 ℓ may or may not correspond to the objective function the clustering algorithm attempts to optimize, and may involve a global quality measure rather than some average over individual instances. [sent-55, score-0.351]

36 For a fixed sample size, we say that ℓ obeys the bounded differences property (see [11]) if for any clustering C_S it holds that |ℓ(C_S) − ℓ(C_S′)| ≤ a, where a is a constant, and C_S′ is obtained from C_S by replacing at most one instance of S with any other instance from X and clustering it arbitrarily. [sent-56, score-0.876]

37 The empirical risk of a clustering C_X ∈ H on a sample S of size m is ℓ(C_X|_S). [sent-58, score-0.539]

38 The expected risk of C_X, with respect to samples S of size m, will be defined as E_S[ℓ(C_X|_S)]. [sent-59, score-0.145]

39 ¹ Many clustering algorithms, such as spectral clustering, do not induce a natural clustering on X based on a clustering of a sample. [sent-61, score-0.957]

40 In that case, we view the algorithm as a two-stage process, in which the clustering of the sample is extended to X through some uniform extension operator (such as assigning instances to the ’nearest’ cluster in some appropriate sense). [sent-62, score-0.584]

41 3 A Bayesian framework for relating stability and generalization. The relationship between generalization and various notions of stability is long known, but has been dealt with mostly in a supervised learning setting (see [3], [5], [8] and references therein). [sent-63, score-1.28]

42 In the context of unsupervised data clustering, several papers have explored the relevance of statistical stability and generalization, separately and together (such as [1][4][14][12]). [sent-64, score-0.57]

43 The aim of this section is to informally motivate our approach, of viewing stability and generalization in clustering as closely related. [sent-66, score-0.983]

44 Relating the two is very natural in a Bayesian setting, where clustering stability implies an ’unsurprising’ posterior given a prior, which is based on clustering another sample. [sent-67, score-1.182]

45 This distribution typically reflects the likelihood of a clustering hypothesis, given the data and prior assumptions. [sent-69, score-0.319]

46 The empirical risk of such a distribution, with respect to sample S′, is defined as ℓ(A(S)|_{S′}) = E_{C_X ∼ A(S)}[ℓ(C_X|_{S′})]. [sent-71, score-0.169]

47 In this setting, consider for example the following simple procedure to derive a clustering hypothesis distribution, as well as a generalization bound: Given a sample of size 2m drawn i.i.d. [sent-72, score-0.575]

48 from D, we randomly split it into two samples S1, S2, each of size m, and use A to cluster each of them separately. [sent-74, score-0.232]

49 This theorem implies that the more stable the Bayesian algorithm is, the tighter the expected generalization bounds we can achieve. [sent-83, score-0.196]

50 In general, the generalization bound might converge to 0 as m → ∞, but this is immaterial for the purpose of model selection. [sent-87, score-0.146]

51 4 Effective model selection for arbitrarily large sample sizes. From now on, following [2], we will define the clustering distance function d_D of Eq. (1) as follows. [sent-93, score-0.52]

52 In other words (Eq. (2)), the clustering distance is the probability that two independently drawn instances from D will be in the same cluster under one clustering, and in different clusters under another clustering. [sent-95, score-0.552]
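
Eq. (2) suggests a direct plug-in estimate: draw fresh pairs from D and count how often the two clusterings disagree on whether the pair is co-clustered. A sketch under the assumption that each clustering is available as a function mapping a point to a cluster index (names are ours):

    import numpy as np

    def empirical_dD(cluster1, cluster2, sample_from_D, n_pairs=1000, seed=0):
        # Estimates d_D(C1, C2): the probability that two independent draws
        # x, y ~ D are in the same cluster under one clustering but in
        # different clusters under the other.
        rng = np.random.default_rng(seed)
        disagree = 0
        for _ in range(n_pairs):
            x, y = sample_from_D(2, rng)
            same1 = cluster1(x) == cluster1(y)
            same2 = cluster2(x) == cluster2(y)
            disagree += int(same1 != same2)
        return disagree / n_pairs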

53 In [2], it is essentially proven that if there exists a unique optimizer of the clustering algorithm’s objective function, to which the algorithm converges for asymptotically large samples, then stab(A, D, m) converges to 0 as m → ∞, regardless of the parameters of A. [sent-96, score-0.412]

54 From this, it was concluded that using stability as a tool for cluster validity is problematic, since for large enough samples it would always be approximately zero, for any algorithm parameters chosen. [sent-97, score-0.815]

55 However, using the intuition gleaned from the results of the previous section, the different convergence rates of the stability measure (for different algorithm parameters) should be more important than their absolute values or the sample size. [sent-98, score-0.796]

56 Then it holds that: Pr(X̂ ≤ Ŷ) ≤ exp(−(1/8) m E[X] (c/(1+c))^4) + exp(−(1/4) m E[X] (c/(1+c))^2). [sent-109, score-0.166]

57 For example, suppose that according to our stability measure (see Eq. (1)), [sent-111, score-0.576]

58 a cluster model with k clusters is more stable than a model with k′ clusters, where k ≠ k′, for sample size m. [sent-112, score-0.432]

59 These stability measures might be arbitrarily close to zero. [sent-115, score-0.657]

60 However, we can estimate them by drawing another sample S3 of m instance pairs, and computing a sample mean to estimate Eq. [sent-120, score-0.224]

61 01), which are slower than Θ(1/m), then we can discern which number of clusters is more stable, with a high probability which actually improves as m increases. [sent-124, score-0.153]

62 Thm. 2 serves as a guideline for when a stability estimator might be useful for arbitrarily large sample sizes. [sent-126, score-0.719]

63 We would expect these conditions to hold under quite general settings, since most stability measures are based on empirically estimating the mean of some random variable. [sent-128, score-0.583]

64 Using a relative entropy variant of Hoeffding’s bound [7], we have that for any 1 > b > 0 and 1/E[Y] > a > 1, it holds that: Pr(X̂ ≤ b E[X]) ≤ exp(−m D_KL[b E[X] || E[X]]) and Pr(Ŷ ≥ a E[Y]) ≤ exp(−m D_KL[a E[Y] || E[Y]]). [sent-135, score-0.189]

65 By substituting the bound D_KL[p||q] ≥ (p − q)^2 / (2 max{p, q}) into the two inequalities, we get: Pr(X̂ ≤ b E[X]) ≤ exp(−(1/2) m E[X] (1 − b)^2)  (3)  and  Pr(Ŷ ≥ a E[Y]) ≤ exp(−(1/2) m E[Y] (a + 1/a − 2))  (4),  which hold whenever 1 > b > 0 and a > 1. [sent-136, score-0.185]
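
For completeness, the substitution that yields Eqs. (3) and (4) can be spelled out (our own expansion of the step above, in LaTeX):

    \begin{align*}
    D_{KL}\bigl[b\,\mathbb{E}[X] \,\big\|\, \mathbb{E}[X]\bigr]
      &\ge \frac{\bigl(b\,\mathbb{E}[X]-\mathbb{E}[X]\bigr)^{2}}{2\,\mathbb{E}[X]}
       = \tfrac{1}{2}\,\mathbb{E}[X]\,(1-b)^{2}
       \quad\text{(since } \max\{b\,\mathbb{E}[X],\mathbb{E}[X]\}=\mathbb{E}[X] \text{ for } b<1\text{)},\\
    D_{KL}\bigl[a\,\mathbb{E}[Y] \,\big\|\, \mathbb{E}[Y]\bigr]
      &\ge \frac{\bigl(a\,\mathbb{E}[Y]-\mathbb{E}[Y]\bigr)^{2}}{2\,a\,\mathbb{E}[Y]}
       = \tfrac{1}{2}\,\mathbb{E}[Y]\Bigl(a+\tfrac{1}{a}-2\Bigr)
       \quad\text{(since } \max\{a\,\mathbb{E}[Y],\mathbb{E}[Y]\}=a\,\mathbb{E}[Y] \text{ for } a>1\text{)}.
    \end{align*}

Plugging these lower bounds into the two Hoeffding inequalities gives exactly the exponents in Eqs. (3) and (4).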

66 As a proof of concept, we show that for a certain setting, the stability measure used by [2], as defined above, is meaningful for arbitrarily large sample sizes, even when this measure converges to zero for any choice of the required number of clusters. [sent-147, score-0.786]

67 The result is a simple counter-example to the claim that this phenomenon makes cluster stability a problematic tool. [sent-148, score-0.71]

68 The setting we analyze is a mixture distribution of three well-separated unequal Gaussians in R, where an empirical estimate of stability, using a centroid-based clustering algorithm, is utilized to discern whether the data contain 2, 3 or 4 clusters. [sent-149, score-0.481]

69 We prove that with high probability, this empirical estimation process will discern k = 3 as much more stable than both k = 2 and k = 4 (by an amount depending on the separation between the Gaussians). [sent-150, score-0.24]

70 The result is robust enough to hold even if, in addition, one performs normalization procedures to account for the fact that a higher number of clusters entails more degrees of freedom for the clustering algorithm (see [9]). [sent-151, score-0.388]

71 The proof itself relies on a general and intuitive characteristic of what constitutes a ’wrong’ model (namely, having cluster boundaries in areas of high density), rather than any specific feature of this setting. [sent-153, score-0.143]

72 The next two lemmas, however, show that the stability measure for k = 3 (the ’correct’ model order) is smaller than the other two, by a substantial ratio independent of m, and that this will be discerned, with high probability, based on the empirical estimates of d_D(A_k(S1), A_k(S2)). [sent-156, score-0.631]

73 For some µ > 0, let D be a Gaussian mixture distribution on R, with density function p(x) = (2/(3√(2π))) exp(−(x + µ)^2/2) + (1/(6√(2π))) exp(−x^2/2) + (1/(6√(2π))) exp(−(x − µ)^2/2). [sent-159, score-0.231]

74 Let A_k be a centroid-based clustering algorithm, which is given a sample and a required number of clusters k, and returns a set of k centroids minimizing the k-means objective function (the sum of squared Euclidean distances between each instance and its nearest centroid). [sent-161, score-0.511]
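
As a rough illustration (not the authors' implementation), the mixture of Lemma 1 and a stand-in for A_k can be set up as follows; scikit-learn's KMeans is used here as the centroid-based algorithm, and the mixture weights 2/3, 1/6, 1/6 follow the density above:

    import numpy as np
    from sklearn.cluster import KMeans

    def sample_mixture(m, mu, rng):
        # Draws m points from (2/3) N(-mu, 1) + (1/6) N(0, 1) + (1/6) N(mu, 1).
        centers = rng.choice([-mu, 0.0, mu], size=m, p=[2/3, 1/6, 1/6])
        return (centers + rng.standard_normal(m)).reshape(-1, 1)

    def A_k(sample, k, rng):
        # Fits k centroids by k-means and returns a function assigning any
        # point to the index of its nearest centroid.
        km = KMeans(n_clusters=k, n_init=10,
                    random_state=int(rng.integers(2**31 - 1))).fit(sample)
        return lambda x: int(km.predict(np.atleast_2d(x))[0])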

75 With these lemmas, we can prove that a direct estimation of stab(A, D, m), based on a random sample, allows us to discern the more stable model with high probability, for arbitrarily large sample sizes. [sent-169, score-0.321]

76 For the setting described in Lemma 1, define the following unbiased estimator θ̂_{k,4m} of stab(A_k, D, m): given a sample of size 4m, split it randomly into 3 disjoint subsets S1, S2, S3 of sizes m, m and 2m respectively. [sent-171, score-0.227]
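
Our reading of this construction, as a hedged sketch (S3 supplies m fresh instance pairs over which the disagreement of the two clusterings, i.e. the quantity in Eq. (2), is averaged; variable names are ours):

    import numpy as np

    def stability_estimate(sample_4m, k, cluster_alg, rng):
        # theta_hat_{k,4m}: split a sample of size 4m into S1, S2 (size m each)
        # and S3 (size 2m); cluster S1 and S2 with A_k, then average the
        # pairwise-disagreement indicator over the m pairs formed from S3.
        n = len(sample_4m)
        m = n // 4
        perm = rng.permutation(n)
        S1 = sample_4m[perm[:m]]
        S2 = sample_4m[perm[m:2 * m]]
        S3 = sample_4m[perm[2 * m:]]
        c1 = cluster_alg(S1, k, rng)
        c2 = cluster_alg(S2, k, rng)
        disagree = 0
        for i in range(m):                 # the i-th pair is (S3[2i], S3[2i+1])
            x, y = S3[2 * i], S3[2 * i + 1]
            disagree += int((c1(x) == c1(y)) != (c2(x) == c2(y)))
        return disagree / m

With the mixture sampler and A_k sketched above, comparing stability_estimate(data, k, A_k, rng) for k in {2, 3, 4} on one draw of size 4m gives the empirical comparison analyzed in the lemmas.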

77 Denoting the event above (Eq. (5)) as B, and assuming it does not occur, we have that the estimators θ̂_{2,4m}, θ̂_{3,4m}, θ̂_{4,4m} are each an empirical average over an additional sample of size m, and the expected value of θ̂_{3,4m} is at most half the expected values of the other two. [sent-180, score-0.258]

78 5 Experiments. In order to further substantiate our analysis above, some experiments were run on synthetic and real-world data, with the goal of performing model selection over the number of clusters k. [sent-186, score-0.155]

79 We tested 3 different Gaussian mixture distributions (with µ = 5, 7, 8) and sample sizes m ranging from 2^5 to 2^22. [sent-188, score-0.152]

80 Our results show that although these empirical estimators converge towards zero, their convergence rates differ, with approximately constant ratios between them. [sent-190, score-0.184]

81 Scaling the graphs by √m results in approximately constant and differing stability measures for each µ. [sent-191, score-0.62]
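
This observation is easy to check numerically with the sketches above (the grid of sample sizes and µ = 5 are our own choices): multiply each estimate by √m and look for roughly flat curves in m.

    import numpy as np

    rng = np.random.default_rng(1)
    for m in (2**7, 2**9, 2**11):
        data = sample_mixture(4 * m, mu=5.0, rng=rng)
        scaled = {k: np.sqrt(m) * stability_estimate(data, k, A_k, rng)
                  for k in (2, 3, 4)}
        print(m, scaled)   # per k, roughly constant if the sqrt(m) rate holds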

82 Moreover, the failure rate does not increase with sample size, and it decreases rapidly to a negligible level as the Gaussians become better separated, exactly in line with Thm. [sent-192, score-0.198]

83 For the other experiments, we used the stability-based cluster validation algorithm proposed in [9], which was found to compare favorably with similar algorithms, and has the desirable property of

84 [figure residue; recoverable panel labels: Random Sample, Distribution, Values of θ̂2, θ̂3, θ̂4, Failure Rate, stability method index]

85 Figure 2: Performance of the stability-based algorithm in [9] on 3 data sets.

86 In each row, the leftmost sub-figure is a sample representing the distribution, the middle sub-figure is a log-log plot of the computed stability indices (averaged over 100 trials), and on the right is the failure rate (in detecting the most stable model over repeated trials). [sent-237, score-0.789]

87 In the phoneme data set, the algorithm selects 3 clusters as the most stable model, since the vowels tend to group into a single cluster. [sent-238, score-0.165]

88 producing a clear quantitative stability measure, bounded in [0, 1]. [sent-240, score-0.574]

89 The advantage of this data set is its clear low-dimensional representation relative to its size, allowing us to get nearer to the asymptotic convergence rates of the stability measures. [sent-244, score-0.697]

90 All experiments used the k-means algorithm, except for the ring data set which used the spectral clustering algorithm proposed in [13]. [sent-245, score-0.319]

91 Complementing our theoretical analysis, the experiments clearly demonstrate that regardless of the actual stability measures per fixed sample size, they seem to eventually follow roughly constant and differing convergence rates, with no substantial degradation in performance. [sent-246, score-0.825]

92 In other words, when stability works well for small sample sizes, it should also work at least as well for larger sample sizes. [sent-247, score-0.746]

93 6 Conclusions. In this paper, we propose a principled approach for analyzing the utility of stability for cluster validation in large finite samples. [sent-249, score-0.741]

94 This approach stems from viewing stability as a measure of generalization in a statistical setting. [sent-250, score-0.67]

95 It leads us to predict that in contrast to what might be concluded from previous work, cluster stability does not necessarily degrade with increasing sample size. [sent-251, score-0.905]

96 (Thm. 2) for when a stability measure might be relevant for arbitrarily large sample size, despite asymptotic universal stability. [sent-254, score-0.835]

97 They also suggest that by appropriate scaling, stability measures would become insensitive to the actual sample size used. [sent-255, score-0.757]

98 These guidelines do not presume a specific clustering framework. [sent-256, score-0.369]

99 However, we have proven their fulfillment rigorously only for a certain stability measure and clustering setting. [sent-257, score-0.919]

100 A framework for statistical clustering with constant time approximation algorithms for k-median clustering. [sent-262, score-0.319]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('stability', 0.544), ('dd', 0.384), ('stab', 0.338), ('clustering', 0.319), ('ak', 0.292), ('cluster', 0.143), ('cx', 0.141), ('sample', 0.101), ('pr', 0.093), ('dkl', 0.084), ('discern', 0.084), ('cs', 0.078), ('generalization', 0.074), ('exp', 0.07), ('clusters', 0.069), ('stable', 0.068), ('clusterings', 0.067), ('concluded', 0.063), ('asymptotic', 0.055), ('validation', 0.054), ('size', 0.051), ('rates', 0.051), ('shai', 0.05), ('guidelines', 0.05), ('convergence', 0.047), ('arbitrarily', 0.047), ('xm', 0.046), ('failure', 0.046), ('gaussians', 0.045), ('ae', 0.044), ('joachim', 0.042), ('substantiate', 0.042), ('substantiated', 0.042), ('measures', 0.039), ('samples', 0.038), ('differing', 0.037), ('obeys', 0.037), ('ulrike', 0.037), ('trials', 0.036), ('regardless', 0.035), ('ln', 0.035), ('risk', 0.035), ('asymptotically', 0.034), ('separation', 0.034), ('andrea', 0.034), ('luxburg', 0.034), ('theorem', 0.033), ('empirical', 0.033), ('measure', 0.032), ('lemma', 0.032), ('estimators', 0.031), ('hypothesis', 0.03), ('bounded', 0.03), ('gure', 0.03), ('sizes', 0.03), ('meaningful', 0.03), ('leftmost', 0.03), ('cd', 0.03), ('inequalities', 0.029), ('universal', 0.029), ('phoneme', 0.028), ('hoeffding', 0.027), ('degrade', 0.027), ('validity', 0.027), ('mild', 0.027), ('might', 0.027), ('holds', 0.026), ('informally', 0.026), ('therein', 0.026), ('relevance', 0.026), ('von', 0.025), ('criterion', 0.025), ('proven', 0.024), ('setting', 0.024), ('problematic', 0.023), ('words', 0.023), ('selection', 0.023), ('lemmas', 0.023), ('growing', 0.023), ('bound', 0.023), ('actual', 0.022), ('converge', 0.022), ('ratio', 0.022), ('nite', 0.022), ('instance', 0.022), ('pascal', 0.022), ('substituting', 0.022), ('emphasize', 0.022), ('mixture', 0.021), ('instances', 0.021), ('intuition', 0.021), ('alexander', 0.021), ('clustered', 0.021), ('expected', 0.021), ('synthetic', 0.021), ('prove', 0.021), ('choices', 0.02), ('viewing', 0.02), ('relating', 0.02), ('merely', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 46 nips-2007-Cluster Stability for Finite Samples

Author: Ohad Shamir, Naftali Tishby

Abstract: Over the past few years, the notion of stability in data clustering has received growing attention as a cluster validation criterion in a sample-based framework. However, recent work has shown that as the sample size increases, any clustering model will usually become asymptotically stable. This led to the conclusion that stability is lacking as a theoretical and practical tool. The discrepancy between this conclusion and the success of stability in practice has remained an open question, which we attempt to address. Our theoretical approach is that stability, as used by cluster validation algorithms, is similar in certain respects to measures of generalization in a model-selection framework. In such cases, the model chosen governs the convergence rate of generalization bounds. By arguing that these rates are more important than the sample size, we are led to the prediction that stability-based cluster validation algorithms should not degrade with increasing sample size, despite the asymptotic universal stability. This prediction is substantiated by a theoretical analysis as well as some empirical results. We conclude that stability remains a meaningful cluster validation criterion over finite samples. 1

2 0.18661831 184 nips-2007-Stability Bounds for Non-i.i.d. Processes

Author: Mehryar Mohri, Afshin Rostamizadeh

Abstract: The notion of algorithmic stability has been used effectively in the past to derive tight generalization bounds. A key advantage of these bounds is that they are designed for specific learning algorithms, exploiting their particular properties. But, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed (i.i.d.). In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence, which is clear in system diagnosis or time series prediction problems. This paper studies the scenario where the observations are drawn from a stationary mixing sequence, which implies a dependence between observations that weaken over time. It proves novel stability-based generalization bounds that hold even with this more general setting. These bounds strictly generalize the bounds given in the i.i.d. case. It also illustrates their application in the case of several general classes of learning algorithms, including Support Vector Regression and Kernel Ridge Regression.

3 0.16301252 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering

Author: Francis R. Bach, Zaïd Harchaoui

Abstract: We present a novel linear clustering framework (D IFFRAC) which relies on a linear discriminative cost function and a convex relaxation of a combinatorial optimization problem. The large convex optimization problem is solved through a sequence of lower dimensional singular value decompositions. This framework has several attractive properties: (1) although apparently similar to K-means, it exhibits superior clustering performance than K-means, in particular in terms of robustness to noise. (2) It can be readily extended to non linear clustering if the discriminative cost function is based on positive definite kernels, and can then be seen as an alternative to spectral clustering. (3) Prior information on the partition is easily incorporated, leading to state-of-the-art performance for semi-supervised learning, for clustering or classification. We present empirical evaluations of our algorithms on synthetic and real medium-scale datasets.

4 0.15549903 80 nips-2007-Ensemble Clustering using Semidefinite Programming

Author: Vikas Singh, Lopamudra Mukherjee, Jiming Peng, Jinhui Xu

Abstract: We consider the ensemble clustering problem where the task is to ‘aggregate’ multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. Using this generalization, we first derive a nonlinear optimization model to maximize the new agreement measure. We then show that our optimization problem can be transformed into a strict 0-1 Semidefinite Program (SDP) via novel convexification techniques which can subsequently be relaxed to a polynomial time solvable SDP. Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. We discuss evaluations on clustering and image segmentation databases. 1

5 0.1474113 61 nips-2007-Convex Clustering with Exemplar-Based Models

Author: Danial Lashkari, Polina Golland

Abstract: Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a different approach to approximate mixture fitting for clustering. We introduce an exemplar-based likelihood function that approximates the exact likelihood. This formulation leads to a convex minimization problem and an efficient algorithm with guaranteed convergence to the globally optimal solution. The resulting clustering can be thought of as a probabilistic mapping of the data points to the set of exemplars that minimizes the average distance and the information-theoretic cost of mapping. We present experimental results illustrating the performance of our algorithm and its comparison with the conventional approach to mixture model clustering. 1

6 0.11871229 4 nips-2007-A Constraint Generation Approach to Learning Stable Linear Dynamical Systems

7 0.11859813 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs

8 0.10740387 58 nips-2007-Consistent Minimization of Clustering Objective Functions

9 0.066524006 70 nips-2007-Discriminative K-means for Clustering

10 0.061958484 143 nips-2007-Object Recognition by Scene Alignment

11 0.061402693 146 nips-2007-On higher-order perceptron algorithms

12 0.059616327 13 nips-2007-A Unified Near-Optimal Estimator For Dimension Reduction in $l \alpha$ ($0<\alpha\leq 2$) Using Stable Random Projections

13 0.051416479 213 nips-2007-Variational Inference for Diffusion Processes

14 0.051137049 84 nips-2007-Expectation Maximization and Posterior Constraints

15 0.050587408 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents

16 0.046551082 156 nips-2007-Predictive Matrix-Variate t Models

17 0.043574709 110 nips-2007-Learning Bounds for Domain Adaptation

18 0.04320921 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis

19 0.042225141 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

20 0.041849863 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.174), (1, 0.008), (2, -0.078), (3, 0.044), (4, 0.0), (5, -0.09), (6, 0.04), (7, 0.054), (8, -0.148), (9, -0.012), (10, -0.014), (11, 0.079), (12, 0.1), (13, -0.229), (14, -0.128), (15, 0.248), (16, 0.084), (17, -0.045), (18, -0.028), (19, 0.003), (20, 0.053), (21, 0.022), (22, -0.13), (23, -0.061), (24, 0.027), (25, 0.016), (26, 0.162), (27, -0.142), (28, 0.037), (29, -0.077), (30, 0.138), (31, -0.05), (32, -0.113), (33, 0.038), (34, 0.076), (35, -0.005), (36, 0.051), (37, -0.208), (38, -0.05), (39, -0.029), (40, 0.017), (41, 0.04), (42, 0.026), (43, -0.019), (44, -0.061), (45, 0.067), (46, 0.004), (47, -0.05), (48, 0.068), (49, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97116989 46 nips-2007-Cluster Stability for Finite Samples

Author: Ohad Shamir, Naftali Tishby

Abstract: Over the past few years, the notion of stability in data clustering has received growing attention as a cluster validation criterion in a sample-based framework. However, recent work has shown that as the sample size increases, any clustering model will usually become asymptotically stable. This led to the conclusion that stability is lacking as a theoretical and practical tool. The discrepancy between this conclusion and the success of stability in practice has remained an open question, which we attempt to address. Our theoretical approach is that stability, as used by cluster validation algorithms, is similar in certain respects to measures of generalization in a model-selection framework. In such cases, the model chosen governs the convergence rate of generalization bounds. By arguing that these rates are more important than the sample size, we are led to the prediction that stability-based cluster validation algorithms should not degrade with increasing sample size, despite the asymptotic universal stability. This prediction is substantiated by a theoretical analysis as well as some empirical results. We conclude that stability remains a meaningful cluster validation criterion over finite samples. 1

2 0.74203885 80 nips-2007-Ensemble Clustering using Semidefinite Programming

Author: Vikas Singh, Lopamudra Mukherjee, Jiming Peng, Jinhui Xu

Abstract: We consider the ensemble clustering problem where the task is to ‘aggregate’ multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. Using this generalization, we first derive a nonlinear optimization model to maximize the new agreement measure. We then show that our optimization problem can be transformed into a strict 0-1 Semidefinite Program (SDP) via novel convexification techniques which can subsequently be relaxed to a polynomial time solvable SDP. Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. We discuss evaluations on clustering and image segmentation databases. 1

3 0.70767456 61 nips-2007-Convex Clustering with Exemplar-Based Models

Author: Danial Lashkari, Polina Golland

Abstract: Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a different approach to approximate mixture fitting for clustering. We introduce an exemplar-based likelihood function that approximates the exact likelihood. This formulation leads to a convex minimization problem and an efficient algorithm with guaranteed convergence to the globally optimal solution. The resulting clustering can be thought of as a probabilistic mapping of the data points to the set of exemplars that minimizes the average distance and the information-theoretic cost of mapping. We present experimental results illustrating the performance of our algorithm and its comparison with the conventional approach to mixture model clustering. 1

4 0.66702789 184 nips-2007-Stability Bounds for Non-i.i.d. Processes

Author: Mehryar Mohri, Afshin Rostamizadeh

Abstract: The notion of algorithmic stability has been used effectively in the past to derive tight generalization bounds. A key advantage of these bounds is that they are designed for specific learning algorithms, exploiting their particular properties. But, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed (i.i.d.). In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence, which is clear in system diagnosis or time series prediction problems. This paper studies the scenario where the observations are drawn from a stationary mixing sequence, which implies a dependence between observations that weaken over time. It proves novel stability-based generalization bounds that hold even with this more general setting. These bounds strictly generalize the bounds given in the i.i.d. case. It also illustrates their application in the case of several general classes of learning algorithms, including Support Vector Regression and Kernel Ridge Regression.

5 0.53218329 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering

Author: Francis R. Bach, Zaïd Harchaoui

Abstract: We present a novel linear clustering framework (D IFFRAC) which relies on a linear discriminative cost function and a convex relaxation of a combinatorial optimization problem. The large convex optimization problem is solved through a sequence of lower dimensional singular value decompositions. This framework has several attractive properties: (1) although apparently similar to K-means, it exhibits superior clustering performance than K-means, in particular in terms of robustness to noise. (2) It can be readily extended to non linear clustering if the discriminative cost function is based on positive definite kernels, and can then be seen as an alternative to spectral clustering. (3) Prior information on the partition is easily incorporated, leading to state-of-the-art performance for semi-supervised learning, for clustering or classification. We present empirical evaluations of our algorithms on synthetic and real medium-scale datasets.

6 0.4850688 4 nips-2007-A Constraint Generation Approach to Learning Stable Linear Dynamical Systems

7 0.46518961 58 nips-2007-Consistent Minimization of Clustering Objective Functions

8 0.45760494 70 nips-2007-Discriminative K-means for Clustering

9 0.41800335 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs

10 0.37515867 159 nips-2007-Progressive mixture rules are deviation suboptimal

11 0.35925499 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents

12 0.35668668 13 nips-2007-A Unified Near-Optimal Estimator For Dimension Reduction in $l \alpha$ ($0<\alpha\leq 2$) Using Stable Random Projections

13 0.30474773 38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin

14 0.28576484 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations

15 0.27985898 16 nips-2007-A learning framework for nearest neighbor search

16 0.278182 193 nips-2007-The Distribution Family of Similarity Distances

17 0.27706122 139 nips-2007-Nearest-Neighbor-Based Active Learning for Rare Category Detection

18 0.26953006 15 nips-2007-A general agnostic active learning algorithm

19 0.26854107 78 nips-2007-Efficient Principled Learning of Thin Junction Trees

20 0.2611011 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.037), (13, 0.031), (16, 0.024), (19, 0.016), (21, 0.069), (26, 0.263), (31, 0.023), (34, 0.031), (35, 0.038), (47, 0.077), (49, 0.016), (83, 0.177), (85, 0.018), (87, 0.016), (90, 0.081)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.88799441 143 nips-2007-Object Recognition by Scene Alignment

Author: Bryan Russell, Antonio Torralba, Ce Liu, Rob Fergus, William T. Freeman

Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database. 1

same-paper 2 0.80032939 46 nips-2007-Cluster Stability for Finite Samples

Author: Ohad Shamir, Naftali Tishby

Abstract: Over the past few years, the notion of stability in data clustering has received growing attention as a cluster validation criterion in a sample-based framework. However, recent work has shown that as the sample size increases, any clustering model will usually become asymptotically stable. This led to the conclusion that stability is lacking as a theoretical and practical tool. The discrepancy between this conclusion and the success of stability in practice has remained an open question, which we attempt to address. Our theoretical approach is that stability, as used by cluster validation algorithms, is similar in certain respects to measures of generalization in a model-selection framework. In such cases, the model chosen governs the convergence rate of generalization bounds. By arguing that these rates are more important than the sample size, we are led to the prediction that stability-based cluster validation algorithms should not degrade with increasing sample size, despite the asymptotic universal stability. This prediction is substantiated by a theoretical analysis as well as some empirical results. We conclude that stability remains a meaningful cluster validation criterion over finite samples. 1

3 0.77679849 43 nips-2007-Catching Change-points with Lasso

Author: Céline Levy-leduc, Zaïd Harchaoui

Abstract: We propose a new approach for dealing with the estimation of the location of change-points in one-dimensional piecewise constant signals observed in white noise. Our approach consists in reframing this task in a variable selection context. We use a penalized least-squares criterion with a 1 -type penalty for this purpose. We prove some theoretical results on the estimated change-points and on the underlying piecewise constant estimated function. Then, we explain how to implement this method in practice by combining the LAR algorithm and a reduced version of the dynamic programming algorithm and we apply it to synthetic and real data. 1

4 0.65071565 139 nips-2007-Nearest-Neighbor-Based Active Learning for Rare Category Detection

Author: Jingrui He, Jaime G. Carbonell

Abstract: Rare category detection is an open challenge for active learning, especially in the de-novo case (no labeled examples), but of significant practical importance for data mining - e.g. detecting new financial transaction fraud patterns, where normal legitimate transactions dominate. This paper develops a new method for detecting an instance of each minority class via an unsupervised local-density-differential sampling strategy. Essentially a variable-scale nearest neighbor process is used to optimize the probability of sampling tightly-grouped minority classes, subject to a local smoothness assumption of the majority class. Results on both synthetic and real data sets are very positive, detecting each minority class with only a fraction of the actively sampled points required by random sampling and by Pelleg’s Interleave method, the prior best technique in the sparse literature on this topic. 1

5 0.64865887 63 nips-2007-Convex Relaxations of Latent Variable Training

Author: Yuhong Guo, Dale Schuurmans

Abstract: We investigate a new, convex relaxation of an expectation-maximization (EM) variant that approximates a standard objective while eliminating local minima. First, a cautionary result is presented, showing that any convex relaxation of EM over hidden variables must give trivial results if any dependence on the missing values is retained. Although this appears to be a strong negative outcome, we then demonstrate how the problem can be bypassed by using equivalence relations instead of value assignments over hidden variables. In particular, we develop new algorithms for estimating exponential conditional models that only require equivalence relation information over the variable values. This reformulation leads to an exact expression for EM variants in a wide range of problems. We then develop a semidefinite relaxation that yields global training by eliminating local minima. 1

6 0.64813316 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

7 0.64695066 56 nips-2007-Configuration Estimates Improve Pedestrian Finding

8 0.64683056 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

9 0.64485329 128 nips-2007-Message Passing for Max-weight Independent Set

10 0.64474231 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors

11 0.64439714 156 nips-2007-Predictive Matrix-Variate t Models

12 0.64421129 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

13 0.64316976 180 nips-2007-Sparse Feature Learning for Deep Belief Networks

14 0.64301467 187 nips-2007-Structured Learning with Approximate Inference

15 0.64299655 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection

16 0.64031732 16 nips-2007-A learning framework for nearest neighbor search

17 0.64020967 66 nips-2007-Density Estimation under Independent Similarly Distributed Sampling Assumptions

18 0.63985854 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations

19 0.63920963 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

20 0.6382938 6 nips-2007-A General Boosting Method and its Application to Learning Ranking Functions for Web Search