nips nips2006 nips2006-181 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Alexander Rakhlin, Andrea Caponnetto
Abstract: We phrase K-means clustering as an empirical risk minimization procedure over a class HK and explicitly calculate the covering number for this class. Next, we show that stability of K-means clustering is characterized by the geometry of HK with respect to the underlying distribution. We prove that in the case of a unique global minimizer, the clustering solution is stable with respect to complete changes of the data, while for the case of multiple minimizers, the change of Ω(n^{1/2}) samples defines the transition between stability and instability. While for a finite number of minimizers this result follows from multinomial distribution estimates, the case of infinite minimizers requires more refined tools. We conclude by proving that stability of the functions in HK implies stability of the actual centers of the clusters. Since stability is often used for selecting the number of clusters in practice, we hope that our analysis serves as a starting point for finding theoretically grounded recipes for the choice of K. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We phrase K-means clustering as an empirical risk minimization procedure over a class HK and explicitly calculate the covering number for this class. [sent-8, score-0.44]
2 Next, we show that stability of K-means clustering is characterized by the geometry of HK with respect to the underlying distribution. [sent-9, score-0.69]
3 We prove that in the case of a unique global minimizer, the clustering solution is stable with respect to complete changes of the data, while for the case of multiple minimizers, the change of Ω(n^{1/2}) samples defines the transition between stability and instability. [sent-10, score-0.829]
4 While for a finite number of minimizers this result follows from multinomial distribution estimates, the case of infinite minimizers requires more refined tools. [sent-11, score-0.706]
5 We conclude by proving that stability of the functions in HK implies stability of the actual centers of the clusters. [sent-12, score-0.963]
6 Since stability is often used for selecting the number of clusters in practice, we hope that our analysis serves as a starting point for finding theoretically grounded recipes for the choice of K. [sent-13, score-0.5]
7 Part of the difficulty comes from the absence, in general, of an objective way to assess the clustering quality and to compare two groupings of the data. [sent-16, score-0.239]
8 In particular, attempts have been made by [4, 2, 3] to study and theoretically justify the stability-based approach of evaluating the quality of clustering solutions. [sent-18, score-0.214]
9 Building upon these ideas, we present a characterization of clustering stability in terms of the geometry of the function class associated with minimizing the objective function. [sent-19, score-0.706]
10 To simplify the exposition, we focus on K-means clustering, although the analogous results can be derived for K-medians and other clustering algorithms which minimize an objective function. [sent-20, score-0.239]
11 Let us first motivate the notion of clustering stability. [sent-21, score-0.186]
12 While for a fixed K, two clustering solutions can be compared according to the K-means objective function (see the next section), it is not meaningful to compare the value of the objective function for different K. [sent-22, score-0.312]
13 The approach stipulates that, for each K in some range, several clustering solutions should be computed by sub-sampling or perturbing the data. [sent-27, score-0.225]
14 The best value of K is that for which the clustering solutions are most “similar”. [sent-28, score-0.206]
15 For instance, Ben-Hur et al [5] randomly choose overlapping portions of the data and evaluate the distance between the resulting clustering solutions on the common samples. [sent-31, score-0.269]
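To make this sub-sampling procedure concrete, the following is a minimal sketch in Python of stability-based selection of K. It is our own illustration, not code from the paper: the agreement measure, the helper names, and the use of scikit-learn's KMeans are assumptions made only for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

def pairwise_agreement(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree (co-clustered in both or in neither)."""
    same_a = labels_a[:, None] == labels_a[None, :]
    same_b = labels_b[:, None] == labels_b[None, :]
    off_diag = ~np.eye(len(labels_a), dtype=bool)
    return np.mean(same_a[off_diag] == same_b[off_diag])

def stability_score(Z, K, n_rounds=20, frac=0.8, seed=0):
    """Average agreement of K-means solutions computed on random overlapping sub-samples,
    evaluated on the common points (in the spirit of Ben-Hur et al. [5])."""
    rng = np.random.default_rng(seed)
    n = len(Z)
    scores = []
    for _ in range(n_rounds):
        idx1 = rng.choice(n, int(frac * n), replace=False)
        idx2 = rng.choice(n, int(frac * n), replace=False)
        common = np.intersect1d(idx1, idx2)
        km1 = KMeans(n_clusters=K, n_init=10).fit(Z[idx1])
        km2 = KMeans(n_clusters=K, n_init=10).fit(Z[idx2])
        scores.append(pairwise_agreement(km1.predict(Z[common]), km2.predict(Z[common])))
    return float(np.mean(scores))
```

Under this rule of thumb, the chosen K is the one maximizing the score over the candidate range.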
16 Similarly, Ben-David et al [3, 2] study stability with respect to complete change of the data (independent draw). [sent-33, score-0.52]
17 These different approaches of choosing K prompted us to give a precise characterization of clustering stability with respect to both complete and partial changes of the data. [sent-34, score-0.738]
18 It has been noted by [6, 4, 3] that the stability of clustering with respect to complete change of the data is characterized by the uniqueness of the minimum of the objective function with respect to the true distribution. [sent-35, score-0.828]
19 Indeed, minimization of the K-means objective function can be phrased as an empirical risk minimization procedure (see [7]). [sent-36, score-0.318]
20 The stability follows, under some regularity assumptions, from the convergence of empirical and expected means over a Glivenko-Cantelli class of functions. [sent-37, score-0.439]
21 We prove stability in the case of a unique minimizer by explicitly computing the covering number in the next section and noting that the resulting class is VC-type. [sent-38, score-0.642]
22 We go further in our analysis by considering the other two interesting cases: finite and infinite number of minimizers of the objective function. [sent-39, score-0.388]
23 With the help of a stability result of [8, 9] for empirical risk minimization, we are able to prove that K-means clustering is stable with respect to changes of o(√n) samples, where n is the total number of samples. [sent-40, score-0.819]
24 In fact, the rate of Ω(√n) changes is a sharp transition between stability and instability in these cases. [sent-41, score-0.443]
25 The goal of clustering is to find a good partition based on the sample Z1 , . [sent-51, score-0.206]
26 (1) It is easy to verify that the (scaled) within-point scatter can be rewritten as W(C) = (1/n) Σ_{k=1}^{K} Σ_{i: C(Zi)=k} ||Zi − ck||^2 (2), where ck is the mean of the k-th cluster based on the assignment C (see Figure 1). [sent-59, score-0.228]
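As a quick sanity check of the rewritten objective (2), here is a small sketch computing the scaled within-point scatter from an assignment; the function name and the NumPy implementation are ours, not the paper's.

```python
import numpy as np

def within_point_scatter(Z, assignment, K):
    """W(C) = (1/n) * sum_k sum_{i: C(Z_i)=k} ||Z_i - c_k||^2,
    with c_k the mean of cluster k under the given assignment."""
    n = len(Z)
    total = 0.0
    for k in range(K):
        cluster = Z[assignment == k]
        if len(cluster) == 0:
            continue  # an empty cluster contributes nothing
        c_k = cluster.mean(axis=0)
        total += np.sum((cluster - c_k) ** 2)
    return total / n
```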
27 We are interested in the minimizers of the within-point scatter. [sent-60, score-0.335]
28 The K-means clustering algorithm is an alternating procedure minimizing the within-point scatter W (C). [sent-66, score-0.252]
29 The centers {ck}_{k=1}^{K} are computed in the first step, followed by the assignment of each Zi to its closest center ck; the procedure is repeated. [sent-67, score-0.318]
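A minimal sketch of this alternating procedure (a Lloyd-style iteration); the random initialization and the stopping rule are illustrative choices of ours, not prescriptions from the paper.

```python
import numpy as np

def kmeans_alternate(Z, K, max_iter=100, seed=0):
    """Alternate between recomputing centers as cluster means and
    reassigning each point to its closest center."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), K, replace=False)]  # start from K data points
    for _ in range(max_iter):
        # assignment step: index of the closest center for every point
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # center step: mean of each cluster (keep the old center if a cluster is empty)
        new_centers = np.array([Z[assignment == k].mean(axis=0) if np.any(assignment == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assignment
```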
30 In this paper, we are not concerned with the algorithmic issues of the minimization procedure. [sent-69, score-0.055]
31 Rather, we study stability properties of the minimizers of W (C). [sent-70, score-0.706]
32 The problem of minimizing W(C) can be phrased as empirical risk minimization [7] over the function class HK = { hA(z) = ||z − ai||^2, i = argmin_{j∈{1,...,K}} ||z − aj||, A = {a1, . . . , aK} ∈ Z^K } (3). [sent-71, score-0.366]
33 (We have scaled the within-point scatter by 1/n compared to [10].) [sent-77, score-0.066]
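Spelling out the ERM connection (a sketch of the standard argument, not a verbatim derivation from the paper): the empirical risk of hA is exactly the within-point scatter of the clustering that assigns each point to its closest center in A,

```latex
\frac{1}{n}\sum_{i=1}^{n} h_A(Z_i)
  \;=\; \frac{1}{n}\sum_{k=1}^{K}\;\sum_{i:\;Z_i\ \text{closest to}\ a_k} \|Z_i - a_k\|^2 .
```

Since at a minimizer over A ∈ Z^K each center coincides with the mean of its cluster (for a convex Z such as the ball B2(0, R) considered below), minimizing the empirical average of hA over HK recovers the minimizers of W(C).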
34 Figure 1: The clustering objective is to place the centers ck to minimize the sum of squared distances from points to their closest centers. [sent-78, score-0.638]
35 Functions hA(z) in HK can also be written as hA(z) = Σ_{i=1}^{K} ||z − ai||^2 I(z is closest to ai), where ties are broken, for instance, in the order of the ai's. [sent-80, score-0.335]
36 Hence, functions hA ∈ HK are K parabolas glued together with centers at a1 , . [sent-81, score-0.221]
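Numerically, hA is just the lower envelope of these K parabolas; a small sketch of the evaluation (names ours):

```python
import numpy as np

def h_A(z, centers):
    """Evaluate h_A(z) = ||z - a_i||^2 for the closest center a_i; np.argmin
    breaks ties by taking the first minimizer, i.e. in the order of the a_i's."""
    sq_dists = np.sum((centers - np.asarray(z)) ** 2, axis=1)
    return sq_dists[np.argmin(sq_dists)]  # equivalently sq_dists.min()
```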
37 Hence, we will interchangeably use C and hC as minimizers of the within-point scatter. [sent-87, score-0.335]
38 One choice is to measure the similarity between the centers {ak}_{k=1}^{K} and {bk}_{k=1}^{K} of clusterings A and B. [sent-92, score-0.351]
39 3 Covering Number for HK The following technical Lemma shows that a covering of the ball B2 (0, R) induces a cover of HK in the L∞ distance because small shifts of the centers imply small changes of the corresponding functions in HK . [sent-95, score-0.441]
40 It is well-known that a Euclidean ball of radius R in R^m can be covered by N = ((4R + δ)/δ)^m balls of radius δ (see Lemma 2. [sent-100, score-0.104]
41 Consider an arbitrary function hA ∈ HK with centers at {a1 , . [sent-106, score-0.19]
42 We iterate through all the ai ’s, replacing them by the members of T . [sent-116, score-0.096]
43 After K steps, ||hA − h_{A_K}||_∞ ≤ 4RKδ and all centers of A_K belong to T. [sent-117, score-0.104]
44 Hence, each function hA ∈ HK can be approximated to within 4RKδ by functions with centers in a finite set T. [sent-118, score-0.221]
45 The upper bound on the number of functions in HK with centers in T is N^K; hence N^K = ((4R + δ)/δ)^{mK} functions cover HK to within 4RKδ. [sent-119, score-0.279]
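Putting the two steps together gives an explicit, polynomial covering number. The display below is our reconstruction under the Lemma's notation, choosing δ so that 4RKδ = ε:

```latex
\mathcal{N}\bigl(H_K,\ \|\cdot\|_\infty,\ \varepsilon\bigr)
  \;\le\; N^K
  \;=\; \Bigl(\tfrac{4R+\delta}{\delta}\Bigr)^{mK}
  \;=\; \Bigl(\tfrac{16R^2K}{\varepsilon} + 1\Bigr)^{mK},
  \qquad \delta = \tfrac{\varepsilon}{4RK},
```

which grows only polynomially in 1/ε, as used in the next section.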
46 4 Geometry of HK and Stability The above Lemma shows that HK is not too rich, as its covering numbers are polynomial. [sent-122, score-0.076]
47 This is the first important aspect in the study of clustering stability. [sent-123, score-0.186]
48 The second aspect is the geometry of HK with respect to the measure P . [sent-124, score-0.125]
49 In particular, stability of K-means clustering depends on the number of functions h ∈ HK with the minimum expectation Eh. [sent-125, score-0.588]
50 Note that the number of minimizers depends only on P and K, and not on the data. [sent-126, score-0.335]
51 Since Z is closed, the number of minimizers is at least one. [sent-127, score-0.335]
52 The three important cases are: unique minimum, a finite number of minimizers (greater than one), and an infinite number of minimizers. [sent-128, score-0.387]
53 In the case of a unique minimum of Eh, one can show that the diameter of Qε tends to zero as ε → 0. [sent-133, score-0.079]
54 Hence, empirical averages of functions in HK uniformly converge to their expectations: lim_{n→∞} P( sup_{h∈HK} | Eh − (1/n) Σ_{i=1}^{n} h(Zi) | > ε ) = 0. [sent-137, score-0.117]
55 Assuming the existence of a unique minimizer, i. [sent-150, score-0.052]
56 Suppose the clustering A minimizes W (C) over the set {Z1 , . [sent-171, score-0.186]
57 We have shown that in the case of a unique minimizer of the objective function (with respect to the distribution), two clusterings over independently drawn sets of points become arbitrarily close to each other with increasing probability as the number of points increases. [sent-179, score-0.385]
58 If there is a finite (but greater than one) number of minimizers h ∈ HK of Eh, multinomial distribution estimates tell us that we expect stability with respect to o(√n) changes of points, while no stability is expected for Ω(√n) changes, as the next example shows. [sent-180, score-0.706]
59 Consider 1-mean minimization over Z = {x1, x2}, x1 ≠ x2, and P = (1/2)(δ_{x1} + δ_{x2}). [sent-182, score-0.096]
60 Given the sample Z1, . . . , Zn, the center of the minimizer of W(C) is either x1 or x2, according to the majority vote over the training set. [sent-186, score-0.119]
61 By changing Ω(√n) of the points Z1, . . . , Zn, it is possible to swap the majority vote with constant probability. [sent-190, score-0.063]
62 Moreover, with probability approaching one, it is not possible to achieve the swap by a change of o(√n) points. [sent-191, score-0.058]
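A quick simulation of this two-point example (our own illustration, with hypothetical function names): the majority's lead in an i.i.d. sample is of order √n, so changing a constant multiple of √n points flips the vote with probability bounded away from zero, while changing o(√n) points almost never does for large n.

```python
import numpy as np

def swap_probability(n, num_changes, trials=5000, seed=0):
    """Estimate the probability that changing `num_changes` sample points
    (always pushed toward the minority value) can swap the majority vote."""
    rng = np.random.default_rng(seed)
    swaps = 0
    for _ in range(trials):
        count_x1 = rng.binomial(n, 0.5)        # points equal to x1 in an i.i.d. sample
        margin = abs(2 * count_x1 - n)         # lead of the current majority
        if 2 * num_changes > margin:           # each change shifts the margin by 2
            swaps += 1
    return swaps / trials

# With num_changes proportional to sqrt(n), e.g. swap_probability(10_000, 100), the estimate
# stays bounded away from 0; with num_changes = o(sqrt(n)) it tends to 0 as n grows.
```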
63 The above example shows that, in general, it is not possible to prove closeness of clusterings over two sets of samples differing on Ω(√n) elements. [sent-193, score-0.199]
64 Indeed, by employing the following Theorem, proven in [8, 9], we can show that even in the case of an infinite number of minimizers, clusterings over two sets of samples differing on o(√n) elements become arbitrarily close with increasing probability as the number of samples increases. [sent-195, score-0.166]
65 This result cannot be deduced from the multinomial estimates, as it relies on the control of fluctuations of empirical means over a Donsker class. [sent-196, score-0.083]
66 Assume that the class of functions F over Z is uniformly bounded and P -Donsker, for some probability measure P over Z. [sent-200, score-0.073]
67 Let f(S) and f(T) be minimizers over F of the empirical averages with respect to the sets S and T of n points i. [sent-201, score-0.405]
68 We apply the above theorem to HK, which is P-Donsker for any P because its covering numbers in L∞ scale polynomially (see Lemma 3. [sent-206, score-0.076]
69 We note that if the class HK were richer than P -Donsker, the stability result would not necessarily hold. [sent-209, score-0.392]
70 Suppose the clusterings A and B are minimizers of the K-means objective W(C) over the sets S and T, respectively. [sent-212, score-0.528]
71 The above Corollary holds even if the number of minimizers h ∈ HK of Eh is infinite. [sent-215, score-0.335]
72 This concludes the analysis of stability of K-means for the three interesting cases: unique minimizer, finite number (greater than one) of minimizers, and infinite number of minimizers. [sent-216, score-0.423]
73 We have proved that stability of K-means clustering is characterized by the geometry of the class HK with respect to P . [sent-218, score-0.738]
74 It is evident that the choice of K maximizing stability of clustering aims to choose K for which there is a unique minimizer. [sent-219, score-0.609]
75 Unfortunately, for “small” n, stability with respect to a complete change of the data and stability with respect to o(√n) changes are indistinguishable, making this rule of thumb questionable. [sent-220, score-0.997]
76 Moreover, as noted in [3], small changes of P lead to drastic changes in the number of minimizers. [sent-221, score-0.096]
77 5 Stability of the Centers Intuitively, stability of functions hA with respect to perturbation of the data Z1 , . [sent-222, score-0.453]
78 , Zn implies stability of the centers of the clusters. [sent-225, score-0.561]
79 Let us first define a notion of distance between centers of two clusterings. [sent-227, score-0.215]
80 , bK } are centers of two clusterings A and B, respectively. [sent-236, score-0.33]
81 Define a distance between these clusterings as dmax ({a1 , . [sent-237, score-0.253]
82 , bK}) := max_{1≤i≤K} min_{1≤j≤K} ( ||ai − bj|| + ||aj − bi|| ). Lemma 5. [sent-243, score-0.19]
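A small sketch computing this distance as reconstructed above; if the intended definition differs (e.g. a maximum rather than a sum inside the parentheses), only the inner expression changes.

```python
import numpy as np

def d_max(A, B):
    """d_max({a_1..a_K}, {b_1..b_K}) = max_i min_j ( ||a_i - b_j|| + ||a_j - b_i|| ),
    following the reconstructed definition; A and B are (K, m) arrays of centers."""
    K = len(A)
    return max(
        min(np.linalg.norm(A[i] - B[j]) + np.linalg.norm(A[j] - B[i]) for j in range(K))
        for i in range(K)
    )
```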
83 Assume the density of P (with respect to the Lebesgue measure λ over Z) is bounded away from 0, i. [sent-245, score-0.072]
84 , bK}) ≤ 2 max( max_{1≤i≤K} min_{1≤j≤K} ||ai − bj||, max_{1≤i≤K} min_{1≤j≤K} ||aj − bi|| ). Without loss of generality, assume that the maximum on the right-hand side is attained at a1 and b1 such that b1 is the closest center to a1 out of {b1, . [sent-263, score-0.257]
85 Consider B2 (a1 , d/2), a ball of radius d/2 centered at a1 . [sent-275, score-0.074]
86 Also note that for any z ∈ Z, ||z − a1||^2 ≥ Σ_{i=1}^{K} ||z − ai||^2 I(ai is closest to z) = hA(z). [sent-282, score-0.143]
87 1, it is enough to show that the shaded area is upper-bounded by the L1(P) distance between the functions hA and hB and lower-bounded by a power of d. [sent-284, score-0.075]
88 Assume the density of P (with respect to the Lebesgue measure λ over Z) is bounded away from 0, i. [sent-291, score-0.072]
89 Suppose the clusterings A and B are minimizers of the K-means objective W(C) over the sets S and T, respectively. [sent-294, score-0.528]
90 Hence, the centers of the minimizers of the within-point scatter are stable with respect to perturbations of o(√n) points. [sent-303, score-0.67]
91 6 Conclusions We showed that K-means clustering can be phrased as empirical risk minimization over a class HK . [sent-306, score-0.417]
92 Furthermore, stability of clustering is determined by the geometry of HK with respect to P . [sent-307, score-0.661]
93 We proved that in the case of a unique minimizer, K-means is stable with respect to a complete change of the data, while for multiple minimizers, we still expect stability with respect to o(√n) changes. [sent-308, score-0.64]
94 The rule for choosing K by maximizing stability can be viewed then as an attempt to select K such that HK has a unique minimizer with respect to P . [sent-309, score-0.563]
95 We hope that our analysis serves as a starting point for finding theoretically grounded recipes for choosing the number of clusters. [sent-311, score-0.129]
96 A framework for statistical clustering with a constant time approximation algorithms for k-median clustering. [sent-313, score-0.186]
97 A stability based method for discovering structure in clustered data. [sent-329, score-0.371]
98 Empirical risk approximation: An induction principle for unsupervised learning. [sent-340, score-0.055]
99 Some properties of empirical risk minimization over Donsker classes. [sent-345, score-0.157]
100 Stability properties of empirical risk minimization over Donsker classes. [sent-350, score-0.157]
wordName wordTfidf (topN-words)
[('hk', 0.47), ('stability', 0.371), ('minimizers', 0.335), ('ha', 0.328), ('centers', 0.19), ('clustering', 0.186), ('eh', 0.181), ('hb', 0.174), ('ak', 0.142), ('clusterings', 0.14), ('zn', 0.118), ('donsker', 0.111), ('dp', 0.107), ('bk', 0.101), ('zi', 0.101), ('ai', 0.096), ('hc', 0.093), ('minimizer', 0.089), ('dmax', 0.088), ('ck', 0.081), ('covering', 0.076), ('caponnetto', 0.067), ('eha', 0.067), ('scatter', 0.066), ('lemma', 0.064), ('minimization', 0.055), ('risk', 0.055), ('phrased', 0.053), ('objective', 0.053), ('geometry', 0.053), ('unique', 0.052), ('suppose', 0.051), ('respect', 0.051), ('changes', 0.048), ('closest', 0.047), ('empirical', 0.047), ('ehc', 0.045), ('grounded', 0.045), ('rakhlin', 0.045), ('thumb', 0.045), ('ulrike', 0.045), ('ball', 0.044), ('corollary', 0.043), ('shai', 0.041), ('aj', 0.039), ('lange', 0.039), ('al', 0.038), ('qp', 0.038), ('multinomial', 0.036), ('luxburg', 0.035), ('complete', 0.035), ('prove', 0.033), ('recipes', 0.033), ('swap', 0.033), ('bj', 0.032), ('inf', 0.031), ('functions', 0.031), ('radius', 0.03), ('rm', 0.03), ('nite', 0.03), ('zj', 0.03), ('vote', 0.03), ('lebesgue', 0.03), ('characterized', 0.029), ('theoretically', 0.028), ('stable', 0.028), ('cover', 0.027), ('diameter', 0.027), ('uniqueness', 0.027), ('proved', 0.027), ('differing', 0.026), ('distance', 0.025), ('euclidean', 0.025), ('precise', 0.025), ('change', 0.025), ('sharp', 0.024), ('von', 0.024), ('mk', 0.024), ('bi', 0.023), ('starting', 0.023), ('characterization', 0.022), ('assignments', 0.021), ('chicago', 0.021), ('class', 0.021), ('hence', 0.021), ('measure', 0.021), ('solutions', 0.02), ('sup', 0.02), ('proposition', 0.02), ('side', 0.02), ('partition', 0.02), ('averages', 0.019), ('colt', 0.019), ('argminh', 0.019), ('sober', 0.019), ('questionable', 0.019), ('joachim', 0.019), ('axiomatic', 0.019), ('stipulates', 0.019), ('upperbounded', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 181 nips-2006-Stability of $K$-Means Clustering
Author: Alexander Rakhlin, Andrea Caponnetto
Abstract: We phrase K-means clustering as an empirical risk minimization procedure over a class HK and explicitly calculate the covering number for this class. Next, we show that stability of K-means clustering is characterized by the geometry of HK with respect to the underlying distribution. We prove that in the case of a unique global minimizer, the clustering solution is stable with respect to complete changes of the data, while for the case of multiple minimizers, the change of Ω(n^{1/2}) samples defines the transition between stability and instability. While for a finite number of minimizers this result follows from multinomial distribution estimates, the case of infinite minimizers requires more refined tools. We conclude by proving that stability of the functions in HK implies stability of the actual centers of the clusters. Since stability is often used for selecting the number of clusters in practice, we hope that our analysis serves as a starting point for finding theoretically grounded recipes for the choice of K. 1
2 0.1912407 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
Author: Wenye Li, Kin-hong Lee, Kwong-sak Leung
Abstract: Kernel-based regularized learning seeks a model in a hypothesis space by minimizing the empirical error and the model’s complexity. Based on the representer theorem, the solution consists of a linear combination of translates of a kernel. This paper investigates a generalized form of representer theorem for kernel-based learning. After mapping predefined features and translates of a kernel simultaneously onto a hypothesis space by a specific way of constructing kernels, we proposed a new algorithm by utilizing a generalized regularizer which leaves part of the space unregularized. Using a squared-loss function in calculating the empirical error, a simple convex solution is obtained which combines predefined features with translates of the kernel. Empirical evaluations have confirmed the effectiveness of the algorithm for supervised learning tasks.
3 0.15146883 116 nips-2006-Learning from Multiple Sources
Author: Koby Crammer, Michael Kearns, Jennifer Wortman
Abstract: We consider the problem of learning accurate models from multiple sources of “nearby” data. Given distinct samples from multiple data sources and estimates of the dissimilarities between these sources, we provide a general theory of which samples should be used to learn models for each source. This theory is applicable in a broad decision-theoretic learning framework, and yields results for classification and regression generally, and for density estimation within the exponential family. A key component of our approach is the development of approximate triangle inequalities for expected loss, which may be of independent interest. 1
4 0.13664719 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
Author: Hamed Valizadegan, Rong Jin
Abstract: Maximum margin clustering was proposed lately and has shown promising performance in recent studies [1, 2]. It extends the theory of support vector machine to unsupervised learning. Despite its good performance, there are three major problems with maximum margin clustering that question its efficiency for real-world applications. First, it is computationally expensive and difficult to scale to large-scale datasets because the number of parameters in maximum margin clustering is quadratic in the number of examples. Second, it requires data preprocessing to ensure that any clustering boundary will pass through the origins, which makes it unsuitable for clustering unbalanced dataset. Third, it is sensitive to the choice of kernel functions, and requires external procedure to determine the appropriate values for the parameters of kernel functions. In this paper, we propose “generalized maximum margin clustering” framework that addresses the above three problems simultaneously. The new framework generalizes the maximum margin clustering algorithm by allowing any clustering boundaries including those not passing through the origins. It significantly improves the computational efficiency by reducing the number of parameters. Furthermore, the new framework is able to automatically determine the appropriate kernel matrix without any labeled data. Finally, we show a formal connection between maximum margin clustering and spectral clustering. We demonstrate the efficiency of the generalized maximum margin clustering algorithm using both synthetic datasets and real datasets from the UCI repository. 1
5 0.096918702 80 nips-2006-Fundamental Limitations of Spectral Clustering
Author: Boaz Nadler, Meirav Galun
Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1
6 0.091075741 7 nips-2006-A Local Learning Approach for Clustering
7 0.072608903 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
8 0.072411537 14 nips-2006-A Small World Threshold for Economic Network Formation
9 0.071924269 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians
10 0.070937589 117 nips-2006-Learning on Graph with Laplacian Regularization
11 0.068078443 151 nips-2006-On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts
12 0.066339687 19 nips-2006-Accelerated Variational Dirichlet Process Mixtures
13 0.065476209 11 nips-2006-A PAC-Bayes Risk Bound for General Loss Functions
14 0.055384517 104 nips-2006-Large-Scale Sparsified Manifold Regularization
15 0.053926699 198 nips-2006-Unified Inference for Variational Bayesian Linear Gaussian State-Space Models
16 0.053671446 109 nips-2006-Learnability and the doubling dimension
17 0.051405314 153 nips-2006-Online Clustering of Moving Hyperplanes
18 0.05046773 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering
19 0.049229749 30 nips-2006-An Oracle Inequality for Clipped Regularized Risk Minimizers
20 0.049090546 158 nips-2006-PG-means: learning the number of clusters in data
topicId topicWeight
[(0, -0.154), (1, 0.08), (2, -0.037), (3, 0.122), (4, 0.031), (5, 0.06), (6, 0.017), (7, 0.101), (8, 0.06), (9, 0.058), (10, 0.039), (11, 0.089), (12, -0.132), (13, 0.067), (14, -0.025), (15, 0.016), (16, 0.078), (17, 0.043), (18, 0.046), (19, -0.08), (20, -0.08), (21, 0.148), (22, 0.052), (23, 0.084), (24, -0.166), (25, -0.09), (26, 0.172), (27, -0.112), (28, -0.157), (29, 0.108), (30, -0.011), (31, -0.072), (32, 0.003), (33, -0.127), (34, 0.137), (35, -0.042), (36, -0.157), (37, -0.144), (38, -0.06), (39, -0.016), (40, -0.074), (41, 0.084), (42, 0.087), (43, 0.104), (44, -0.089), (45, 0.103), (46, 0.04), (47, -0.014), (48, 0.021), (49, 0.196)]
simIndex simValue paperId paperTitle
same-paper 1 0.96544498 181 nips-2006-Stability of $K$-Means Clustering
Author: Alexander Rakhlin, Andrea Caponnetto
Abstract: We phrase K-means clustering as an empirical risk minimization procedure over a class HK and explicitly calculate the covering number for this class. Next, we show that stability of K-means clustering is characterized by the geometry of HK with respect to the underlying distribution. We prove that in the case of a unique global minimizer, the clustering solution is stable with respect to complete changes of the data, while for the case of multiple minimizers, the change of Ω(n^{1/2}) samples defines the transition between stability and instability. While for a finite number of minimizers this result follows from multinomial distribution estimates, the case of infinite minimizers requires more refined tools. We conclude by proving that stability of the functions in HK implies stability of the actual centers of the clusters. Since stability is often used for selecting the number of clusters in practice, we hope that our analysis serves as a starting point for finding theoretically grounded recipes for the choice of K. 1
2 0.52495372 116 nips-2006-Learning from Multiple Sources
Author: Koby Crammer, Michael Kearns, Jennifer Wortman
Abstract: We consider the problem of learning accurate models from multiple sources of “nearby” data. Given distinct samples from multiple data sources and estimates of the dissimilarities between these sources, we provide a general theory of which samples should be used to learn models for each source. This theory is applicable in a broad decision-theoretic learning framework, and yields results for classification and regression generally, and for density estimation within the exponential family. A key component of our approach is the development of approximate triangle inequalities for expected loss, which may be of independent interest. 1
3 0.51541853 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
Author: Wenye Li, Kin-hong Lee, Kwong-sak Leung
Abstract: Kernel-based regularized learning seeks a model in a hypothesis space by minimizing the empirical error and the model’s complexity. Based on the representer theorem, the solution consists of a linear combination of translates of a kernel. This paper investigates a generalized form of representer theorem for kernel-based learning. After mapping predefined features and translates of a kernel simultaneously onto a hypothesis space by a specific way of constructing kernels, we proposed a new algorithm by utilizing a generalized regularizer which leaves part of the space unregularized. Using a squared-loss function in calculating the empirical error, a simple convex solution is obtained which combines predefined features with translates of the kernel. Empirical evaluations have confirmed the effectiveness of the algorithm for supervised learning tasks.
4 0.40780491 7 nips-2006-A Local Learning Approach for Clustering
Author: Mingrui Wu, Bernhard Schölkopf
Abstract: We present a local learning approach for clustering. The basic idea is that a good clustering result should have the property that the cluster label of each data point can be well predicted based on its neighboring data and their cluster labels, using current supervised learning methods. An optimization problem is formulated such that its solution has the above property. Relaxation and eigen-decomposition are applied to solve this optimization problem. We also briefly investigate the parameter selection issue and provide a simple parameter selection method for the proposed algorithm. Experimental results are provided to validate the effectiveness of the proposed approach. 1
5 0.40316519 30 nips-2006-An Oracle Inequality for Clipped Regularized Risk Minimizers
Author: Ingo Steinwart, Don Hush, Clint Scovel
Abstract: We establish a general oracle inequality for clipped approximate minimizers of regularized empirical risks and apply this inequality to support vector machine (SVM) type algorithms. We then show that for SVMs using Gaussian RBF kernels for classification this oracle inequality leads to learning rates that are faster than the ones established in [9]. Finally, we use our oracle inequality to show that a simple parameter selection approach based on a validation set can yield the same fast learning rates without knowing the noise exponents which were required to be known a-priori in [9]. 1
6 0.3528235 117 nips-2006-Learning on Graph with Laplacian Regularization
7 0.34517244 19 nips-2006-Accelerated Variational Dirichlet Process Mixtures
8 0.33685499 153 nips-2006-Online Clustering of Moving Hyperplanes
9 0.33443218 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
10 0.3216449 194 nips-2006-Towards a general independent subspace analysis
11 0.30481303 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians
12 0.30120873 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
13 0.28659028 109 nips-2006-Learnability and the doubling dimension
14 0.27501407 80 nips-2006-Fundamental Limitations of Spectral Clustering
15 0.27206308 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
16 0.26916486 21 nips-2006-AdaBoost is Consistent
17 0.26061648 190 nips-2006-The Neurodynamics of Belief Propagation on Binary Markov Random Fields
18 0.25931132 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering
19 0.25391027 23 nips-2006-Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models
20 0.24857171 90 nips-2006-Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space
topicId topicWeight
[(1, 0.07), (3, 0.027), (7, 0.085), (9, 0.074), (11, 0.333), (20, 0.017), (22, 0.05), (44, 0.094), (57, 0.05), (65, 0.041), (69, 0.024), (71, 0.021)]
simIndex simValue paperId paperTitle
1 0.86671323 81 nips-2006-Game Theoretic Algorithms for Protein-DNA binding
Author: Luis Pérez-breva, Luis E. Ortiz, Chen-hsiang Yeang, Tommi S. Jaakkola
Abstract: We develop and analyze game-theoretic algorithms for predicting coordinate binding of multiple DNA binding regulators. The allocation of proteins to local neighborhoods and to sites is carried out with resource constraints while explicating competing and coordinate binding relations among proteins with affinity to the site or region. The focus of this paper is on mathematical foundations of the approach. We also briefly demonstrate the approach in the context of the λ-phage switch. 1
same-paper 2 0.785694 181 nips-2006-Stability of $K$-Means Clustering
Author: Alexander Rakhlin, Andrea Caponnetto
Abstract: We phrase K-means clustering as an empirical risk minimization procedure over a class HK and explicitly calculate the covering number for this class. Next, we show that stability of K-means clustering is characterized by the geometry of HK with respect to the underlying distribution. We prove that in the case of a unique global minimizer, the clustering solution is stable with respect to complete changes of the data, while for the case of multiple minimizers, the change of Ω(n^{1/2}) samples defines the transition between stability and instability. While for a finite number of minimizers this result follows from multinomial distribution estimates, the case of infinite minimizers requires more refined tools. We conclude by proving that stability of the functions in HK implies stability of the actual centers of the clusters. Since stability is often used for selecting the number of clusters in practice, we hope that our analysis serves as a starting point for finding theoretically grounded recipes for the choice of K. 1
3 0.45748958 80 nips-2006-Fundamental Limitations of Spectral Clustering
Author: Boaz Nadler, Meirav Galun
Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1
Author: Gloria Haro, Gregory Randall, Guillermo Sapiro
Abstract: The study of point cloud data sampled from a stratification, a collection of manifolds with possible different dimensions, is pursued in this paper. We present a technique for simultaneously soft clustering and estimating the mixed dimensionality and density of such structures. The framework is based on a maximum likelihood estimation of a Poisson mixture model. The presentation of the approach is completed with artificial and real examples demonstrating the importance of extending manifold learning to stratification learning. 1
5 0.45383418 171 nips-2006-Sample Complexity of Policy Search with Known Dynamics
Author: Peter L. Bartlett, Ambuj Tewari
Abstract: We consider methods that try to find a good policy for a Markov decision process by choosing one from a given class. The policy is chosen based on its empirical performance in simulations. We are interested in conditions on the complexity of the policy class that ensure the success of such simulation based policy search methods. We show that under bounds on the amount of computation involved in computing policies, transition dynamics and rewards, uniform convergence of empirical estimates to true value functions occurs. Previously, such results were derived by assuming boundedness of pseudodimension and Lipschitz continuity. These assumptions and ours are both stronger than the usual combinatorial complexity measures. We show, via minimax inequalities, that this is essential: boundedness of pseudodimension or fat-shattering dimension alone is not sufficient.
6 0.45336759 65 nips-2006-Denoising and Dimension Reduction in Feature Space
7 0.45295048 117 nips-2006-Learning on Graph with Laplacian Regularization
8 0.45176449 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
9 0.45160067 167 nips-2006-Recursive ICA
10 0.45071033 158 nips-2006-PG-means: learning the number of clusters in data
11 0.45054579 175 nips-2006-Simplifying Mixture Models through Function Approximation
12 0.44620791 109 nips-2006-Learnability and the doubling dimension
13 0.44596604 19 nips-2006-Accelerated Variational Dirichlet Process Mixtures
14 0.4457902 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
15 0.44413757 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
16 0.44413599 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
17 0.44408932 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
18 0.44365245 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians
19 0.44346014 21 nips-2006-AdaBoost is Consistent
20 0.443239 29 nips-2006-An Information Theoretic Framework for Eukaryotic Gradient Sensing