nips nips2005 nips2005-159 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mukund Narasimhan, Nebojsa Jojic, Jeff A. Bilmes
Abstract: We show that Queyranne’s algorithm for minimizing symmetric submodular functions can be used for clustering with a variety of different objective functions. Two specific criteria that we consider in this paper are the single linkage and the minimum description length criteria. The first criterion tries to maximize the minimum distance between elements of different clusters, and is inherently “discriminative”. It is known that optimal clusterings into k clusters, for any given k, can be computed in polynomial time for this criterion. The second criterion seeks to minimize the description length of the clusters given a probabilistic generative model. We show that the optimal partitioning into 2 clusters, and an approximate partitioning (guaranteed to be within a factor of 2 of the optimal) for more clusters, can be computed. To the best of our knowledge, this is the first time that a tractable algorithm for finding the optimal clustering with respect to the MDL criterion for 2 clusters has been given. Besides the optimality result for the MDL criterion, the chief contribution of this paper is to show that the same algorithm can be used to optimize a broad class of criteria, and hence can be used for many application-specific criteria for which efficient algorithms are not known.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We show that Queyranne’s algorithm for minimizing symmetric submodular functions can be used for clustering with a variety of different objective functions. [sent-4, score-0.567]
2 Two specific criteria that we consider in this paper are the single linkage and the minimum description length criteria. [sent-5, score-0.192]
3 The first criterion tries to maximize the minimum distance between elements of different clusters, and is inherently “discriminative”. [sent-6, score-0.476]
4 It is known that optimal clusterings into k clusters, for any given k, can be computed in polynomial time for this criterion. [sent-7, score-0.812]
5 The second criterion seeks to minimize the description length of the clusters given a probabilistic generative model. [sent-8, score-0.716]
6 We show that the optimal partitioning into 2 clusters, and an approximate partitioning (guaranteed to be within a factor of 2 of the optimal) for more clusters, can be computed. [sent-9, score-0.535]
7 To the best of our knowledge, this is the first time that a tractable algorithm for finding the optimal clustering with respect to the MDL criterion for 2 clusters has been given. [sent-10, score-0.824]
8 Besides the optimality result for the MDL criterion, the chief contribution of this paper is to show that the same algorithm can be used to optimize a broad class of criteria, and hence can be used for many application-specific criteria for which efficient algorithms are not known. [sent-11, score-0.3]
9 1 Introduction The clustering of data is a problem found in many pattern recognition tasks, often in the guises of unsupervised learning, vector quantization, dimensionality reduction, etc. [sent-12, score-0.152]
10 Formally, the clustering problem can be described as follows. [sent-13, score-0.152]
11 Given a finite set S, and a criterion function Jk defined on all partitions of S into k parts, find a partition of S into k parts {S1, S2, ..., Sk} that optimizes Jk. [sent-14, score-0.423]
12 There are distance based criteria, for which a distance measure is specified between each pair of elements, and the criterion somehow combines either intercluster or intracluster distances into an objective function. [sent-25, score-0.419]
13 The appropriate criterion is typically application dependent, and therefore, we do not claim that the two criteria considered in this paper are inherently better or more generally applicable than other criteria. [sent-28, score-0.316]
14 However, we can show that for the single-linkage criterion, we can compute the optimal clustering into k parts (for any k), and for the MDL criterion, we can compute the optimal clustering into 2 parts using Queyranne’s algorithm. [sent-29, score-0.586]
15 More generally, any criterion from a broad class of criteria can be optimized by the same algorithm, and this class of criteria is closed under linear combinations. [sent-30, score-0.499]
16 The first, “discriminative”, criterion we consider is the single-linkage criterion. [sent-33, score-0.206]
17 In this case, we are given distances d(s1 , s2 ) between all elements s1 , s2 ∈ S, and we try to find clusters that maximize the minimum distance between elements of different clusters (i.e., the separation between clusters). [sent-34, score-1.101]
18 Since we are only comparing distances, the distance measure can be chosen from any ordered set (addition/squaring/multiplication of distances need not be defined, as is required for K-means, spectral clustering, etc.). [sent-38, score-0.323]
19 Further, this criterion only depends on the rank ordering of the distances, and so is completely insensitive to any monotone transformation of the distances. [sent-40, score-0.3]
20 On the other hand, this criterion is sensitive to outliers and may not be appropriate when there are a large number of outliers in the data set. [sent-43, score-0.318]
21 The kernel based criterion considered in [3] is similar in spirit to this one. [sent-44, score-0.206]
22 However, their algorithm only provides approximate solutions, and the extension to more than 2 clusters is not given. [sent-45, score-0.384]
23 However, since they optimize the distance of the clusters to a hyperplane, it is more appropriate if the clusters are to be classified using an SVM. [sent-46, score-0.812]
24 The second criterion we consider is “generative” in nature and is based on the Minimum Description Length principle. [sent-47, score-0.206]
25 In this case we are given a (generative) probability model for the elements, and we attempt to find clusters so that describing or encoding the clusters (separately) can be done using as few bits as possible. [sent-48, score-0.706]
26 This is also a very natural criterion: grouping together data items that can be highly compressed translates to grouping elements that share common characteristics. [sent-49, score-0.307]
27 This criterion has also been widely used in the past, though the algorithms given do not guarantee optimal solutions (even for 2 clusters). [sent-50, score-0.288]
28 Since these criteria seem quite different in nature, it is surprising that the same algorithm can be used to find the optimal partitions into two clusters in both cases. [sent-51, score-0.638]
29 2 Background and Notation: A clustering of a finite set S is a partition {S1, S2, ..., Sk}. [sent-54, score-0.247]
30 We will call the individual elements of the partition the clusters of the partition. [sent-58, score-0.526]
31 If there are k clusters in the partition, then we say that the partition is a k-clustering. [sent-59, score-0.471]
32 The distance between sets T and R is often defined to be the smallest distance between elements of these two sets: D(R, T) = min_{r∈R, t∈T} d(r, t). [sent-65, score-0.273]
33 The single-linkage criterion tries to maximize this distance, and hence an optimal 2-clustering is in arg max_{{S1,S2} ∈ C2(S)} D(S1, S2). [sent-66, score-0.408]
34 It is known that an algorithm based on the Minimum Spanning Tree can be used to find optimal clusterings for the single-linkage criterion[8]. [sent-68, score-0.238]
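As a concrete illustration of this MST-based route, one can build the complete distance graph, take a minimum spanning tree, and delete its k − 1 heaviest edges; the connected components that remain maximize the minimum inter-cluster distance. The sketch below is only illustrative: the networkx dependency and the function name are assumptions, not part of the paper.

```python
import networkx as nx  # assumption: networkx is available

def single_linkage_k_clustering(items, d, k):
    """Sketch: build the complete distance graph, take a minimum spanning
    tree, and delete its k-1 heaviest edges; the remaining connected
    components form a k-clustering maximizing the minimum inter-cluster
    distance (the single-linkage criterion)."""
    n = len(items)
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            G.add_edge(i, j, weight=d(items[i], items[j]))
    mst = nx.minimum_spanning_tree(G)
    if k > 1:
        # Remove the k-1 edges with the largest weights.
        heaviest = sorted(mst.edges(data="weight"), key=lambda e: e[2])[-(k - 1):]
        mst.remove_edges_from([(u, v) for u, v, _ in heaviest])
    return [set(c) for c in nx.connected_components(mst)]
```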
35 So a partition {S1, S2} of S that minimizes the description length (DL) is in arg min_{{S1,S2} ∈ C2(S)} DL(S1, S2) = arg min_{{S1,S2} ∈ C2(S)} H(S1) + H(S2). We will denote by 2^S the set of all subsets of S. [sent-77, score-0.523]
36 We say that f is submodular if f (A) + f (B) ≥ f (A ∪ B) + f (A ∩ B) for every A, B ⊆ S. [sent-79, score-0.298]
37 In [1], Queyranne gives a polynomial time algorithm that finds a set A ∈ 2^S \ {S, ∅} that minimizes any symmetric submodular set function (specified in the form of an oracle). [sent-81, score-0.478]
38 That is, Queyranne’s algorithm finds a non-trivial partition {S1 , S \ S1 } of S so that f (S1 ) (= f (S \ S1 )) minimizes f over all non-trivial subsets of S. [sent-82, score-0.211]
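A rough sketch of this procedure (pendent pairs found via a maximum-adjacency-style ordering, followed by merging) is given below. It is an illustration under the assumption that f is supplied as an oracle over frozensets of original elements; it is not the authors' implementation.

```python
def queyranne_minimize(elements, f):
    """Sketch of Queyranne's algorithm for a symmetric submodular oracle f.
    Returns a non-trivial subset A (and f(A)) minimizing f over all
    non-trivial subsets of the ground set, using O(n^3) oracle calls."""
    V = [frozenset([e]) for e in elements]      # "super-elements" to be merged
    best_set, best_val = None, float("inf")
    while len(V) > 1:
        # Build an ordering: repeatedly append the super-element u that
        # minimizes f(A ∪ u) - f(u), where A is the union of those chosen so far.
        order, remaining = [V[0]], list(V[1:])
        while remaining:
            A = frozenset().union(*order)
            u = min(remaining, key=lambda x: f(A | x) - f(x))
            order.append(u)
            remaining.remove(u)
        t, u = order[-2], order[-1]
        # (t, u) is a pendent pair: {u} is a best cut separating u from t.
        if f(u) < best_val:
            best_set, best_val = set(u), f(u)
        # Merge the pendent pair and repeat on the smaller ground set.
        V = [x for x in V if x is not t and x is not u] + [t | u]
    return best_set, best_val
```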
39 The problem of finding non-trivial minimizers of a symmetric submodular function can be thought of as a generalization of the graph-cut problem. [sent-83, score-0.358]
40 In Section 3, we show that Queyranne’s algorithm can be used to find the optimal k-clustering (for any k) in polynomial time for the single-linkage criterion. [sent-86, score-0.159]
41 In Section 4, we give an algorithm for finding the optimal clustering into 2 parts that minimizes the description length. [sent-87, score-0.415]
42 3 Single-Linkage: Maximizing the separation between clusters In this section, we show that Queyranne’s algorithm can be used for finding k-clusters (for any given k) that maximize the separation between elements of different clusters. [sent-89, score-0.629]
43 In Section 3.1, we show that Queyranne’s algorithm can partition the set S into two parts to maximize the distance between these parts in polynomial time. [sent-92, score-0.413]
44 To see this, observe that D(U, T) = min_{u∈U, t∈T} d(u, t) = min( min_{u∈U, r∈R} d(u, r), min_{u∈U, t∈T\R} d(u, t) ) ≤ D(U, R). Lemma 2. [sent-101, score-0.328]
45 To see this, first observe that D(A, B ∪ W) = min(D(A, B), D(A, W)), because D(A, W ∪ B) = min_{a∈A, x∈W∪B} d(a, x) = min( min_{a∈A, w∈W} d(a, w), min_{a∈A, b∈B} d(a, b) ). It follows that D(A, B ∪ W) = min(D(A, B), D(A, W)) ≤ min(D(A, B), D(B, W)) = min(D(B, A), D(B, W)) = D(B, A ∪ W). [sent-104, score-0.574]
46 The function D(R, T ) can be thought of as defining the separation or margin between the clusters R and T . [sent-113, score-0.451]
47 We can generalize this notion to more than two clusters as follows. [sent-114, score-0.353]
48 seperation({S1, S2, ..., Sk}) = min_{i≠j} D(Si, Sj) = min_{Si≠Sj} min_{si∈Si, sj∈Sj} d(si, sj). Note that seperation({R, T}) = D(R, T) for a 2-clustering. [sent-118, score-0.253]
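Read as code, the definition is simply the following (a small illustration only; d is assumed to be the given pairwise distance and clusters are assumed to be Python sets):

```python
from itertools import combinations

def seperation(clustering, d):
    """seperation({S1, ..., Sk}) = smallest distance between two elements
    that lie in different clusters (the paper's margin between clusters)."""
    return min(d(a, b)
               for Si, Sj in combinations(clustering, 2)
               for a in Si for b in Sj)
```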
49 The function seperation : ∪_{k=1}^{|S|} Ck(S) → R takes a single clustering as its argument. [sent-119, score-0.664]
50 However, D(·, ·) takes two disjoint subsets of S as its arguments, the union of which need not be S in general. [sent-120, score-0.152]
51 The margin is the distance between the closest elements of different clusters, and hence we will be interested in finding k-clusters that maximize the margin. [sent-121, score-0.24]
52 An obvious approach to generating optimal k-clusterings given a method of generating optimal 2-clusterings is the following. [sent-131, score-0.193]
53 First, it is not clear that an optimal k-clustering can be a refinement of an optimal 2-clustering. [sent-135, score-0.164]
54 That is, we need to be sure that there is an optimal k-clustering in which S1 is the union of some of the clusters, and S2 is the union of the remaining. [sent-136, score-0.172]
55 Second, we need to determine how many of the k clusters make up S1 and how many make up S2. [sent-137, score-0.443]
56 In this section, we will show that for any k ≥ 3, there is always an optimal k-clustering that is a refinement of any given optimal 2-clustering. [sent-138, score-0.164]
57 We begin by establishing some relationships between the separation of clusterings of different sizes. [sent-140, score-0.184]
58 To compare the separation of clusterings with different numbers of clusters, we can try to merge two of the clusters from the clustering with more clusters. [sent-141, score-0.757]
59 Suppose {S1, S2, ..., Sk} ∈ Ck(S) is any k-clustering of S, and S′ is a (k − 1)-clustering of S obtained by merging two of the clusters (say S1 and S2). [sent-145, score-0.382]
60 Given clusterings S = {S1, ..., Sk} ∈ Ck(S) and T = {T1, ..., Tl} ∈ Cl(S) of S, we can create a new clustering U = {U1, U2, ...} whose clusters are the non-empty intersections Si ∩ Tj. [sent-182, score-0.152]
61 That is, the clusters of U consist of those elements that are in the same clusters of both S and T . [sent-186, score-0.784]
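As a small illustration (clusters assumed to be Python sets), this common refinement is just the collection of non-empty pairwise intersections:

```python
def common_refinement(S_clusters, T_clusters):
    """Clusters of U are the non-empty intersections Si ∩ Tj, so two elements
    share a cluster of U exactly when they share a cluster in both S and T."""
    return [Si & Tj for Si in S_clusters for Tj in T_clusters if Si & Tj]
```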
62 To show equality, note that if a, b are in different clusters of U, then a, b must have been in different clusters of either S or T . [sent-202, score-0.706]
63 This result can be thought of as expressing a relationship between seperation and the lattice of partitions of S, which will be important to our later robustness extension. Lemma 6. [sent-203, score-0.575]
64 If T = {T1, ..., Tk} ∈ Ok(S) is an optimal k-clustering, let r be the number of clusters of T that “do not respect” the partition {S1 , S2 }. [sent-211, score-0.53]
65 That is, r is the number of clusters of T that intersect both S1 and S2: r = |{ 1 ≤ i ≤ k : Ti ∩ S1 ≠ ∅ and Ti ∩ S2 ≠ ∅ }|. [sent-212, score-0.353]
66 Then T′ ∈ Ck+1(S) is a refinement of T and satisfies seperation(T′) = seperation(T). [sent-220, score-0.512]
67 Now, pick two clusters of T ′ that are either both contained in the same cluster of S or both “do not respect” S. [sent-222, score-0.387]
68 Merge these clusters together to get an element T ′′ ∈ Ck (S). [sent-224, score-0.375]
69 By Lemma 3 merging clusters cannot decrease the margin. [sent-225, score-0.382]
70 However, T ′′ has fewer clusters that do not respect S than T has, and hence we have a contradiction. [sent-227, score-0.353]
71 This lemma implies that Queyranne’s algorithm, along with a simple dynamic programming algorithm, can be used to find the best k-clustering with time complexity O(k |S|^3). [sent-228, score-0.259]
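One way to read the lemma operationally is sketched below. It is a hedged illustration, not the paper's algorithm: optimal_2_split stands in for Queyranne's algorithm applied to the single-linkage objective, D for the single-linkage distance between two sets, and both names are assumptions. Memoizing the recursion on (subset, number of clusters) keeps the computation polynomial, consistent with the stated O(k|S|^3) complexity.

```python
def best_k_clustering(S, k, optimal_2_split, D):
    """Hedged sketch of the refinement-based dynamic program: an optimal
    k-clustering refines an optimal 2-clustering, so split optimally into
    two parts and decide how many of the k clusters each part receives.
    Returns (separation, list of clusters)."""
    S = frozenset(S)
    if k == 1:
        return float("inf"), [set(S)]
    S1, S2 = optimal_2_split(S)
    best_sep, best_parts = -float("inf"), None
    for j in range(1, k):                    # j clusters in S1, k - j in S2
        if j > len(S1) or k - j > len(S2):
            continue
        sep1, parts1 = best_k_clustering(S1, j, optimal_2_split, D)
        sep2, parts2 = best_k_clustering(S2, k - j, optimal_2_split, D)
        # Cross-cluster pairs are separated by at least D(S1, S2), so the
        # overall separation is the minimum of the three quantities below.
        sep = min(D(S1, S2), sep1, sep2)
        if sep > best_sep:
            best_sep, best_parts = sep, parts1 + parts2
    return best_sep, best_parts
```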
72 Even though using Queyranne’s algorithm is not the fastest algorithm for this problem, the fact that it optimizes this criterion implies that it can be used to optimize conic combinations of submodular criteria and the single-linkage criterion. [sent-230, score-0.662]
73 3 Generating robust clusterings. One possible issue with the metric we defined is that it is very sensitive to outliers and noise. [sent-232, score-0.237]
74 To see this, note that if we have two very well separated clusters, then adding a few points “between” the clusters could dramatically decrease the separation. [sent-233, score-0.353]
75 To increase the robustness of the algorithm, we can try to maximize the n smallest distances instead of maximizing just the smallest distance between clusters. [sent-234, score-0.309]
76 If we give the nth smallest distance more importance than the smallest distance, this increases the noise tolerance by ignoring the effects of a few outliers. [sent-235, score-0.2]
77 Let d1, d2, ..., d_{|R|·|T|} be an ordered list of distances between elements of R and T, arranged in decreasing order. [sent-243, score-0.197]
78 Otherwise, if |R| · |T | < n, then the first n − |R|· |T | elements of D(R, T ) are ∞, while the remaining elements are the elements of L(R, T ). [sent-247, score-0.234]
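A minimal sketch of this padded list, under the assumption that d is the pairwise distance and that the resulting vectors are compared lexicographically as described (the function name is illustrative):

```python
def robust_distance_vector(R, T, d, n):
    """The n smallest pairwise distances between R and T, in decreasing
    order, padded with infinity when fewer than n pairs exist; comparing
    these vectors lexicographically gives the n-th smallest distance the
    most weight, which reduces sensitivity to a few outliers."""
    smallest = sorted(d(r, t) for r in R for t in T)[:n]       # n smallest
    padding = [float("inf")] * max(0, n - len(smallest))
    return padding + smallest[::-1]                            # decreasing order
```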
79 In practice we noticed that restricting our search to only edges whose deletion results in clusters of at least certain sizes produces very good results. [sent-289, score-0.386]
80 The expected coding (or description) length of any collection T of random variables using an optimal coding scheme (or a random coding scheme) is known to be H(T ). [sent-294, score-0.231]
81 The partition {S1, S2} of S that minimizes the coding length is therefore arg min_{{S1,S2} ∈ C2(S)} H(S1) + H(S2). [sent-295, score-0.25]
82 Clearly the minima of this function correspond to partitions that minimize the mutual information between the parts. [sent-298, score-0.164]
83 Therefore, the problem of partitioning in order to minimize the mutual information between the parts can be reduced to a symmetric submodular minimization problem, which can be solved using Queyranne’s algorithm in time O(|S|^3), assuming oracle access to the mutual information. [sent-299, score-0.7]
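For jointly Gaussian variables, where entropy has a closed form, this objective can be instantiated as below and handed to a Queyranne-style minimizer such as the earlier sketch. This is only an illustration under the Gaussian assumption; any other entropy or mutual-information oracle could be substituted, and the function names are not from the paper.

```python
import numpy as np

def gaussian_mi_objective(cov):
    """Builds f(A) = H(A) + H(S \\ A) - H(S) for jointly Gaussian variables
    with covariance `cov`; f is symmetric and submodular, and its value is
    the mutual information between the two sides of the partition."""
    S = frozenset(range(cov.shape[0]))

    def entropy(A):
        idx = sorted(A)
        if not idx:
            return 0.0
        sub = cov[np.ix_(idx, idx)]
        _, logdet = np.linalg.slogdet(sub)
        return 0.5 * (len(idx) * np.log(2 * np.pi * np.e) + logdet)

    H_S = entropy(S)
    # Subtracting the constant H(S) does not change the minimizer, but makes
    # the function symmetric: f(A) = f(S \ A) = I(X_A ; X_{S \ A}).
    return lambda A: entropy(A) + entropy(S - frozenset(A)) - H_S
```

For example, f = gaussian_mi_objective(cov) could be passed together with range(cov.shape[0]) to the queyranne_minimize sketch given earlier (both names are hypothetical).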
84 Symmetric submodular functions generalize notions like graph-cuts, and indeed, Queyranne’s algorithm generalizes an algorithm for computing graph-cuts. [sent-301, score-0.337]
85 Since graph-cut based techniques are extensively used in many engineering applications, it might be possible to develop criteria that are more appropriate for these specific applications, while still producing optimal partitions of size 2. [sent-302, score-0.256]
86 It should be noted that, in general, we cannot use the dynamic programming algorithm to produce optimal clusterings with k > 2 clusters for the MDL criterion (or for general symmetric submodular functions). [sent-303, score-1.155]
87 Another approach (which is computationally cheaper) is to compute k clusters by deleting k − 1 edges of the Gomory-Hu tree produced by Queyranne’s algorithm. [sent-306, score-0.386]
88 More generally, if we have an arbitrary increasing submodular function (such as entropy) f : 2^S → R, and we seek a clustering {S1, S2, ..., Sk} minimizing ∑_{i=1}^k f(Si), the same factor-of-2 approximation guarantee applies. [sent-308, score-0.427]
89 Therefore, this generalizes approximation guarantees for graph k-cuts because for any graph G = (V, E), the function f : 2^V → R, where f(A) is the number of edges adjacent to the vertex set A, is a submodular function. [sent-312, score-0.331]
90 Finding a clustering to minimize ∑_{i=1}^k f(Si) is equivalent to finding a partition of the vertex set of size k to minimize the number of edges disconnected (i.e., the graph k-cut). [sent-313, score-0.362]
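Concretely, the submodular function in question can be sketched as below (the edge-list representation is an assumption). Summing f over the clusters counts every edge at least once and every cut edge twice, so the sum equals |E| plus the number of cut edges, and minimizing it minimizes the k-cut.

```python
def edges_adjacent(edges, A):
    """f(A) = number of edges with at least one endpoint in the vertex set A;
    this is an increasing submodular function. `edges` is assumed to be an
    iterable of 2-element frozensets of vertices."""
    A = set(A)
    return sum(1 for e in edges if e & A)
```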
91 Another criterion which we can define similarly can be applied to clustering genomic sequences. [sent-316, score-0.358]
92 Therefore, a natural clustering criterion for sequences is to partition the sequences into clusters so that the sequences from different clusters share as few subsequences as possible. [sent-318, score-1.182]
93 The left part of the table shows the error rates (in percentages) of the (robust) single-linkage criterion and some other techniques on the same data set as is reported in [3]. [sent-321, score-0.206]
94 The data sets are images (of digits and faces), and the distance function we used was the Euclidean distance between the vector of the pixels in the images. [sent-322, score-0.148]
95 The right part of the table compares the Q-Clustering using MDL criterion with other state of the art algorithms for haplotype tagging of SNPs (single nucleotide polymorphisms) in the ACE gene on the data set reported in [4]. [sent-323, score-0.316]
96 The MDL criterion is also a very natural one, and the results on haplotype tagging are quite promising. [sent-338, score-0.338]
97 The MDL criterion can be seen as a generalization of graph cuts, and so it seems like Q-clustering can also be applied to optimize other criteria arising in problems like image segmentation, especially when there is a generative model. [sent-339, score-0.36]
98 Another natural criterion for clustering strings is to partition the strings/sequences to minimize the number of common subsequences. [sent-340, score-0.494]
99 The key novelty of this paper is the guarantees of optimality produced by the algorithm, and the general framework into which a number of natural criteria fall. [sent-342, score-0.229]
100 Brucker, “On the complexity of clustering problems,” in R. [sent-377, score-0.152]
wordName wordTfidf (topN-words)
[('seperation', 0.512), ('clusters', 0.353), ('queyranne', 0.315), ('submodular', 0.275), ('mdl', 0.222), ('criterion', 0.206), ('snps', 0.157), ('clustering', 0.152), ('ck', 0.141), ('clusterings', 0.125), ('sk', 0.116), ('partition', 0.095), ('monotone', 0.094), ('criteria', 0.087), ('symmetric', 0.083), ('min', 0.082), ('optimal', 0.082), ('rizzi', 0.079), ('elements', 0.078), ('lemma', 0.076), ('distance', 0.074), ('ok', 0.073), ('nement', 0.072), ('disjoint', 0.065), ('distances', 0.065), ('partitions', 0.063), ('mutual', 0.06), ('vk', 0.06), ('haplotype', 0.059), ('separation', 0.059), ('parts', 0.059), ('tl', 0.058), ('si', 0.056), ('outliers', 0.056), ('rn', 0.052), ('tagging', 0.051), ('tk', 0.05), ('partitioning', 0.05), ('arg', 0.049), ('maximize', 0.049), ('description', 0.048), ('smallest', 0.047), ('jojic', 0.047), ('polynomial', 0.046), ('union', 0.045), ('cl', 0.044), ('minimizes', 0.043), ('subsets', 0.042), ('oracle', 0.041), ('merge', 0.041), ('minimize', 0.041), ('htstep', 0.039), ('margin', 0.039), ('operation', 0.038), ('generative', 0.035), ('commutative', 0.034), ('ace', 0.034), ('minr', 0.034), ('narasimhan', 0.034), ('cluster', 0.034), ('sj', 0.033), ('edges', 0.033), ('length', 0.033), ('tolerance', 0.032), ('ordered', 0.032), ('optimize', 0.032), ('algorithm', 0.031), ('robust', 0.031), ('nding', 0.031), ('coding', 0.03), ('dl', 0.029), ('merging', 0.029), ('sm', 0.029), ('obvious', 0.029), ('tj', 0.028), ('re', 0.028), ('corporation', 0.027), ('vl', 0.027), ('microsoft', 0.027), ('try', 0.027), ('collection', 0.026), ('minimizing', 0.026), ('ti', 0.025), ('um', 0.025), ('metric', 0.025), ('minimum', 0.024), ('xu', 0.024), ('retaining', 0.024), ('heuristics', 0.023), ('inherently', 0.023), ('say', 0.023), ('nds', 0.023), ('guarantees', 0.023), ('share', 0.023), ('tries', 0.022), ('discriminative', 0.022), ('quite', 0.022), ('element', 0.022), ('decreasing', 0.022), ('cm', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 159 nips-2005-Q-Clustering
Author: Mukund Narasimhan, Nebojsa Jojic, Jeff A. Bilmes
Abstract: We show that Queyranne’s algorithm for minimizing symmetric submodular functions can be used for clustering with a variety of different objective functions. Two specific criteria that we consider in this paper are the single linkage and the minimum description length criteria. The first criterion tries to maximize the minimum distance between elements of different clusters, and is inherently “discriminative”. It is known that optimal clusterings into k clusters, for any given k, can be computed in polynomial time for this criterion. The second criterion seeks to minimize the description length of the clusters given a probabilistic generative model. We show that the optimal partitioning into 2 clusters, and an approximate partitioning (guaranteed to be within a factor of 2 of the optimal) for more clusters, can be computed. To the best of our knowledge, this is the first time that a tractable algorithm for finding the optimal clustering with respect to the MDL criterion for 2 clusters has been given. Besides the optimality result for the MDL criterion, the chief contribution of this paper is to show that the same algorithm can be used to optimize a broad class of criteria, and hence can be used for many application-specific criteria for which efficient algorithms are not known.
2 0.15218173 178 nips-2005-Soft Clustering on Graphs
Author: Kai Yu, Shipeng Yu, Volker Tresp
Abstract: We propose a simple clustering framework on graphs encoding pairwise data similarities. Unlike usual similarity-based methods, the approach softly assigns data to clusters in a probabilistic way. More importantly, a hierarchical clustering is naturally derived in this framework to gradually merge lower-level clusters into higher-level ones. A random walk analysis indicates that the algorithm exposes clustering structures in various resolutions, i.e., a higher level statistically models a longer-term diffusion on graphs and thus discovers a more global clustering structure. Finally we provide very encouraging experimental results. 1
3 0.12805723 13 nips-2005-A Probabilistic Approach for Optimizing Spectral Clustering
Author: Rong Jin, Feng Kang, Chris H. Ding
Abstract: Spectral clustering enjoys its success in both data clustering and semisupervised learning. But, most spectral clustering algorithms cannot handle multi-class clustering problems directly. Additional strategies are needed to extend spectral clustering algorithms to multi-class clustering problems. Furthermore, most spectral clustering algorithms employ hard cluster membership, which is likely to be trapped by the local optimum. In this paper, we present a new spectral clustering algorithm, named “Soft Cut”. It improves the normalized cut algorithm by introducing soft membership, and can be efficiently computed using a bound optimization algorithm. Our experiments with a variety of datasets have shown the promising performance of the proposed clustering algorithm. 1
4 0.12470331 102 nips-2005-Kernelized Infomax Clustering
Author: David Barber, Felix V. Agakov
Abstract: We propose a simple information-theoretic approach to soft clustering based on maximizing the mutual information I(x, y) between the unknown cluster labels y and the training patterns x with respect to parameters of specifically constrained encoding distributions. The constraints are chosen such that patterns are likely to be clustered similarly if they lie close to specific unknown vectors in the feature space. The method may be conveniently applied to learning the optimal affinity matrix, which corresponds to learning parameters of the kernelized encoder. The procedure does not require computations of eigenvalues of the Gram matrices, which makes it potentially attractive for clustering large data sets. 1
5 0.1163355 177 nips-2005-Size Regularized Cut for Data Clustering
Author: Yixin Chen, Ya Zhang, Xiang Ji
Abstract: We present a novel spectral clustering method that enables users to incorporate prior knowledge of the size of clusters into the clustering process. The cost function, which is named size regularized cut (SRcut), is defined as the sum of the inter-cluster similarity and a regularization term measuring the relative size of two clusters. Finding a partition of the data set to minimize SRcut is proved to be NP-complete. An approximation algorithm is proposed to solve a relaxed version of the optimization problem as an eigenvalue problem. Evaluations over different data sets demonstrate that the method is not sensitive to outliers and performs better than normalized cut. 1
6 0.082982704 84 nips-2005-Generalization in Clustering with Unobserved Features
7 0.080384001 79 nips-2005-Fusion of Similarity Data in Clustering
8 0.073770754 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
9 0.068005353 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis
10 0.067419067 186 nips-2005-TD(0) Leads to Better Policies than Approximate Value Iteration
11 0.062914088 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm
12 0.057017848 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
13 0.056617565 57 nips-2005-Distance Metric Learning for Large Margin Nearest Neighbor Classification
14 0.055391483 127 nips-2005-Mixture Modeling by Affinity Propagation
15 0.054801665 126 nips-2005-Metric Learning by Collapsing Classes
16 0.054777268 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning
17 0.049019154 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
18 0.048836254 15 nips-2005-A Theoretical Analysis of Robust Coding over Noisy Overcomplete Channels
19 0.047326323 28 nips-2005-Analyzing Auditory Neurons by Learning Distance Functions
20 0.046997603 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
topicId topicWeight
[(0, 0.167), (1, 0.07), (2, -0.055), (3, -0.055), (4, -0.205), (5, -0.031), (6, -0.01), (7, 0.006), (8, -0.147), (9, -0.013), (10, -0.109), (11, -0.03), (12, 0.049), (13, 0.013), (14, 0.102), (15, 0.071), (16, 0.05), (17, -0.042), (18, -0.084), (19, -0.029), (20, 0.006), (21, -0.06), (22, 0.009), (23, 0.008), (24, -0.017), (25, 0.025), (26, -0.078), (27, -0.095), (28, 0.064), (29, -0.042), (30, -0.003), (31, 0.049), (32, 0.119), (33, -0.003), (34, 0.035), (35, 0.01), (36, 0.021), (37, -0.072), (38, -0.012), (39, 0.088), (40, 0.076), (41, -0.038), (42, 0.039), (43, -0.016), (44, 0.04), (45, 0.041), (46, -0.07), (47, -0.015), (48, 0.068), (49, -0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.96059769 159 nips-2005-Q-Clustering
Author: Mukund Narasimhan, Nebojsa Jojic, Jeff A. Bilmes
Abstract: We show that Queyranne’s algorithm for minimizing symmetric submodular functions can be used for clustering with a variety of different objective functions. Two specific criteria that we consider in this paper are the single linkage and the minimum description length criteria. The first criterion tries to maximize the minimum distance between elements of different clusters, and is inherently “discriminative”. It is known that optimal clusterings into k clusters, for any given k, can be computed in polynomial time for this criterion. The second criterion seeks to minimize the description length of the clusters given a probabilistic generative model. We show that the optimal partitioning into 2 clusters, and an approximate partitioning (guaranteed to be within a factor of 2 of the optimal) for more clusters, can be computed. To the best of our knowledge, this is the first time that a tractable algorithm for finding the optimal clustering with respect to the MDL criterion for 2 clusters has been given. Besides the optimality result for the MDL criterion, the chief contribution of this paper is to show that the same algorithm can be used to optimize a broad class of criteria, and hence can be used for many application-specific criteria for which efficient algorithms are not known.
2 0.76216841 84 nips-2005-Generalization in Clustering with Unobserved Features
Author: Eyal Krupka, Naftali Tishby
Abstract: We argue that when objects are characterized by many attributes, clustering them on the basis of a relatively small random subset of these attributes can capture information on the unobserved attributes as well. Moreover, we show that under mild technical conditions, clustering the objects on the basis of such a random subset performs almost as well as clustering with the full attribute set. We prove finite sample generalization theorems for this novel learning scheme that extend analogous results from the supervised learning setting. The scheme is demonstrated for collaborative filtering of users with movies rating as attributes. 1
3 0.71486729 13 nips-2005-A Probabilistic Approach for Optimizing Spectral Clustering
Author: Rong Jin, Feng Kang, Chris H. Ding
Abstract: Spectral clustering enjoys its success in both data clustering and semisupervised learning. But, most spectral clustering algorithms cannot handle multi-class clustering problems directly. Additional strategies are needed to extend spectral clustering algorithms to multi-class clustering problems. Furthermore, most spectral clustering algorithms employ hard cluster membership, which is likely to be trapped by the local optimum. In this paper, we present a new spectral clustering algorithm, named “Soft Cut”. It improves the normalized cut algorithm by introducing soft membership, and can be efficiently computed using a bound optimization algorithm. Our experiments with a variety of datasets have shown the promising performance of the proposed clustering algorithm. 1
4 0.70025706 102 nips-2005-Kernelized Infomax Clustering
Author: David Barber, Felix V. Agakov
Abstract: We propose a simple information-theoretic approach to soft clustering based on maximizing the mutual information I(x, y) between the unknown cluster labels y and the training patterns x with respect to parameters of specifically constrained encoding distributions. The constraints are chosen such that patterns are likely to be clustered similarly if they lie close to specific unknown vectors in the feature space. The method may be conveniently applied to learning the optimal affinity matrix, which corresponds to learning parameters of the kernelized encoder. The procedure does not require computations of eigenvalues of the Gram matrices, which makes it potentially attractive for clustering large data sets. 1
5 0.64215082 178 nips-2005-Soft Clustering on Graphs
Author: Kai Yu, Shipeng Yu, Volker Tresp
Abstract: We propose a simple clustering framework on graphs encoding pairwise data similarities. Unlike usual similarity-based methods, the approach softly assigns data to clusters in a probabilistic way. More importantly, a hierarchical clustering is naturally derived in this framework to gradually merge lower-level clusters into higher-level ones. A random walk analysis indicates that the algorithm exposes clustering structures in various resolutions, i.e., a higher level statistically models a longer-term diffusion on graphs and thus discovers a more global clustering structure. Finally we provide very encouraging experimental results. 1
6 0.63781244 177 nips-2005-Size Regularized Cut for Data Clustering
7 0.60658407 79 nips-2005-Fusion of Similarity Data in Clustering
8 0.39715105 127 nips-2005-Mixture Modeling by Affinity Propagation
9 0.39146397 126 nips-2005-Metric Learning by Collapsing Classes
10 0.376324 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
11 0.37533143 71 nips-2005-Fast Krylov Methods for N-Body Learning
12 0.35481966 33 nips-2005-Bayesian Sets
13 0.34242114 112 nips-2005-Learning Minimum Volume Sets
14 0.34111077 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
15 0.3269482 107 nips-2005-Large scale networks fingerprinting and visualization using the k-core decomposition
16 0.32628831 12 nips-2005-A PAC-Bayes approach to the Set Covering Machine
17 0.32139501 186 nips-2005-TD(0) Leads to Better Policies than Approximate Value Iteration
18 0.32026941 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
19 0.31015024 18 nips-2005-Active Learning For Identifying Function Threshold Boundaries
20 0.30914092 129 nips-2005-Modeling Neural Population Spiking Activity with Gibbs Distributions
topicId topicWeight
[(3, 0.473), (10, 0.044), (27, 0.027), (31, 0.042), (34, 0.095), (41, 0.02), (55, 0.023), (65, 0.011), (69, 0.031), (73, 0.05), (88, 0.053), (91, 0.029)]
simIndex simValue paperId paperTitle
1 0.97851694 83 nips-2005-Generalization error bounds for classifiers trained with interdependent data
Author: Nicolas Usunier, Massih-reza Amini, Patrick Gallinari
Abstract: In this paper we propose a general framework to study the generalization properties of binary classifiers trained with data which may be dependent, but are deterministically generated upon a sample of independent examples. It provides generalization bounds for binary classification and some cases of ranking problems, and clarifies the relationship between these learning tasks. 1
same-paper 2 0.96497512 159 nips-2005-Q-Clustering
Author: Mukund Narasimhan, Nebojsa Jojic, Jeff A. Bilmes
Abstract: We show that Queyranne’s algorithm for minimizing symmetric submodular functions can be used for clustering with a variety of different objective functions. Two specific criteria that we consider in this paper are the single linkage and the minimum description length criteria. The first criterion tries to maximize the minimum distance between elements of different clusters, and is inherently “discriminative”. It is known that optimal clusterings into k clusters, for any given k, can be computed in polynomial time for this criterion. The second criterion seeks to minimize the description length of the clusters given a probabilistic generative model. We show that the optimal partitioning into 2 clusters, and an approximate partitioning (guaranteed to be within a factor of 2 of the optimal) for more clusters, can be computed. To the best of our knowledge, this is the first time that a tractable algorithm for finding the optimal clustering with respect to the MDL criterion for 2 clusters has been given. Besides the optimality result for the MDL criterion, the chief contribution of this paper is to show that the same algorithm can be used to optimize a broad class of criteria, and hence can be used for many application-specific criteria for which efficient algorithms are not known.
3 0.95140487 117 nips-2005-Learning from Data of Variable Quality
Author: Koby Crammer, Michael Kearns, Jennifer Wortman
Abstract: We initiate the study of learning from multiple sources of limited data, each of which may be corrupted at a different rate. We develop a complete theory of which data sources should be used for two fundamental problems: estimating the bias of a coin, and learning a classifier in the presence of label noise. In both cases, efficient algorithms are provided for computing the optimal subset of data. 1
4 0.9474327 131 nips-2005-Multiple Instance Boosting for Object Detection
Author: Cha Zhang, John C. Platt, Paul A. Viola
Abstract: A good image object detection algorithm is accurate, fast, and does not require exact locations of objects in a training set. We can create such an object detector by taking the architecture of the Viola-Jones detector cascade and training it with a new variant of boosting that we call MILBoost. MILBoost uses cost functions from the Multiple Instance Learning literature combined with the AnyBoost framework. We adapt the feature selection criterion of MILBoost to optimize the performance of the Viola-Jones cascade. Experiments show that the detection rate is up to 1.6 times better using MILBoost. This increased detection rate shows the advantage of simultaneously learning the locations and scales of the objects in the training set along with the parameters of the classifier. 1
5 0.82434607 85 nips-2005-Generalization to Unseen Cases
Author: Teemu Roos, Peter Grünwald, Petri Myllymäki, Henry Tirri
Abstract: We analyze classification error on unseen cases, i.e. cases that are different from those in the training set. Unlike standard generalization error, this off-training-set error may differ significantly from the empirical error with high probability even with large sample sizes. We derive a datadependent bound on the difference between off-training-set and standard generalization error. Our result is based on a new bound on the missing mass, which for small samples is stronger than existing bounds based on Good-Turing estimators. As we demonstrate on UCI data-sets, our bound gives nontrivial generalization guarantees in many practical cases. In light of these results, we show that certain claims made in the No Free Lunch literature are overly pessimistic. 1
6 0.80620456 112 nips-2005-Learning Minimum Volume Sets
7 0.72165298 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations
8 0.7141977 41 nips-2005-Coarse sample complexity bounds for active learning
9 0.71052998 196 nips-2005-Two view learning: SVM-2K, Theory and Practice
10 0.69040561 84 nips-2005-Generalization in Clustering with Unobserved Features
11 0.67983764 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
12 0.67841136 177 nips-2005-Size Regularized Cut for Data Clustering
13 0.66675818 58 nips-2005-Divergences, surrogate loss functions and experimental design
14 0.65804595 191 nips-2005-The Forgetron: A Kernel-Based Perceptron on a Fixed Budget
15 0.64992458 147 nips-2005-On the Convergence of Eigenspaces in Kernel Principal Component Analysis
16 0.64288497 47 nips-2005-Consistency of one-class SVM and related algorithms
17 0.64080578 160 nips-2005-Query by Committee Made Real
18 0.63913518 95 nips-2005-Improved risk tail bounds for on-line algorithms
19 0.63573718 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
20 0.62641144 74 nips-2005-Faster Rates in Regression via Active Learning