nips nips2010 nips2010-230 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hairong Liu, Longin J. Latecki, Shuicheng Yan
Abstract: In this paper, we regard clustering as ensembles of k-ary affinity relations and clusters correspond to subsets of objects with maximal average affinity relations. The average affinity relation of a cluster is relaxed and well approximated by a constrained homogenous function. We present an efficient procedure to solve this optimization problem, and show that the underlying clusters can be robustly revealed by using priors systematically constructed from the data. Our method can automatically select some points to form clusters, leaving other points un-grouped; thus it is inherently robust to large numbers of outliers, which has seriously limited the applicability of classical methods. Our method also provides a unified solution to clustering from k-ary affinity relations with k ≥ 2, that is, it applies to both graph-based and hypergraph-based clustering problems. Both theoretical analysis and experimental results show the superiority of our method over classical solutions to the clustering problem, especially when there exists a large number of outliers.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper, we regard clustering as ensembles of k-ary affinity relations and clusters correspond to subsets of objects with maximal average affinity relations. [sent-5, score-0.711]
2 We present an efficient procedure to solve this optimization problem, and show that the underlying clusters can be robustly revealed by using priors systematically constructed from the data. [sent-7, score-0.182]
3 Our method can automatically select some points to form clusters, leaving other points un-grouped; thus it is inherently robust to large numbers of outliers, which has seriously limited the applicability of classical methods. [sent-8, score-0.219]
4 Our method also provides a unified solution to clustering from k-ary affinity relations with k ≥ 2, that is, it applies to both graph-based and hypergraph-based clustering problems. [sent-9, score-0.895]
5 Both theoretical analysis and experimental results show the superiority of our method over classical solutions to the clustering problem, especially when there exists a large number of outliers. [sent-10, score-0.443]
6 1 Introduction Data clustering is a fundamental problem in many fields, such as machine learning, data mining and computer vision [1]. [sent-11, score-0.371]
7 But it is generally agreed that the objects belonging to a cluster satisfy a certain internal coherence condition, while the objects not belonging to a cluster usually do not satisfy this condition. [sent-13, score-0.571]
8 Most existing clustering methods are partition-based, such as k-means [2], spectral clustering [3, 4, 5] and affinity propagation [6]. [sent-14, score-0.779]
9 The criteria used to judge whether several objects belong to the same cluster are typically expressed by pairwise relations, which are encoded as the weights of an affinity graph. [sent-18, score-0.303]
10 However, in many applications, high order relations are more appropriate, and may even be the only choice, which naturally results in hyperedges in hypergraphs. [sent-19, score-0.361]
11 For example, when clustering a given set of points into lines, pairwise relations are not meaningful, since every pair of data points trivially defines a line. [sent-20, score-0.629]
12 As the graph-based clustering problem has been well studied, many researchers have tried to deal with hypergraph-based clustering by using existing graph-based clustering methods. [sent-22, score-1.113]
13 One direction is to transform a hypergraph into a graph, whose edge-weights are mapped from the weights of the original hypergraph. [sent-23, score-0.226]
14 Another direction is to generalize graph-based clustering methods to hypergraphs. [sent-30, score-0.406]
15 [10] generalized the well-known “normalized cut” method [5] and defined a hypergraph normalized cut criterion for a k-partition of the vertices. [sent-32, score-0.261]
16 [11] cast the clustering problem with high order relations into a nonnegative factorization problem of the closest hyper-stochastic version of the input affinity tensor. [sent-34, score-0.489]
17 Based on game theory, Bulo and Pelillo [12] proposed to consider the hypergraph-based clustering problem as a multi-player non-cooperative “clustering game” and solve it by replicator equation, which is in fact a generalization of their previous work [13]. [sent-35, score-0.681]
18 In this paper, we propose a unified method for clustering from k-ary affinity relations, which is applicable to both graph-based and hypergraph-based clustering problems. [sent-38, score-0.777]
19 Our method is motivated by an intuitive observation: for a cluster with m objects, there exist (m choose k) possible k-ary affinity relations, and most of these (sometimes even all) k-ary affinity relations should agree with each other on the same criterion. [sent-39, score-0.254]
20 For example, in the line clustering problem, for m points on the same line, there are (m choose 3) possible triplets, and all these triplets should satisfy the criterion that they lie on a line. [sent-40, score-0.515]
21 Such a large ensemble of affinity relations is hardly produced by outliers and is also very robust to noise, thus yielding a robust mechanism for clustering. [sent-41, score-0.427]
22 2 Formulation Clustering from k-ary affinity relations can be intuitively described as clustering on a special kind of edge-weighted hypergraph, a k-graph. [sent-42, score-0.489]
23 We only consider the k-ary affinity relations with no duplicate objects, that is, the hyperedges among k different vertices. [sent-44, score-0.4]
24 For hyperedges with duplicated vertices, we simply set their weights to zero. [sent-45, score-0.274]
25 Each hyperedge e ∈ E involves k vertices and thus can be represented as a k-tuple {v1 , · · · , vk }. [sent-46, score-0.342]
26 The weighted adjacency array of graph G is a k-dimensional n × n × · · · × n super-symmetric array, denoted by M and defined as: M(v1, · · · , vk) = w({v1, · · · , vk}) if {v1, · · · , vk} ∈ E, and M(v1, · · · , vk) = 0 otherwise. (1) Note that each edge {v1 , · · · , vk } ∈ E has k! [sent-47, score-0.655]
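The following is a minimal sketch (not the authors' code) of how the super-symmetric array M of Eq. (1) can be held sparsely: each hyperedge is stored once as a sorted k-tuple, and the k! symmetric entries are implicit. The class and method names (Hypergraph, add_edge, edges_at) are our own illustrative choices.

```python
# Sketch: sparse stand-in for the super-symmetric affinity array M of Eq. (1).
# Each hyperedge is stored once as a sorted k-tuple; symmetry over the k!
# orderings is implicit. Hyperedges with duplicated vertices are ignored
# (their weight is zero, as stated in the text).
class Hypergraph:
    def __init__(self, n, k):
        self.n = n          # number of vertices
        self.k = k          # arity of the hyperedges (k-graph)
        self.w = {}         # sorted k-tuple of vertices -> weight

    def add_edge(self, verts, weight):
        verts = tuple(sorted(verts))
        if len(set(verts)) == self.k:
            self.w[verts] = float(weight)

    def weight(self, verts):
        return self.w.get(tuple(sorted(verts)), 0.0)

    def edges_at(self, v):
        """E(v): the hyperedges containing vertex v, with their weights."""
        return [(e, w) for e, w in self.w.items() if v in e]
```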
27 If U is really a cluster, then most of the hyperedges in E_U should have large weights. [sent-50, score-0.243]
28 The simplest measure to reflect such an ensemble phenomenon is the sum of all entries in M whose corresponding hyperedges contain only vertices in U, which can be expressed as: S(U) = Σ_{v1,···,vk ∈ U} M(v1, · · · , vk). (2) [sent-51, score-0.565]
29 Suppose y is an n × 1 indicator vector of the subset U, such that y_{vi} = 1 if vi ∈ U and zero otherwise; then S(U) can be expressed as: S(U) = S(y) = Σ_{v1,···,vk ∈ V} M(v1, · · · , vk) y_{v1} · · · y_{vk}. (3) [sent-52, score-0.196]
30 Obviously, S(U) usually increases as the number of vertices in U increases. [sent-53, score-0.172]
31 As Σ_i y_i = m, setting x = y/m gives Σ_{v1,···,vk ∈ V} M(v1, · · · , vk) y_{v1} · · · y_{vk} = m^k Σ_{v1,···,vk ∈ V} M(v1, · · · , vk) x_{v1} · · · x_{vk}, (4) and Σ_i x_i = 1 is a natural constraint over x. [sent-55, score-0.611]
32 Thus, the clustering problem corresponds to the problem of maximizing S_av(U). [sent-57, score-0.371]
33 Then the problem becomes: max f(x) = Σ_{v1,···,vk ∈ V} M(v1, · · · , vk) Π_{i=1}^{k} x_{vi}, subject to x ∈ Δn and x_i ∈ [0, ε], (5) where Δn = {x ∈ Rn : x ≥ 0 and Σ_i x_i = 1} is the standard simplex in Rn. [sent-60, score-0.265]
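As a quick illustration of the objective in (5), here is a sketch of its evaluation under the sparse hyperedge representation above; since each unordered hyperedge is stored once, the sum over all k! orderings only contributes a constant factor, which does not change the maximizer. The function name and the Hypergraph class are assumptions from the earlier sketch, not the authors' code.

```python
# Sketch: evaluating f(x) of (5) from the sparse hyperedge list. The k! factor
# accounts for the orderings of each stored unordered hyperedge; it is a
# constant and does not affect where the maximum is attained.
import math
import numpy as np

def objective(graph, x):
    total = 0.0
    for edge, w in graph.w.items():
        total += w * np.prod([x[v] for v in edge])
    return math.factorial(graph.k) * total
```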
34 In [12], Bulo and Pelillo proposed to cast the hypergraph-based clustering problem into a clustering game, which leads to a formulation similar to (5). [sent-64, score-0.776]
35 Setting ε < 1 means that the probability of choosing each strategy (from the game-theoretic perspective) or choosing each object (from our perspective) has a known upper bound, which is in fact a prior, while ε = 1 represents a noninformative prior. [sent-66, score-0.393]
36 For example, if the weight of a hyperedge is extremely large, then the cluster may only select the vertices associated with this hyperedge, which is usually not desirable. [sent-68, score-0.468]
37 Large maxima correspond to true clusters and small maxima usually form meaningless subsets. [sent-73, score-0.306]
38 Since the formulation (5) is a constrained optimization problem, by adding Lagrangian multipliers λ, µ1, · · · , µn and β1, · · · , βn, with µi ≥ 0 and βi ≥ 0 for all i = 1, · · · , n, we can obtain its Lagrangian function: L(x, λ, µ, β) = f(x) − λ(Σ_{i=1}^{n} x_i − 1) + Σ_{i=1}^{n} µ_i x_i + Σ_{i=1}^{n} β_i (ε − x_i). (6) [sent-75, score-0.211]
39 The reward at vertex i, denoted by r_i(x), is defined as r_i(x) = Σ_{v1,···,vk−1 ∈ V} M(v1, · · · , vk−1, i) Π_{t=1}^{k−1} x_{vt}; since M is a super-symmetric array, r_i(x) is proportional to the gradient of f(x) at x. [sent-76, score-0.431]
40 Any local maximizer x∗ must satisfy the Karush-Kuhn-Tucker (KKT) condition [14]. [sent-79, score-0.224]
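A sketch of the vertex rewards r_i(x) under the same sparse representation follows; the constant (k−1)! ordering factor is dropped, since only comparisons between rewards matter for the update rule below. Function names are illustrative.

```python
# Sketch: vertex rewards r_i(x). Each stored hyperedge containing i contributes
# its weight times the product of x over the remaining k-1 vertices; a constant
# (k-1)! ordering factor is dropped since only reward comparisons are used.
import numpy as np

def rewards(graph, x):
    r = np.zeros(graph.n)
    for edge, w in graph.w.items():
        for i in edge:
            rest = [v for v in edge if v != i]
            r[i] += w * np.prod([x[v] for v in rest])
    return r
```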
41 Since x∗_i, µ_i and β_i are all nonnegative for all i, Σ_{i=1}^{n} x∗_i µ_i = 0 is equivalent to saying that if x∗_i > 0 then µ_i = 0, and Σ_{i=1}^{n} (ε − x∗_i) β_i = 0 is equivalent to saying that if x∗_i < ε then β_i = 0. [sent-83, score-0.188]
42 For any x, if we want to update it to increase f (x), then the values of some components belonging to Vd (x) must decrease and the values of some components belonging to Vu (x) must increase. [sent-91, score-0.187]
43 According to Theorem 1, if x is the solution of (5), then ri (x) ≤ rj (x), ∀i ∈ Vu (x), ∀j ∈ Vd (x). [sent-92, score-0.206]
44 Conversely, if ∃i ∈ Vu (x), ∃j ∈ Vd (x), ri (x) > rj (x), then x is not the solution of (5). [sent-93, score-0.206]
45 That is, let x′_l = x_l for l ≠ i, l ≠ j; x′_l = x_l + α for l = i; and x′_l = x_l − α for l = j. (10) [sent-95, score-0.24]
46 and define r_ij(x) = Σ_{v1,···,vk−2} M(v1, · · · , vk−2, i, j) Π_{t=1}^{k−2} x_{vt}. (11) Then f(x′) − f(x) = −k(k − 1) r_ij(x) α² + k(r_i(x) − r_j(x)) α. (12) Since r_i(x) > r_j(x), we can always select a proper α > 0 to increase f(x). [sent-96, score-0.475]
47 Since r_i(x) > r_j(x), if r_ij(x) ≤ 0, then the increase of f(x) is maximized when α = min(x_j, ε − x_i); if r_ij(x) > 0, then the increase of f(x) is maximized when α = min(x_j, ε − x_i, (r_i(x) − r_j(x)) / (2(k − 1) r_ij(x))). [sent-98, score-0.688]
48 According to the above analysis, if ∃i ∈ Vu (x), ∃j ∈ Vd (x), ri (x) > rj (x), then we can update x to increase f (x). [sent-99, score-0.245]
49 This procedure iterates until ri (x) ≤ rj (x), ∀i ∈ Vu (x), ∀j ∈ Vd (x). [sent-100, score-0.206]
50 From a prior (initialization) x(0), the algorithm to compute the local maximizer of (5) is summarized in Algorithm 1, which successively chooses the “best” vertex and the “worst” vertex and then updates their corresponding components of x. [sent-101, score-0.426]
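The sketch below is our reading of this pairwise update (Eqs. (10)-(12)), not the authors' implementation: Vu(x) is taken as {i : x_i < ε} and Vd(x) as {j : x_j > 0}, the paper's exact definitions being outside the extracted sentences. It assumes the rewards() and Hypergraph sketches above.

```python
# Sketch of Algorithm 1 as we read it: repeatedly pick the "best" vertex i in
# Vu(x) (largest reward, room to grow) and the "worst" vertex j in Vd(x)
# (smallest reward, positive mass), then shift mass alpha from j to i using the
# step sizes discussed around Eqs. (10)-(12).
import numpy as np

def r_pair(graph, x, i, j):
    """r_ij(x) of Eq. (11), up to a constant ordering factor."""
    total = 0.0
    for edge, w in graph.w.items():
        if i in edge and j in edge:
            rest = [v for v in edge if v != i and v != j]
            total += w * np.prod([x[v] for v in rest])
    return total

def local_maximizer(graph, x, eps, max_iter=1000, tol=1e-12):
    x = x.copy()
    for _ in range(max_iter):
        r = rewards(graph, x)
        Vu = [i for i in range(graph.n) if x[i] < eps]
        Vd = [j for j in range(graph.n) if x[j] > 0]
        if not Vu:
            return x
        i = max(Vu, key=lambda v: r[v])      # "best" vertex
        j = min(Vd, key=lambda v: r[v])      # "worst" vertex
        if r[i] - r[j] <= tol:               # KKT-style stopping condition
            return x
        rij = r_pair(graph, x, i, j)
        alpha = min(x[j], eps - x[i])
        if rij > 0:                          # cap the step as in the analysis of Eq. (12)
            alpha = min(alpha, (r[i] - r[j]) / (2 * (graph.k - 1) * rij))
        x[i] += alpha
        x[j] -= alpha
    return x
```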
51 Since significant maxima of formulation (5) usually correspond to true clusters, we need multiple initializations (priors) to obtain them, with at least one initialization in the basin of attraction of every significant maximum. [sent-102, score-0.225]
52 Such informative priors can in fact be easily and efficiently constructed from the neighborhood of every vertex (the vertices with hyperedges connecting to this vertex), because the neighbors of a vertex generally have a much higher probability of belonging to the same cluster. [sent-103, score-0.653]
53 Algorithm 2 Construct a prior x(0) containing vertex v 1: Input: Hyperedge set E(v) and ε; 2: Sort the hyperedges in E(v) in descending order according to their weights; 3: for i = 1, · · · , |E(v)| do 4: Add all vertices associated with the i-th hyperedge to L. [sent-105, score-0.752]
54 If |L| ≥ [1/ε], then break; 5: end for 6: For each vertex vj ∈ L, set the corresponding component x_{vj}(0) = 1/|L|; 7: Output: a prior x(0). [sent-106, score-0.171]
55 For a vertex v, the set of hyperedges connected to v is denoted by E(v). [sent-107, score-0.414]
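A sketch of Algorithm 2 under the same sparse representation is given below; the rounding of 1/ε and the variable names are our assumptions.

```python
# Sketch of Algorithm 2: build an informative prior x(0) supported on the
# neighborhood of vertex v. Hyperedges in E(v) are scanned in decreasing weight
# order until roughly 1/eps vertices have been collected (the rounding of 1/eps
# is our assumption), and the prior mass is spread uniformly over them.
import math
import numpy as np

def construct_prior(graph, v, eps):
    edges = sorted(graph.edges_at(v), key=lambda ew: ew[1], reverse=True)
    L = []
    for edge, _ in edges:
        for u in edge:
            if u not in L:
                L.append(u)
        if len(L) >= math.ceil(1.0 / eps):
            break
    if not L:                      # isolated vertex: fall back to a point mass on v
        L = [v]
    x0 = np.zeros(graph.n)
    x0[L] = 1.0 / len(L)
    return x0
```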
56 In this way, we can robustly obtain multiple clusters simultaneously, and these clusters may overlap, both of which are desirable properties in many applications. [sent-114, score-0.265]
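Read together, the pieces above suggest the following driver (our sketch, not the paper's procedure): run Algorithm 1 from a prior built around every vertex and keep the supports of the distinct significant maxima as clusters. The significance threshold min_score and the support cutoff 1e-8 are illustrative choices, not values from the paper.

```python
# Sketch of the overall pipeline: one prior per vertex (Algorithm 2), one run of
# the local maximizer (Algorithm 1) per prior, and the supports of the distinct
# significant local maxima reported as (possibly overlapping) clusters.
import numpy as np

def extract_clusters(graph, eps, min_score=0.0):
    clusters = {}
    for v in range(graph.n):
        x0 = construct_prior(graph, v, eps)
        x = local_maximizer(graph, x0, eps)
        score = objective(graph, x)
        if score > min_score:
            support = frozenset(np.flatnonzero(x > 1e-8).tolist())
            clusters[support] = max(score, clusters.get(support, 0.0))
    return clusters   # maps cluster (vertex set) -> objective value
```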
57 Note that the clustering game approach [12] utilizes a noninformative prior, that is, all vertices have equal probability. [sent-115, score-0.867]
58 In the clustering game approach [12], if xi (t) = 0, then xi (t + 1) = 0, which means that it can only drop points: if a point is not initially included, it can never be selected. [sent-117, score-0.881]
59 However, our method can automatically add or drop points, which is another key difference from the clustering game approach. [sent-118, score-0.751]
60 Suppose the maximal number of hyperedges containing any single vertex is h; then the time complexity of Algorithm 1 is O(thk), where t is the number of iterations. [sent-121, score-0.414]
61 The first one addresses the problem of line clustering, the second addresses the problem of illumination-invariant face clustering, and the third addresses the problem of affine-invariant point set matching. [sent-124, score-0.166]
62 We compare our method with the clique averaging algorithm [9] and the matching game approach [12]. [sent-125, score-0.701]
63 In all experiments, the clique averaging approach needs to know the number of clusters in advance; however, both the clustering game approach and our method can automatically reveal the number of clusters, which gives the latter two an advantage in many applications. [sent-126, score-1.146]
64 4.1 Line Clustering In this experiment, we consider the problem of clustering lines in 2D point sets. [sent-128, score-0.432]
65 The dissimilarity measure on triplets of points is given by their mean distance to the best fitting line. [sent-130, score-0.18]
66 If d(i, j, k) is the dissimilarity measure of points {i, j, k}, then the similarity function is given by s({i, j, k}) = exp(−d(i, j, k)²/σ_d²), where σ_d is a scaling parameter that controls the sensitivity of the similarity measure to deformation. [sent-131, score-0.155]
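As an illustration of this weighting, here is a sketch that assumes the "best fitting line" is the total-least-squares line through the three points; this reading of d(i, j, k) is our assumption.

```python
# Sketch: hyperedge weight for line clustering. d(i,j,k) is taken as the mean
# distance of the three points to their total-least-squares fitting line, and
# s = exp(-d^2 / sigma_d^2) as in the text.
import numpy as np

def line_dissimilarity(p):
    """p: 3x2 array holding three 2D points."""
    c = p.mean(axis=0)
    _, _, vt = np.linalg.svd(p - c)   # principal direction of the centered points
    normal = vt[-1]                   # unit normal of the fitted line
    return float(np.abs((p - c) @ normal).mean())

def line_similarity(p, sigma_d):
    d = line_dissimilarity(p)
    return float(np.exp(-d ** 2 / sigma_d ** 2))
```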
67 We also randomly add outliers into the point set. [sent-135, score-0.217]
68 Fig. 1(a) illustrates such a point set with three lines shown in red, blue and green colors, respectively, and the outliers are shown in magenta color. [sent-137, score-0.283]
69 We first fix the number of outliers to be 60, vary the scaling parameter σd from 0. [sent-139, score-0.226]
70 Obviously, our method is nearly unaffected by the scaling parameter σd , while the clustering game approach is very sensitive to σd . [sent-144, score-0.802]
71 Note that σd in fact controls the weights of the hypergraph, and many graph-based algorithms are notoriously sensitive to the weights of the graph. [sent-145, score-0.304]
72 From Fig. 1(b), we observe that when σd = 4σ, the clustering game approach will get the best performance. [sent-148, score-0.681]
73 In Fig. 1, the results of the clustering game approach, the clique averaging algorithm and our method are shown in blue, green and red, respectively. [sent-151, score-0.994]
74 As the figure shows, when the noise is small, the matching game approach outperforms the clique averaging algorithm, and when the noise becomes large, the clique averaging algorithm outperforms the matching game approach. [sent-153, score-1.332]
75 This is because the matching game approach is more robust to outliers, while the clique averaging algorithm seems more robust to noise. [sent-154, score-0.756]
76 Our method always gets the best result, since it can not only select coherent clusters, as the matching game approach does, but also control the size of the clusters, thus avoiding the problem of too few points being selected into clusters. [sent-155, score-0.592]
77 As we stressed in Section 2, the clustering game approach is in fact a special case of our method when ε = 1; thus, the result at ε = 1 is nearly the same as that of the clustering game approach in Fig. [sent-162, score-1.397]
78 Note that the best result appears when 1/ε > 30, which is due to the fact that some outliers fall into the line clusters, as can be seen in Fig. [sent-165, score-0.187]
79 4.2 Illumination-invariant face clustering It has been shown that the variability of images of a Lambertian surface in fixed pose, but under variable lighting conditions where no surface point is shadowed, constitutes a three-dimensional linear subspace [15]. [sent-168, score-0.518]
80 The case with outliers consists of 10 additional faces, each from a different individual. [sent-175, score-0.22]
81 The results clearly show that the partition-based clustering method (clique averaging) is very sensitive to outliers, but performs better when there are no outliers. [sent-178, score-0.453]
82 The clustering game approach and our method both perform well, especially when there are outliers, and our method performs a little better. [sent-179, score-0.751]
83 Figure 1: Results on clustering three lines with noise and outliers. [sent-180, score-0.475]
84 The performance of the clique averaging algorithm [9], the matching game approach [12] and our method is shown as green dashed, blue dotted and red solid curves, respectively. [sent-181, score-0.766]
85 Table 1: Experiments on illumination-invariant face clustering. Columns: Classes, Outliers, Clique Averaging, Clustering Game, Our Method.
86 In this subsection, we will show that matching planar point sets under different viewpoints can be formulated as a hypergraph clustering problem and that our algorithm is very suitable for such tasks. [sent-211, score-0.733]
87 If we regard each candidate match as a point, then s = exp(−(S_{ijk} − S_{i′j′k′} |det(A)|)²/σ_d²) serves as a natural similarity measure for three points (candidate matches) m_{ii′}, m_{jj′} and m_{kk′}, where σ_d is a scaling parameter, and the correct matching configuration then naturally forms a cluster. [sent-215, score-0.427]
88 Since most points (candidate matches) should not belong to any cluster, partition-based clustering methods, such as the clique averaging method, cannot be used. [sent-224, score-0.639]
89 Thus, we only compare our method with the matching game approach and measure the performance of these two methods by counting how many matches agree with the ground truth. [sent-225, score-0.473]
90 Obviously, our method is very robust to σd , while the matching game approach is very sensitive to it. [sent-230, score-0.515]
91 The red solid curves demonstrate the performance of our method, while the blue dotted curve illustrates the performance of the matching game approach. [sent-246, score-0.453]
92 5 Discussion In this paper, we characterized clustering as an ensemble of all associated affinity relations and relaxed the clustering problem into optimizing a constrained homogeneous function. [sent-247, score-0.935]
93 We showed that the clustering game approach turns out to be a special case of our method. [sent-248, score-0.681]
94 We also proposed an efficient algorithm to automatically reveal the clusters in a data set, even under severe noise and a large number of outliers. [sent-249, score-0.225]
95 A key issue with hypergraph-based clustering is the high computational cost of the construction of a hypergraph, and we are currently studying how to efficiently construct an approximate hypergraph and then perform clustering on the incomplete hypergraph. [sent-252, score-0.937]
96 Wu, “An efficient k-means clustering algorithm: Analysis and implementation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. [sent-268, score-0.371]
97 Kulis, “Kernel k-means: spectral clustering and normalized cuts,” in Proceedings of the tenth ACM International Conference on Knowledge Discovery and Data Mining, 2004, pp. [sent-281, score-0.408]
98 Chan, “Multilevel spectral hypergraph partitioning with arbitrary vertex sizes,” IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, vol. [sent-298, score-0.403]
99 Hazan, “Multi-way clustering using super-symmetric non-negative tensor factorization,” in European Conference on Computer Vision, 2006, pp. [sent-325, score-0.371]
100 Pelillo, “A game-theoretic approach to hypergraph clustering,” in Advances in Neural Information Processing Systems, 2009. [sent-329, score-0.195]
wordName wordTfidf (topN-words)
[('clustering', 0.371), ('game', 0.31), ('af', 0.246), ('hyperedges', 0.243), ('hyperedge', 0.195), ('hypergraph', 0.195), ('outliers', 0.187), ('clique', 0.182), ('nity', 0.179), ('vertex', 0.171), ('vu', 0.157), ('vk', 0.147), ('vd', 0.144), ('vertices', 0.143), ('relations', 0.118), ('clusters', 0.117), ('ri', 0.112), ('cluster', 0.101), ('sav', 0.097), ('averaging', 0.096), ('rj', 0.094), ('rij', 0.087), ('pelillo', 0.085), ('maximizer', 0.084), ('maxima', 0.08), ('matching', 0.078), ('belonging', 0.074), ('bulo', 0.073), ('noises', 0.073), ('objects', 0.068), ('array', 0.067), ('kriegman', 0.064), ('triplets', 0.064), ('dissimilarity', 0.064), ('det', 0.06), ('xl', 0.06), ('xi', 0.059), ('viewpoints', 0.059), ('contour', 0.058), ('kkt', 0.055), ('singapore', 0.055), ('points', 0.052), ('matches', 0.05), ('kri', 0.049), ('mjj', 0.049), ('mkk', 0.049), ('rodriguez', 0.049), ('shadowed', 0.049), ('shashua', 0.049), ('xvt', 0.049), ('yvk', 0.049), ('sensitive', 0.047), ('shapes', 0.046), ('robust', 0.045), ('initializations', 0.043), ('homogenous', 0.043), ('ijk', 0.043), ('hypergraphs', 0.043), ('noninformative', 0.043), ('mii', 0.043), ('face', 0.043), ('lighting', 0.041), ('rewards', 0.04), ('object', 0.04), ('obviously', 0.04), ('scaling', 0.039), ('basin', 0.039), ('duplicate', 0.039), ('increase', 0.039), ('spectral', 0.037), ('candidate', 0.037), ('superiority', 0.037), ('regard', 0.037), ('reward', 0.036), ('pairwise', 0.036), ('blue', 0.035), ('automatically', 0.035), ('saying', 0.035), ('method', 0.035), ('formulation', 0.034), ('priors', 0.034), ('belong', 0.034), ('judge', 0.033), ('faces', 0.033), ('images', 0.033), ('ensemble', 0.032), ('zien', 0.032), ('weights', 0.031), ('addresses', 0.031), ('yv', 0.031), ('robustly', 0.031), ('lines', 0.031), ('cut', 0.031), ('nq', 0.03), ('eu', 0.03), ('solid', 0.03), ('point', 0.03), ('illumination', 0.029), ('usually', 0.029), ('satisfy', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
Author: Hairong Liu, Longin J. Latecki, Shuicheng Yan
Abstract: In this paper, we regard clustering as ensembles of k-ary affinity relations and clusters correspond to subsets of objects with maximal average affinity relations. The average affinity relation of a cluster is relaxed and well approximated by a constrained homogenous function. We present an efficient procedure to solve this optimization problem, and show that the underlying clusters can be robustly revealed by using priors systematically constructed from the data. Our method can automatically select some points to form clusters, leaving other points un-grouped; thus it is inherently robust to large numbers of outliers, which has seriously limited the applicability of classical methods. Our method also provides a unified solution to clustering from k-ary affinity relations with k ≥ 2, that is, it applies to both graph-based and hypergraph-based clustering problems. Both theoretical analysis and experimental results show the superiority of our method over classical solutions to the clustering problem, especially when there exists a large number of outliers.
2 0.2600269 273 nips-2010-Towards Property-Based Classification of Clustering Paradigms
Author: Margareta Ackerman, Shai Ben-David, David Loker
Abstract: Clustering is a basic data mining task with a wide variety of applications. Not surprisingly, there exist many clustering algorithms. However, clustering is an ill defined problem - given a data set, it is not clear what a “correct” clustering for that set is. Indeed, different algorithms may yield dramatically different outputs for the same input sets. Faced with a concrete clustering task, a user needs to choose an appropriate clustering algorithm. Currently, such decisions are often made in a very ad hoc, if not completely random, manner. Given the crucial effect of the choice of a clustering algorithm on the resulting clustering, this state of affairs is truly regrettable. In this paper we address the major research challenge of developing tools for helping users make more informed decisions when they come to pick a clustering tool for their data. This is, of course, a very ambitious endeavor, and in this paper, we make some first steps towards this goal. We propose to address this problem by distilling abstract properties of the input-output behavior of different clustering paradigms. In this paper, we demonstrate how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms. On top of addressing deterministic clustering algorithms, we also propose similar properties for randomized algorithms and use them to highlight functional differences between different common implementations of k-means clustering. We also study relationships between the properties, independent of any particular algorithm. In particular, we strengthen Kleinberg’s famous impossibility result, while providing a simpler proof. 1
3 0.16641653 223 nips-2010-Rates of convergence for the cluster tree
Author: Kamalika Chaudhuri, Sanjoy Dasgupta
Abstract: For a density f on Rd , a high-density cluster is any connected component of {x : f (x) ≥ λ}, for some λ > 0. The set of all high-density clusters form a hierarchy called the cluster tree of f . We present a procedure for estimating the cluster tree given samples from f . We give finite-sample convergence rates for our algorithm, as well as lower bounds on the sample complexity of this estimation problem. 1
4 0.1614279 221 nips-2010-Random Projections for $k$-means Clustering
Author: Christos Boutsidis, Anastasios Zouzias, Petros Drineas
Abstract: This paper discusses the topic of dimensionality reduction for k-means clustering. We prove that any set of n points in d dimensions (rows in a matrix A ∈ Rn×d ) can be projected into t = Ω(k/ε2 ) dimensions, for any ε ∈ (0, 1/3), in O(nd⌈ε−2 k/ log(d)⌉) time, such that with constant probability the optimal k-partition of the point set is preserved within a factor of 2 + √ The projection is done ε. √ by post-multiplying A with a d × t random matrix R having entries +1/ t or −1/ t with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.
5 0.15829365 261 nips-2010-Supervised Clustering
Author: Pranjal Awasthi, Reza B. Zadeh
Abstract: Despite the ubiquity of clustering as a tool in unsupervised learning, there is not yet a consensus on a formal theory, and the vast majority of work in this direction has focused on unsupervised clustering. We study a recently proposed framework for supervised clustering where there is access to a teacher. We give an improved generic algorithm to cluster any concept class in that model. Our algorithm is query-efficient in the sense that it involves only a small amount of interaction with the teacher. We also present and study two natural generalizations of the model. The model assumes that the teacher response to the algorithm is perfect. We eliminate this limitation by proposing a noisy model and give an algorithm for clustering the class of intervals in this noisy model. We also propose a dynamic model where the teacher sees a random subset of the points. Finally, for datasets satisfying a spectrum of weak to strong properties, we give query bounds, and show that a class of clustering functions containing Single-Linkage will find the target clustering under the strongest property.
6 0.12473826 39 nips-2010-Bayesian Action-Graph Games
7 0.12256341 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
9 0.10281216 166 nips-2010-Minimum Average Cost Clustering
10 0.098705955 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification
11 0.096909687 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
12 0.092116363 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
13 0.084581144 67 nips-2010-Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
14 0.081934512 231 nips-2010-Robust PCA via Outlier Pursuit
15 0.081744954 181 nips-2010-Network Flow Algorithms for Structured Sparsity
16 0.078025699 98 nips-2010-Functional form of motion priors in human motion perception
17 0.077761106 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
18 0.075693667 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
19 0.073580451 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
20 0.070333801 257 nips-2010-Structured Determinantal Point Processes
topicId topicWeight
[(0, 0.201), (1, 0.046), (2, 0.06), (3, 0.018), (4, -0.037), (5, -0.095), (6, -0.034), (7, -0.031), (8, 0.186), (9, 0.07), (10, 0.095), (11, -0.097), (12, 0.105), (13, -0.135), (14, 0.355), (15, -0.095), (16, 0.099), (17, 0.085), (18, -0.038), (19, -0.02), (20, 0.201), (21, 0.011), (22, -0.009), (23, 0.012), (24, -0.105), (25, -0.022), (26, 0.067), (27, -0.043), (28, 0.056), (29, 0.053), (30, 0.054), (31, 0.008), (32, -0.0), (33, 0.001), (34, 0.012), (35, -0.038), (36, -0.046), (37, 0.031), (38, 0.078), (39, -0.07), (40, 0.028), (41, -0.081), (42, -0.017), (43, -0.071), (44, -0.034), (45, -0.043), (46, -0.079), (47, 0.028), (48, 0.053), (49, 0.041)]
simIndex simValue paperId paperTitle
same-paper 1 0.97506559 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
Author: Hairong Liu, Longin J. Latecki, Shuicheng Yan
Abstract: In this paper, we regard clustering as ensembles of k-ary affinity relations and clusters correspond to subsets of objects with maximal average affinity relations. The average affinity relation of a cluster is relaxed and well approximated by a constrained homogenous function. We present an efficient procedure to solve this optimization problem, and show that the underlying clusters can be robustly revealed by using priors systematically constructed from the data. Our method can automatically select some points to form clusters, leaving other points un-grouped; thus it is inherently robust to large numbers of outliers, which has seriously limited the applicability of classical methods. Our method also provides a unified solution to clustering from k-ary affinity relations with k ≥ 2, that is, it applies to both graph-based and hypergraph-based clustering problems. Both theoretical analysis and experimental results show the superiority of our method over classical solutions to the clustering problem, especially when there exists a large number of outliers.
2 0.90804404 273 nips-2010-Towards Property-Based Classification of Clustering Paradigms
Author: Margareta Ackerman, Shai Ben-David, David Loker
Abstract: Clustering is a basic data mining task with a wide variety of applications. Not surprisingly, there exist many clustering algorithms. However, clustering is an ill defined problem - given a data set, it is not clear what a “correct” clustering for that set is. Indeed, different algorithms may yield dramatically different outputs for the same input sets. Faced with a concrete clustering task, a user needs to choose an appropriate clustering algorithm. Currently, such decisions are often made in a very ad hoc, if not completely random, manner. Given the crucial effect of the choice of a clustering algorithm on the resulting clustering, this state of affairs is truly regrettable. In this paper we address the major research challenge of developing tools for helping users make more informed decisions when they come to pick a clustering tool for their data. This is, of course, a very ambitious endeavor, and in this paper, we make some first steps towards this goal. We propose to address this problem by distilling abstract properties of the input-output behavior of different clustering paradigms. In this paper, we demonstrate how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms. On top of addressing deterministic clustering algorithms, we also propose similar properties for randomized algorithms and use them to highlight functional differences between different common implementations of k-means clustering. We also study relationships between the properties, independent of any particular algorithm. In particular, we strengthen Kleinberg’s famous impossibility result, while providing a simpler proof. 1
3 0.78200066 261 nips-2010-Supervised Clustering
Author: Pranjal Awasthi, Reza B. Zadeh
Abstract: Despite the ubiquity of clustering as a tool in unsupervised learning, there is not yet a consensus on a formal theory, and the vast majority of work in this direction has focused on unsupervised clustering. We study a recently proposed framework for supervised clustering where there is access to a teacher. We give an improved generic algorithm to cluster any concept class in that model. Our algorithm is query-efficient in the sense that it involves only a small amount of interaction with the teacher. We also present and study two natural generalizations of the model. The model assumes that the teacher response to the algorithm is perfect. We eliminate this limitation by proposing a noisy model and give an algorithm for clustering the class of intervals in this noisy model. We also propose a dynamic model where the teacher sees a random subset of the points. Finally, for datasets satisfying a spectrum of weak to strong properties, we give query bounds, and show that a class of clustering functions containing Single-Linkage will find the target clustering under the strongest property.
Author: Matthias Hein, Thomas Bühler
Abstract: Many problems in machine learning and statistics can be formulated as (generalized) eigenproblems. In terms of the associated optimization problem, computing linear eigenvectors amounts to finding critical points of a quadratic function subject to quadratic constraints. In this paper we show that a certain class of constrained optimization problems with nonquadratic objective and constraints can be understood as nonlinear eigenproblems. We derive a generalization of the inverse power method which is guaranteed to converge to a nonlinear eigenvector. We apply the inverse power method to 1-spectral clustering and sparse PCA which can naturally be formulated as nonlinear eigenproblems. In both applications we achieve state-of-the-art results in terms of solution quality and runtime. Moving beyond the standard eigenproblem should be useful also in many other applications and our inverse power method can be easily adapted to new problems. 1
5 0.68185699 221 nips-2010-Random Projections for $k$-means Clustering
Author: Christos Boutsidis, Anastasios Zouzias, Petros Drineas
Abstract: This paper discusses the topic of dimensionality reduction for k-means clustering. We prove that any set of n points in d dimensions (rows in a matrix A ∈ Rn×d ) can be projected into t = Ω(k/ε2 ) dimensions, for any ε ∈ (0, 1/3), in O(nd⌈ε−2 k/ log(d)⌉) time, such that with constant probability the optimal k-partition of the point set is preserved within a factor of 2 + √ The projection is done ε. √ by post-multiplying A with a d × t random matrix R having entries +1/ t or −1/ t with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.
6 0.59313661 223 nips-2010-Rates of convergence for the cluster tree
7 0.5589776 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
8 0.54199022 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
9 0.50933117 166 nips-2010-Minimum Average Cost Clustering
10 0.50357074 67 nips-2010-Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
11 0.3997165 105 nips-2010-Getting lost in space: Large sample analysis of the resistance distance
12 0.39228475 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
13 0.38699138 259 nips-2010-Subgraph Detection Using Eigenvector L1 Norms
14 0.38672101 155 nips-2010-Learning the context of a category
15 0.38118356 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
16 0.37496677 231 nips-2010-Robust PCA via Outlier Pursuit
17 0.36082584 39 nips-2010-Bayesian Action-Graph Games
18 0.35985458 226 nips-2010-Repeated Games against Budgeted Adversaries
19 0.32276165 251 nips-2010-Sphere Embedding: An Application to Part-of-Speech Induction
20 0.31765386 234 nips-2010-Segmentation as Maximum-Weight Independent Set
topicId topicWeight
[(13, 0.049), (27, 0.041), (30, 0.05), (35, 0.012), (45, 0.138), (50, 0.036), (52, 0.021), (60, 0.044), (77, 0.516), (90, 0.022)]
simIndex simValue paperId paperTitle
1 0.9419772 34 nips-2010-Attractor Dynamics with Synaptic Depression
Author: K. Wong, He Wang, Si Wu, Chi Fung
Abstract: Neuronal connection weights exhibit short-term depression (STD). The present study investigates the impact of STD on the dynamics of a continuous attractor neural network (CANN) and its potential roles in neural information processing. We find that the network with STD can generate both static and traveling bumps, and STD enhances the performance of the network in tracking external inputs. In particular, we find that STD endows the network with slow-decaying plateau behaviors, namely, the network being initially stimulated to an active state will decay to silence very slowly in the time scale of STD rather than that of neural signaling. We argue that this provides a mechanism for neural systems to hold short-term memory easily and shut off persistent activities naturally.
2 0.92528838 16 nips-2010-A VLSI Implementation of the Adaptive Exponential Integrate-and-Fire Neuron Model
Author: Sebastian Millner, Andreas Grübl, Karlheinz Meier, Johannes Schemmel, Marc-olivier Schwartz
Abstract: We describe an accelerated hardware neuron being capable of emulating the adaptive exponential integrate-and-fire neuron model. Firing patterns of the membrane stimulated by a step current are analyzed in transistor level simulations and in silicon on a prototype chip. The neuron is destined to be the hardware neuron of a highly integrated wafer-scale system reaching out for new computational paradigms and opening new experimentation possibilities. As the neuron is dedicated as a universal device for neuroscientific experiments, the focus lays on parameterizability and reproduction of the analytical model. 1
same-paper 3 0.86119211 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
Author: Hairong Liu, Longin J. Latecki, Shuicheng Yan
Abstract: In this paper, we regard clustering as ensembles of k-ary affinity relations and clusters correspond to subsets of objects with maximal average affinity relations. The average affinity relation of a cluster is relaxed and well approximated by a constrained homogenous function. We present an efficient procedure to solve this optimization problem, and show that the underlying clusters can be robustly revealed by using priors systematically constructed from the data. Our method can automatically select some points to form clusters, leaving other points un-grouped; thus it is inherently robust to large numbers of outliers, which has seriously limited the applicability of classical methods. Our method also provides a unified solution to clustering from k-ary affinity relations with k ≥ 2, that is, it applies to both graph-based and hypergraph-based clustering problems. Both theoretical analysis and experimental results show the superiority of our method over classical solutions to the clustering problem, especially when there exists a large number of outliers.
4 0.84298843 15 nips-2010-A Theory of Multiclass Boosting
Author: Indraneel Mukherjee, Robert E. Schapire
Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements. 1
5 0.77386737 142 nips-2010-Learning Bounds for Importance Weighting
Author: Corinna Cortes, Yishay Mansour, Mehryar Mohri
Abstract: This paper presents an analysis of importance weighting for learning from finite samples and gives a series of theoretical and algorithmic results. We point out simple cases where importance weighting can fail, which suggests the need for an analysis of the properties of this technique. We then give both upper and lower bounds for generalization with bounded importance weights and, more significantly, give learning guarantees for the more common case of unbounded importance weights under the weak assumption that the second moment is bounded, a condition related to the R´ nyi divergence of the training and test distributions. e These results are based on a series of novel and general bounds we derive for unbounded loss functions, which are of independent interest. We use these bounds to guide the definition of an alternative reweighting algorithm and report the results of experiments demonstrating its benefits. Finally, we analyze the properties of normalized importance weights which are also commonly used.
6 0.72182399 234 nips-2010-Segmentation as Maximum-Weight Independent Set
7 0.65241647 68 nips-2010-Effects of Synaptic Weight Diffusion on Learning in Decision Making Networks
9 0.59909117 10 nips-2010-A Novel Kernel for Learning a Neuron Model from Spike Train Data
10 0.57890642 253 nips-2010-Spike timing-dependent plasticity as dynamic filter
11 0.57519114 8 nips-2010-A Log-Domain Implementation of the Diffusion Network in Very Large Scale Integration
12 0.57347298 244 nips-2010-Sodium entry efficiency during action potentials: A novel single-parameter family of Hodgkin-Huxley models
13 0.5605759 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts
14 0.54464334 127 nips-2010-Inferring Stimulus Selectivity from the Spatial Structure of Neural Network Dynamics
15 0.53994358 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
16 0.5131464 117 nips-2010-Identifying graph-structured activation patterns in networks
17 0.49523652 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
18 0.48994499 115 nips-2010-Identifying Dendritic Processing
19 0.48930714 30 nips-2010-An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA
20 0.48405254 182 nips-2010-New Adaptive Algorithms for Online Classification