nips nips2010 nips2010-261 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Pranjal Awasthi, Reza B. Zadeh
Abstract: Despite the ubiquity of clustering as a tool in unsupervised learning, there is not yet a consensus on a formal theory, and the vast majority of work in this direction has focused on unsupervised clustering. We study a recently proposed framework for supervised clustering where there is access to a teacher. We give an improved generic algorithm to cluster any concept class in that model. Our algorithm is query-efficient in the sense that it involves only a small amount of interaction with the teacher. We also present and study two natural generalizations of the model. The model assumes that the teacher response to the algorithm is perfect. We eliminate this limitation by proposing a noisy model and give an algorithm for clustering the class of intervals in this noisy model. We also propose a dynamic model where the teacher sees a random subset of the points. Finally, for datasets satisfying a spectrum of weak to strong properties, we give query bounds, and show that a class of clustering functions containing Single-Linkage will find the target clustering under the strongest property.
Reference: text
sentIndex sentText sentNum sentScore
1 The model assumes that the teacher response to the algorithm is perfect. [sent-9, score-0.506]
2 We eliminate this limitation by proposing a noisy model and give an algorithm for clustering the class of intervals in this noisy model. [sent-10, score-0.615]
3 We also propose a dynamic model where the teacher sees a random subset of the points. [sent-11, score-0.496]
4 Finally, for datasets satisfying a spectrum of weak to strong properties, we give query bounds, and show that a class of clustering functions containing Single-Linkage will find the target clustering under the strongest property. [sent-12, score-0.963]
5 In this case, an algorithm can interact with the teacher to aid in clustering the documents without asking too much of the teacher. [sent-19, score-0.871]
6 In fact, there might be no principled way to reach the target clustering which a teacher has in mind without actually interacting with him/her. [sent-26, score-0.944]
7 Or perhaps the teacher would like these articles to be clustered into {news articles} vs. [sent-31, score-0.548]
8 We assume that the given set S of m points belongs to k target clusters {c1, c2, . . . , ck}. [sent-37, score-0.426]
9 In a realistic scenario, the teacher can look at the clustering proposed and give some limited feedback. [sent-51, score-0.781]
10 Hence, the model in [BB08] considers the following feedback: If there is a cluster hi which contains points from two or more target clusters, then the teacher can ask the algorithm to split that cluster by issuing the request split(hi ). [sent-52, score-1.688]
11 Note that the teacher does not specify how the cluster hi should be split. [sent-53, score-0.903]
12 If there are clusters hi and hj such that hi ∪ hj is a subset of one of the target clusters, then the teacher can ask the algorithm to merge these two clusters by issuing the request merge(hi , hj ). [sent-54, score-2.714]
13 Notice that if we allow the algorithm to use a number of queries linear in m, then there is a trivial algorithm which starts with all the points in separate clusters and then merges clusters as requested by the teacher. [sent-56, score-0.606]
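To make the interaction concrete, here is a minimal Python sketch of that trivial linear-query baseline; the Teacher object and its merge_request method are hypothetical stand-ins for the feedback model, not an API from the paper.

```python
# Minimal sketch of the trivial baseline: start with every point in its own
# cluster and merge whenever the teacher asks. This uses a number of queries
# linear in m. The Teacher interface below is assumed for illustration only.

def trivial_clustering(points, teacher):
    clusters = [{p} for p in points]               # every point starts alone
    while True:
        request = teacher.merge_request(clusters)  # returns (i, j) or None
        if request is None:                        # teacher is satisfied
            return clusters
        i, j = request
        clusters[i] |= clusters[j]                 # merge the two pure clusters
        del clusters[j]
```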
14 One could also imagine applying this split-merge framework to cases where the optimal clustering does not necessarily belong to a natural concept class, but instead satisfies some natural separation conditions (ex. [sent-57, score-0.512]
15 1 Contributions In their paper, Balcan and Blum [BB08] gave efficient clustering algorithms for the class of intervals and the class of disjunctions over {0, 1}n. [sent-61, score-0.64]
16 We extend those results by constructing an algorithm for clustering the class of axis-parallel rectangles in d dimensions. [sent-62, score-0.478]
17 In the original model the teacher is only allowed to merge two clusters hi and hj if hi ∪ hj is a subset of one of the target clusters. [sent-69, score-2.127]
18 We consider a noise-tolerant version of this in which the teacher can ask the algorithm to merge hi and hj if both clusters have at least some fixed fraction of points belonging to the same target cluster. [sent-70, score-1.791]
19 This is a more natural model since it allows the teacher's requests to be imperfect. [sent-71, score-0.639]
20 In the original model we assume that the teacher has access to all the points. [sent-72, score-0.529]
21 In practice, we are interested in clustering a large domain of points and the teacher might only have access to a random subset of these points at every step. [sent-73, score-1.027]
22 For example, in the case of clustering news documents, our goal is to figure out the target clustering which reflects the teacher's preferences. [sent-74, score-1.279]
23 But the teacher sees a small fresh set of news articles every day. [sent-75, score-0.654]
24 We propose a model which takes into account the fact that at each step the split and merge requests might be on a different set of points. [sent-76, score-0.6]
25 In both the above models the straightforward algorithm for clustering the class of intervals fails. [sent-77, score-0.591]
26 We develop new algorithms for clustering intervals in both the models. [sent-78, score-0.496]
27 Along the way, we also show that a class of clustering functions containing Single-Linkage will find the target clustering under the strict threshold property (Theorem 6. [sent-80, score-0.93]
28 The goal of the algorithm is to figure out the correct clustering by interacting with the teacher as follows: 1. [sent-86, score-0.819]
29 The teacher can request split(hi ) if hi contains points from two or more target clusters. [sent-92, score-1.211]
30 The teacher can request merge(hi , hj ) if hi ∪ hj is a subset of one of the target clusters. [sent-93, score-1.497]
31 The assumption is that there is no noise in the teacher response. [sent-94, score-0.467]
32 The goal is to use as few queries to the teacher as possible. [sent-95, score-0.558]
33 1 A generic algorithm for learning any finite concept class We reduce the query complexity of the generic algorithm for learning any concept class [BB08] from O(k^3 log |C|) to O(k log |C|). [sent-98, score-0.607]
34 Given a set h ⊆ S of points we say that a given clustering R is consistent with h if h appears as a subset of one of the clusters in R. [sent-103, score-0.575]
35 If the teacher says split(hi), remove all the clusterings in VS which are consistent with hi. If the teacher says merge(hi, hj), remove all the clusterings in VS which are inconsistent with hi ∪ hj. [sent-121, score-2.213]
36 At each request, if the teacher says split(hi), then all the clusterings consistent with hi are removed, which by the construction followed by the algorithm will be at least half of |VS|. [sent-126, score-0.994]
37 If the teacher says merge(hi, hj), i < j, then all the clusterings inconsistent with hi ∪ hj are removed. [sent-127, score-1.29]
38 This set will be at least half of |VS|, since otherwise the number of clusterings consistent with hi ∪ hj would be more than half of |VS|, which contradicts the maximality of hi. [sent-128, score-0.972]
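A minimal sketch in the spirit of this halving argument follows, assuming the version space is small enough to enumerate explicitly as a list of candidate clusterings (each a list of point sets); the teacher.feedback call and all identifiers are hypothetical.

```python
def consistent(clustering, h):
    """True if the set h is contained in a single cluster of `clustering`."""
    return any(h <= cluster for cluster in clustering)

def build_hypothesis(points, version_space):
    """Greedily build clusters so each one is consistent with at least half
    of the current version space and is maximal among the remaining points."""
    remaining = set(points)
    hypothesis = []
    while remaining:
        h = set()
        for x in sorted(remaining):
            candidate = h | {x}
            agree = sum(consistent(c, candidate) for c in version_space)
            if 2 * agree >= len(version_space):
                h = candidate
        hypothesis.append(h)
        remaining -= h
    return hypothesis

def generic_cluster(points, version_space, teacher):
    """Each teacher request discards at least half of the version space,
    which is what yields the O(k log |C|) query bound."""
    while True:
        hyp = build_hypothesis(points, version_space)
        request = teacher.feedback(hyp)      # None, ('split', i) or ('merge', i, j)
        if request is None:
            return hyp
        if request[0] == 'split':
            hi = hyp[request[1]]
            version_space = [c for c in version_space if not consistent(c, hi)]
        else:
            merged = hyp[request[1]] | hyp[request[2]]
            version_space = [c for c in version_space if consistent(c, merged)]
```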
39 3 Clustering geometric concepts We now present an algorithm for clustering the class of rectangles in 2 dimensions. [sent-136, score-0.504]
40 1 An algorithm for clustering rectangles Each rectangle ck in the target clustering can be described by four points (ai, aj), (bi, bj) such that (x, y) ∈ ck iff ai < x < aj and bi < y < bj. [sent-142, score-1.324]
41 On a merge request, simply merge the two clusters. [sent-161, score-0.584]
42 On a split of (ai, aj), (bi, bj), create a new point ar such that ai < ar < aj, and the projection of all the points onto (ai, aj) is divided in half by ar. [sent-163, score-0.717]
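A small sketch of how such a split request could update the algorithm's grid of candidate boundaries; the symmetric y-axis update mirrors the stated x-axis rule and is an assumption, as are all identifiers.

```python
from bisect import insort

def handle_rectangle_split(x_breaks, y_breaks, cluster_points):
    """On split(h), insert new breakpoints that halve the projection of h's
    points on each axis; the regions of the resulting grid then serve as the
    next round of candidate clusters. cluster_points is a list of (x, y)
    pairs belonging to the cluster the teacher asked to split."""
    xs = sorted(p[0] for p in cluster_points)
    ys = sorted(p[1] for p in cluster_points)
    insort(x_breaks, xs[len(xs) // 2])   # median x-coordinate halves the projection
    insort(y_breaks, ys[len(ys) // 2])   # assumed symmetric rule on the y-axis
```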
43 If the teacher says split on (xi , xj ), (yi , yj ), then we know that either (xi , xj ) contains a target point a or (yi , yj ) contains a target point b or both. [sent-171, score-1.021]
44 There are at most 2k intervals on the x-axis and at most 2k intervals on the y-axis. [sent-173, score-0.412]
45 Between any two split requests, the total number of merge requests will be at most the total number of regions, which is O((k log m)^2). [sent-176, score-0.891]
46 Since t points on the x- and y-axes can create at most t^2 regions, we get that the total number of merge requests is at most O((k log m)^3). [sent-177, score-0.676]
47 In the original model we assume that the teacher has access to the entire set of points. [sent-192, score-0.529]
48 For example, in the case of clustering news articles, each day the teacher sees a small fresh set of articles and provides feedback. [sent-194, score-0.944]
49 Based on this the algorithm must be able to figure out the target clustering for the entire space of articles. [sent-195, score-0.493]
50 There is a target k-clustering for these points, where each cluster corresponds to a concept in a concept class C. [sent-197, score-0.807]
51 At each step, the world picks m points and the algorithm clusters these m points and presents the clustering to the teacher. [sent-198, score-0.695]
52 If the teacher is unhappy with the clustering he may provide feedback. [sent-199, score-0.757]
53 Note that the teacher need not provide feedback every time the algorithm proposes an incorrect clustering. [sent-200, score-0.604]
54 1 An algorithm for clustering intervals We assume that the space X is discretized into n points. [sent-206, score-0.535]
55 There exist points {a1, a2, . . . , ak+1} on the x-axis such that the target clustering is the intervals {[a1, a2], [a2, a3], . . . , [ak, ak+1]}. [sent-210, score-0.66]
56 An interval is marked if we know that none of the points (ai's) in the target clustering can be present in that interval. [sent-217, score-0.632]
57 Start with one unmarked interval containing all the points in the space. [sent-219, score-0.439]
58 Next output the final clusters as follows: • set i = 1 • If hi and hi+1 correspond to adjacent intervals and at least one of them is unmarked, then output hi ∪ hi+1 and set i = i + 2. [sent-225, score-0.994]
59 On a split request, split every unmarked interval in the cluster in half. [sent-228, score-0.731]
60 On a merge request, mark every unmarked interval contained in the cluster. [sent-230, score-0.534]
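A compact sketch of these update rules, assuming the space is the discretized grid {0, ..., n-1} and intervals are stored as half-open index pairs; the class and method names are hypothetical.

```python
class IntervalClusterer:
    """Sketch of the dynamic-model interval algorithm: intervals are
    half-open pairs (lo, hi) over the discretized space {0, ..., n-1};
    marked intervals are those known to contain no target boundary ai."""

    def __init__(self, n):
        self.intervals = [(0, n)]   # one unmarked interval covering everything
        self.marked = set()

    def on_split(self, cluster_intervals):
        """Split every unmarked interval of the offending cluster in half."""
        new = []
        for lo, hi in self.intervals:
            unmarked = (lo, hi) not in self.marked
            if (lo, hi) in cluster_intervals and unmarked and hi - lo > 1:
                mid = (lo + hi) // 2
                new.extend([(lo, mid), (mid, hi)])
            else:
                new.append((lo, hi))
        self.intervals = new

    def on_merge(self, cluster_intervals):
        """Mark every unmarked interval contained in the merged cluster."""
        self.marked.update(cluster_intervals)
```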
61 The algorithm can cluster the class of intervals using at most O(k log n) mistakes. [sent-233, score-0.471]
62 For every point ai in the target clustering we define two variables left_size(ai) and right_size(ai). [sent-237, score-0.641]
63 If ai is inside a hypothesis interval [x, y] then left_size(ai) = number of points in [x, ai] and right_size(ai) = number of points in [ai, y]. [sent-238, score-0.645]
64 If ai is also a boundary point in the hypothesis clustering ([x, ai], [ai, y]) then again left_size(ai) = number of points in [x, ai] and right_size(ai) = number of points in [ai, y]. [sent-239, score-0.903]
65 Since there are at most k boundary points in the target clustering, the total number of split requests is at most O(k log n). [sent-241, score-0.66]
66 Also note that the number of unmarked intervals is at most O(k log n), since unmarked intervals increase only via split requests. [sent-242, score-1.029]
67 On every merge request either an unmarked interval is marked or two marked intervals are merged. [sent-243, score-1.097]
68 Hence, the total number of merge requests is at most twice the number of unmarked intervals, which is at most O(k log n). [sent-244, score-0.97]
69 5 η noise model The previous two models assume that there is no noise in the teacher requests. [sent-248, score-0.467]
70 This is again an unrealistic assumption since we cannot expect the teacher responses to be perfect. [sent-249, score-0.467]
71 For example, if the algorithm proposes a clustering in which there are two clusters which are almost pure, i.e. [sent-250, score-0.517]
72 a large fraction of the points in both the clusters belong to the same target cluster, then there is a good chance that the teacher will ask the algorithm to merge these two clusters, especially if the teacher has access to the clusters through a random subset of the points. [sent-252, score-1.97]
73 The algorithm presents a clustering {h1, . . . , hJ} to the teacher and the teacher provides feedback. [sent-259, score-0.934]
74 Split: As before the teacher can say split(hi) if hi contains points from more than one target cluster. [sent-261, score-1.05]
75 Merge: The teacher can say merge(hi , hj ), if hi and hj each have at least one point from some target cluster. [sent-263, score-1.336]
76 Any clustering algorithm must use Ω(m) queries in the worst case to figure out the target clustering in the noisy model. [sent-269, score-0.874]
77 If the teacher says merge(hi, hj) then we assume that at least a constant η fraction of the points in both the clusters belong to a single target cluster. [sent-271, score-1.046]
78 1 An algorithm for clustering intervals The algorithm is a generalization of the interval learning algorithm in the original model. [sent-274, score-0.735]
79 The main idea is that when the teacher asks to merge two intervals (ai, aj) and (aj, ak), then we know that at least an η fraction of the portion to the left and to the right of aj is pure. [sent-275, score-1.193]
80 As the algorithm proceeds it is going to mark certain intervals as “pure” which means that all the points in that interval belong to the same cluster. [sent-277, score-0.473]
81 At each step, cluster the points using the current set of intervals and present that clustering to the teacher. [sent-281, score-0.721]
82 On a merge request • If both the intervals are marked “pure”, merge them. [sent-285, score-1.001]
83 • If both the intervals are unmarked, then create 3 intervals where the middle interval contains an η fraction of each of the two intervals. [sent-286, score-0.562]
84 • If one interval is marked and one is unmarked, then shift the boundary between the two intervals towards the unmarked interval by a fraction of η. [sent-288, score-0.694]
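The merge-request handling above can be sketched as follows; the exact arithmetic for "an η fraction" is one plausible reading and is an assumption, as are all identifiers.

```python
def handle_noisy_merge(left, right, pure, eta):
    """Return the intervals that replace the adjacent pair (left, right)
    after a merge request in the η-noise model. `left` and `right` are
    (lo, hi) index pairs with left[1] == right[0]; `pure` is the set of
    intervals currently marked pure."""
    l_len = left[1] - left[0]
    r_len = right[1] - right[0]
    if left in pure and right in pure:
        return [(left[0], right[1])]                     # both pure: merge outright
    if left not in pure and right not in pure:
        # carve out a middle interval taking an eta fraction of each side
        mid_lo = left[1] - max(1, int(eta * l_len))
        mid_hi = right[0] + max(1, int(eta * r_len))
        return [(left[0], mid_lo), (mid_lo, mid_hi), (mid_hi, right[1])]
    # exactly one side is pure: shift the shared boundary into the unmarked side
    if left in pure:
        boundary = right[0] + max(1, int(eta * r_len))
    else:
        boundary = left[1] - max(1, int(eta * l_len))
    return [(left[0], boundary), (boundary, right[1])]
```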
85 The algorithm clusters the class of intervals using at most O(k(log_{1/(1−η)} m)^2) requests. [sent-291, score-0.459]
86 Notice that every split and impure merge request makes progress, i. [sent-297, score-0.681]
87 Hence, the total number of split + impure merge requests ≤ k log_{1/(1−η)} m. [sent-300, score-0.75]
88 We also know that the total number of unmarked intervals ≤ k log_{1/(1−η)} m, since only split requests increase the unmarked intervals. [sent-301, score-1.062]
89 Also, total number of marked intervals ≤ total number of unmarked intervals, since every marked interval can be charged to a split request. [sent-302, score-0.85]
90 To bound the total number of pure merges, notice that every time a pure merge is made, the size of some interval decreases by at least an η fraction. [sent-304, score-0.619]
91 6 Properties of the Data We now adapt the query framework of [BB08] to cluster datasets which satisfy certain natural separation conditions with respect to the target partitioning. [sent-308, score-0.466]
92 Note that since Single-linkage is Consistent, k-Rich, and Order-Consistent [ZBD09], it immediately follows that SL(d, k) = Γ - in other words, SL is guaranteed to find the target k-partitioning, but we still have to interact with the teacher to find out k. [sent-352, score-0.665]
93 Given a dataset satisfying Strict Threshold Separation, there exists an algorithm which can find the target partitioning for any hypothesis class in O(log(n)) queries. Proof. [sent-357, score-0.491]
94 We start with some candidate k, and if the teacher tells us to split anything, we know the number of clusters must be larger, and if we are told to merge, we know the number of clusters must be smaller. [sent-362, score-0.983]
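A sketch of this binary search over k, using SciPy's Single-Linkage implementation to produce the candidate k-partitionings (the text asserts SL recovers the target partitioning for the correct k under this property); the teacher.feedback call returning 'split', 'merge', or None is a hypothetical interface.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def find_k_by_binary_search(X, teacher):
    """Binary-search over the number of clusters k in O(log n) queries,
    relying on Single-Linkage to produce the candidate k-partitioning."""
    Z = linkage(np.asarray(X, dtype=float), method='single')  # SL hierarchy
    lo, hi = 1, len(X)
    while lo <= hi:
        k = (lo + hi) // 2
        labels = fcluster(Z, k, criterion='maxclust')
        answer = teacher.feedback(labels)    # hypothetical feedback call
        if answer is None:
            return labels                    # teacher accepts: target found
        if answer == 'split':
            lo = k + 1                       # some cluster is impure: need more clusters
        else:                                # 'merge': two clusters fit in one target cluster
            hi = k - 1
    raise ValueError("no consistent k found")
```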
95 Given a dataset satisfying Strict Separation, there exists an algorithm which can find the target partitioning for any hypothesis class in O(k) queries. Proof. [sent-375, score-0.491]
96 Our algorithm will start by presenting the teacher with all points in a single cluster. [sent-379, score-0.61]
97 There can be no merge requests since we always split perfectly. [sent-381, score-0.6]
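One way to realize the "always split perfectly" step is to cut the offending cluster at the top of its Single-Linkage tree; that this cut never crosses a target cluster under Strict Separation is the assumption this sketch relies on, and the teacher interface is again hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def split_cluster(points):
    """Cut a cluster into its two top-level Single-Linkage children."""
    pts = np.asarray(points, dtype=float)
    labels = fcluster(linkage(pts, method='single'), 2, criterion='maxclust')
    return pts[labels == 1], pts[labels == 2]

def cluster_with_splits(X, teacher):
    """Start with everything in one cluster and obey split requests only;
    at most k - 1 splits are needed, i.e. O(k) queries."""
    clusters = [np.asarray(X, dtype=float)]
    while True:
        idx = teacher.split_request(clusters)   # hypothetical: cluster index or None
        if idx is None:
            return clusters
        a, b = split_cluster(clusters.pop(idx))
        clusters.extend([a, b])
```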
98 Given a dataset satisfying γ-margin Separation, there exists an algorithm which can find the target partitioning for any hypothesis class in O((√d/γ)^d − k) queries. Proof. [sent-389, score-0.491]
99 Since all hypercubes are pure, we can only get merge requests, of which there can be at most O((√d/γ)^d − k) until the target partitioning is found. [sent-397, score-0.584]
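A sketch of the hypercube bucketing this argument uses; identifiers are hypothetical, and the clustering produced here is only the starting point, after which merge requests are obeyed directly.

```python
import numpy as np
from collections import defaultdict

def initial_hypercube_clusters(X, gamma):
    """Bucket points into axis-aligned hypercubes of side gamma / sqrt(d).
    Each cube has diameter at most gamma, so under γ-margin separation it is
    pure and only merge requests can follow."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    side = gamma / np.sqrt(d)
    buckets = defaultdict(list)
    for i, x in enumerate(X):
        buckets[tuple(np.floor(x / side).astype(int))].append(i)
    return list(buckets.values())              # one cluster per occupied hypercube
```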
100 For datasets satisfying a spectrum of weak to strong properties, we gave query bounds, and showed that a class of clustering functions containing Single-Linkage will find the target clustering under the strongest property. [sent-400, score-0.971]
wordName wordTfidf (topN-words)
[('teacher', 0.467), ('hi', 0.315), ('merge', 0.292), ('clustering', 0.29), ('unmarked', 0.216), ('intervals', 0.206), ('hj', 0.195), ('requests', 0.172), ('target', 0.164), ('request', 0.161), ('clusters', 0.158), ('split', 0.136), ('cluster', 0.121), ('separation', 0.106), ('points', 0.104), ('ai', 0.103), ('interval', 0.096), ('rectangles', 0.093), ('queries', 0.091), ('concept', 0.088), ('strict', 0.077), ('query', 0.075), ('balcan', 0.075), ('aj', 0.068), ('news', 0.068), ('sl', 0.067), ('hypercubes', 0.066), ('impure', 0.066), ('pure', 0.065), ('partitioning', 0.062), ('clusterings', 0.06), ('hypercube', 0.059), ('says', 0.058), ('lef', 0.058), ('merges', 0.056), ('class', 0.056), ('articles', 0.055), ('blum', 0.055), ('poly', 0.053), ('marked', 0.05), ('log', 0.049), ('eparation', 0.049), ('reza', 0.043), ('bosagh', 0.043), ('zadeh', 0.043), ('feedback', 0.042), ('documents', 0.041), ('satisfying', 0.041), ('notice', 0.04), ('algorithm', 0.039), ('bj', 0.039), ('inside', 0.039), ('distances', 0.039), ('hypothesis', 0.038), ('access', 0.036), ('fresh', 0.035), ('total', 0.035), ('generic', 0.034), ('interact', 0.034), ('aend', 0.033), ('astart', 0.033), ('issuing', 0.033), ('trict', 0.033), ('zbd', 0.033), ('know', 0.032), ('theorem', 0.032), ('gave', 0.032), ('half', 0.032), ('fraction', 0.03), ('outer', 0.03), ('proposes', 0.03), ('ak', 0.03), ('threshold', 0.03), ('sees', 0.029), ('entertainment', 0.029), ('ackerman', 0.029), ('politics', 0.029), ('hyperplanes', 0.028), ('br', 0.028), ('belong', 0.028), ('bi', 0.027), ('edges', 0.027), ('ask', 0.027), ('hence', 0.027), ('concepts', 0.026), ('original', 0.026), ('clustered', 0.026), ('lets', 0.026), ('eq', 0.026), ('every', 0.026), ('distance', 0.025), ('ar', 0.025), ('diameter', 0.025), ('create', 0.024), ('give', 0.024), ('sketched', 0.023), ('avrim', 0.023), ('interacting', 0.023), ('containing', 0.023), ('consistent', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 261 nips-2010-Supervised Clustering
Author: Pranjal Awasthi, Reza B. Zadeh
Abstract: Despite the ubiquity of clustering as a tool in unsupervised learning, there is not yet a consensus on a formal theory, and the vast majority of work in this direction has focused on unsupervised clustering. We study a recently proposed framework for supervised clustering where there is access to a teacher. We give an improved generic algorithm to cluster any concept class in that model. Our algorithm is query-efficient in the sense that it involves only a small amount of interaction with the teacher. We also present and study two natural generalizations of the model. The model assumes that the teacher response to the algorithm is perfect. We eliminate this limitation by proposing a noisy model and give an algorithm for clustering the class of intervals in this noisy model. We also propose a dynamic model where the teacher sees a random subset of the points. Finally, for datasets satisfying a spectrum of weak to strong properties, we give query bounds, and show that a class of clustering functions containing Single-Linkage will find the target clustering under the strongest property.
2 0.23737752 273 nips-2010-Towards Property-Based Classification of Clustering Paradigms
Author: Margareta Ackerman, Shai Ben-David, David Loker
Abstract: Clustering is a basic data mining task with a wide variety of applications. Not surprisingly, there exist many clustering algorithms. However, clustering is an ill defined problem - given a data set, it is not clear what a “correct” clustering for that set is. Indeed, different algorithms may yield dramatically different outputs for the same input sets. Faced with a concrete clustering task, a user needs to choose an appropriate clustering algorithm. Currently, such decisions are often made in a very ad hoc, if not completely random, manner. Given the crucial effect of the choice of a clustering algorithm on the resulting clustering, this state of affairs is truly regrettable. In this paper we address the major research challenge of developing tools for helping users make more informed decisions when they come to pick a clustering tool for their data. This is, of course, a very ambitious endeavor, and in this paper, we make some first steps towards this goal. We propose to address this problem by distilling abstract properties of the input-output behavior of different clustering paradigms. In this paper, we demonstrate how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms. On top of addressing deterministic clustering algorithms, we also propose similar properties for randomized algorithms and use them to highlight functional differences between different common implementations of k-means clustering. We also study relationships between the properties, independent of any particular algorithm. In particular, we strengthen Kleinberg’s famous impossibility result, while providing a simpler proof. 1
3 0.15829365 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
Author: Hairong Liu, Longin J. Latecki, Shuicheng Yan
Abstract: In this paper, we regard clustering as ensembles of k-ary affinity relations and clusters correspond to subsets of objects with maximal average affinity relations. The average affinity relation of a cluster is relaxed and well approximated by a constrained homogenous function. We present an efficient procedure to solve this optimization problem, and show that the underlying clusters can be robustly revealed by using priors systematically constructed from the data. Our method can automatically select some points to form clusters, leaving other points un-grouped; thus it is inherently robust to large numbers of outliers, which has seriously limited the applicability of classical methods. Our method also provides a unified solution to clustering from k-ary affinity relations with k ≥ 2, that is, it applies to both graph-based and hypergraph-based clustering problems. Both theoretical analysis and experimental results show the superiority of our method over classical solutions to the clustering problem, especially when there exists a large number of outliers.
4 0.12000661 223 nips-2010-Rates of convergence for the cluster tree
Author: Kamalika Chaudhuri, Sanjoy Dasgupta
Abstract: For a density f on Rd , a high-density cluster is any connected component of {x : f (x) ≥ λ}, for some λ > 0. The set of all high-density clusters form a hierarchy called the cluster tree of f . We present a procedure for estimating the cluster tree given samples from f . We give finite-sample convergence rates for our algorithm, as well as lower bounds on the sample complexity of this estimation problem. 1
5 0.11735377 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
Author: Daniel Golovin, Andreas Krause, Debajyoti Ray
Abstract: We tackle the fundamental problem of Bayesian active learning with noise, where we need to adaptively select from a number of expensive tests in order to identify an unknown hypothesis sampled from a known prior distribution. In the case of noise–free observations, a greedy algorithm called generalized binary search (GBS) is known to perform near–optimally. We show that if the observations are noisy, perhaps surprisingly, GBS can perform very poorly. We develop EC2 , a novel, greedy active learning algorithm and prove that it is competitive with the optimal policy, thus obtaining the first competitiveness guarantees for Bayesian active learning with noisy observations. Our bounds rely on a recently discovered diminishing returns property called adaptive submodularity, generalizing the classical notion of submodular set functions to adaptive policies. Our results hold even if the tests have non–uniform cost and their noise is correlated. We also propose E FF ECXTIVE , a particularly fast approximation of EC 2 , and evaluate it on a Bayesian experimental design problem involving human subjects, intended to tease apart competing economic theories of how people make decisions under uncertainty. 1
6 0.10535743 67 nips-2010-Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
7 0.090289958 166 nips-2010-Minimum Average Cost Clustering
8 0.088453121 235 nips-2010-Self-Paced Learning for Latent Variable Models
9 0.087209448 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case
10 0.08035551 221 nips-2010-Random Projections for $k$-means Clustering
11 0.07604599 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
12 0.075127542 155 nips-2010-Learning the context of a category
13 0.074323803 130 nips-2010-Interval Estimation for Reinforcement-Learning Algorithms in Continuous-State Domains
14 0.07384979 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
15 0.072218873 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
16 0.071676679 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
17 0.071466558 274 nips-2010-Trading off Mistakes and Don't-Know Predictions
18 0.062824719 191 nips-2010-On the Theory of Learnining with Privileged Information
19 0.062604696 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities
20 0.06081643 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
topicId topicWeight
[(0, 0.149), (1, 0.029), (2, 0.078), (3, 0.012), (4, -0.061), (5, 0.001), (6, -0.046), (7, -0.065), (8, 0.176), (9, -0.009), (10, 0.132), (11, -0.057), (12, 0.016), (13, -0.223), (14, 0.254), (15, -0.079), (16, 0.034), (17, 0.078), (18, -0.072), (19, -0.044), (20, 0.109), (21, 0.1), (22, 0.016), (23, 0.066), (24, -0.126), (25, -0.02), (26, -0.055), (27, 0.021), (28, 0.104), (29, -0.002), (30, 0.091), (31, 0.021), (32, 0.068), (33, 0.035), (34, -0.014), (35, 0.075), (36, 0.028), (37, 0.068), (38, -0.062), (39, -0.068), (40, -0.047), (41, 0.07), (42, -0.054), (43, 0.06), (44, -0.032), (45, 0.06), (46, -0.072), (47, -0.013), (48, -0.005), (49, -0.057)]
simIndex simValue paperId paperTitle
same-paper 1 0.97210586 261 nips-2010-Supervised Clustering
Author: Pranjal Awasthi, Reza B. Zadeh
Abstract: Despite the ubiquity of clustering as a tool in unsupervised learning, there is not yet a consensus on a formal theory, and the vast majority of work in this direction has focused on unsupervised clustering. We study a recently proposed framework for supervised clustering where there is access to a teacher. We give an improved generic algorithm to cluster any concept class in that model. Our algorithm is query-efficient in the sense that it involves only a small amount of interaction with the teacher. We also present and study two natural generalizations of the model. The model assumes that the teacher response to the algorithm is perfect. We eliminate this limitation by proposing a noisy model and give an algorithm for clustering the class of intervals in this noisy model. We also propose a dynamic model where the teacher sees a random subset of the points. Finally, for datasets satisfying a spectrum of weak to strong properties, we give query bounds, and show that a class of clustering functions containing Single-Linkage will find the target clustering under the strongest property.
2 0.8187893 273 nips-2010-Towards Property-Based Classification of Clustering Paradigms
Author: Margareta Ackerman, Shai Ben-David, David Loker
Abstract: Clustering is a basic data mining task with a wide variety of applications. Not surprisingly, there exist many clustering algorithms. However, clustering is an ill defined problem - given a data set, it is not clear what a “correct” clustering for that set is. Indeed, different algorithms may yield dramatically different outputs for the same input sets. Faced with a concrete clustering task, a user needs to choose an appropriate clustering algorithm. Currently, such decisions are often made in a very ad hoc, if not completely random, manner. Given the crucial effect of the choice of a clustering algorithm on the resulting clustering, this state of affairs is truly regrettable. In this paper we address the major research challenge of developing tools for helping users make more informed decisions when they come to pick a clustering tool for their data. This is, of course, a very ambitious endeavor, and in this paper, we make some first steps towards this goal. We propose to address this problem by distilling abstract properties of the input-output behavior of different clustering paradigms. In this paper, we demonstrate how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms. On top of addressing deterministic clustering algorithms, we also propose similar properties for randomized algorithms and use them to highlight functional differences between different common implementations of k-means clustering. We also study relationships between the properties, independent of any particular algorithm. In particular, we strengthen Kleinberg’s famous impossibility result, while providing a simpler proof. 1
3 0.74019754 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
Author: Hairong Liu, Longin J. Latecki, Shuicheng Yan
Abstract: In this paper, we regard clustering as ensembles of k-ary affinity relations and clusters correspond to subsets of objects with maximal average affinity relations. The average affinity relation of a cluster is relaxed and well approximated by a constrained homogenous function. We present an efficient procedure to solve this optimization problem, and show that the underlying clusters can be robustly revealed by using priors systematically constructed from the data. Our method can automatically select some points to form clusters, leaving other points un-grouped; thus it is inherently robust to large numbers of outliers, which has seriously limited the applicability of classical methods. Our method also provides a unified solution to clustering from k-ary affinity relations with k ≥ 2, that is, it applies to both graph-based and hypergraph-based clustering problems. Both theoretical analysis and experimental results show the superiority of our method over classical solutions to the clustering problem, especially when there exists a large number of outliers.
4 0.58442652 223 nips-2010-Rates of convergence for the cluster tree
Author: Kamalika Chaudhuri, Sanjoy Dasgupta
Abstract: For a density f on Rd , a high-density cluster is any connected component of {x : f (x) ≥ λ}, for some λ > 0. The set of all high-density clusters form a hierarchy called the cluster tree of f . We present a procedure for estimating the cluster tree given samples from f . We give finite-sample convergence rates for our algorithm, as well as lower bounds on the sample complexity of this estimation problem. 1
5 0.568672 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
Author: Andreas Krause, Pietro Perona, Ryan G. Gomes
Abstract: Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled data set? We present a framework that simultaneously clusters the data and trains a discriminative classifier. We call it Regularized Information Maximization (RIM). RIM optimizes an intuitive information-theoretic objective function which balances class separation, class balance and classifier complexity. The approach can flexibly incorporate different likelihood functions, express prior assumptions about the relative size of different classes and incorporate partial labels for semi-supervised learning. In particular, we instantiate the framework to unsupervised, multi-class kernelized logistic regression. Our empirical evaluation indicates that RIM outperforms existing methods on several real data sets, and demonstrates that RIM is an effective model selection method. 1
6 0.5284121 67 nips-2010-Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
7 0.47238639 221 nips-2010-Random Projections for $k$-means Clustering
8 0.44074884 155 nips-2010-Learning the context of a category
9 0.43764773 166 nips-2010-Minimum Average Cost Clustering
10 0.43334961 274 nips-2010-Trading off Mistakes and Don't-Know Predictions
12 0.37110573 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
13 0.36691219 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
14 0.36244008 2 nips-2010-A Bayesian Approach to Concept Drift
15 0.34952548 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification
16 0.34807378 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
17 0.33449951 116 nips-2010-Identifying Patients at Risk of Major Adverse Cardiovascular Events Using Symbolic Mismatch
18 0.31912568 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
19 0.31803048 47 nips-2010-Co-regularization Based Semi-supervised Domain Adaptation
20 0.31613833 198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem
topicId topicWeight
[(13, 0.326), (27, 0.059), (30, 0.072), (45, 0.202), (50, 0.021), (52, 0.019), (60, 0.07), (77, 0.038), (78, 0.047), (90, 0.025), (95, 0.015)]
simIndex simValue paperId paperTitle
1 0.97011536 259 nips-2010-Subgraph Detection Using Eigenvector L1 Norms
Author: Benjamin Miller, Nadya Bliss, Patrick J. Wolfe
Abstract: When working with network datasets, the theoretical framework of detection theory for Euclidean vector spaces no longer applies. Nevertheless, it is desirable to determine the detectability of small, anomalous graphs embedded into background networks with known statistical properties. Casting the problem of subgraph detection in a signal processing context, this article provides a framework and empirical results that elucidate a “detection theory” for graph-valued data. Its focus is the detection of anomalies in unweighted, undirected graphs through L1 properties of the eigenvectors of the graph’s so-called modularity matrix. This metric is observed to have relatively low variance for certain categories of randomly-generated graphs, and to reveal the presence of an anomalous subgraph with reasonable reliability when the anomaly is not well-correlated with stronger portions of the background graph. An analysis of subgraphs in real network datasets confirms the efficacy of this approach. 1
2 0.95575714 192 nips-2010-Online Classification with Specificity Constraints
Author: Andrey Bernstein, Shie Mannor, Nahum Shimkin
Abstract: We consider the online binary classification problem, where we are given m classifiers. At each stage, the classifiers map the input to the probability that the input belongs to the positive class. An online classification meta-algorithm is an algorithm that combines the outputs of the classifiers in order to attain a certain goal, without having prior knowledge on the form and statistics of the input, and without prior knowledge on the performance of the given classifiers. In this paper, we use sensitivity and specificity as the performance metrics of the meta-algorithm. In particular, our goal is to design an algorithm that satisfies the following two properties (asymptotically): (i) its average false positive rate (fp-rate) is under some given threshold; and (ii) its average true positive rate (tp-rate) is not worse than the tp-rate of the best convex combination of the m given classifiers that satisfies fprate constraint, in hindsight. We show that this problem is in fact a special case of the regret minimization problem with constraints, and therefore the above goal is not attainable. Hence, we pose a relaxed goal and propose a corresponding practical online learning meta-algorithm that attains it. In the case of two classifiers, we show that this algorithm takes a very simple form. To our best knowledge, this is the first algorithm that addresses the problem of the average tp-rate maximization under average fp-rate constraints in the online setting. 1
3 0.93334275 45 nips-2010-CUR from a Sparse Optimization Viewpoint
Author: Jacob Bien, Ya Xu, Michael W. Mahoney
Abstract: The CUR decomposition provides an approximation of a matrix X that has low reconstruction error and that is sparse in the sense that the resulting approximation lies in the span of only a few columns of X. In this regard, it appears to be similar to many sparse PCA methods. However, CUR takes a randomized algorithmic approach, whereas most sparse PCA methods are framed as convex optimization problems. In this paper, we try to understand CUR from a sparse optimization viewpoint. We show that CUR is implicitly optimizing a sparse regression objective and, furthermore, cannot be directly cast as a sparse PCA method. We also observe that the sparsity attained by CUR possesses an interesting structure, which leads us to formulate a sparse PCA method that achieves a CUR-like sparsity.
same-paper 4 0.92136568 261 nips-2010-Supervised Clustering
Author: Pranjal Awasthi, Reza B. Zadeh
Abstract: Despite the ubiquity of clustering as a tool in unsupervised learning, there is not yet a consensus on a formal theory, and the vast majority of work in this direction has focused on unsupervised clustering. We study a recently proposed framework for supervised clustering where there is access to a teacher. We give an improved generic algorithm to cluster any concept class in that model. Our algorithm is query-efficient in the sense that it involves only a small amount of interaction with the teacher. We also present and study two natural generalizations of the model. The model assumes that the teacher response to the algorithm is perfect. We eliminate this limitation by proposing a noisy model and give an algorithm for clustering the class of intervals in this noisy model. We also propose a dynamic model where the teacher sees a random subset of the points. Finally, for datasets satisfying a spectrum of weak to strong properties, we give query bounds, and show that a class of clustering functions containing Single-Linkage will find the target clustering under the strongest property.
5 0.9192977 221 nips-2010-Random Projections for $k$-means Clustering
Author: Christos Boutsidis, Anastasios Zouzias, Petros Drineas
Abstract: This paper discusses the topic of dimensionality reduction for k-means clustering. We prove that any set of n points in d dimensions (rows in a matrix A ∈ Rn×d ) can be projected into t = Ω(k/ε2 ) dimensions, for any ε ∈ (0, 1/3), in O(nd⌈ε−2 k/ log(d)⌉) time, such that with constant probability the optimal k-partition of the point set is preserved within a factor of 2 + √ The projection is done ε. √ by post-multiplying A with a d × t random matrix R having entries +1/ t or −1/ t with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.
6 0.91224128 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
7 0.9090184 284 nips-2010-Variational bounds for mixed-data factor analysis
8 0.83106917 262 nips-2010-Switched Latent Force Models for Movement Segmentation
9 0.82222044 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
10 0.78204376 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
11 0.77680182 89 nips-2010-Factorized Latent Spaces with Structured Sparsity
12 0.76711434 226 nips-2010-Repeated Games against Budgeted Adversaries
13 0.76071453 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection
14 0.75969291 166 nips-2010-Minimum Average Cost Clustering
15 0.75788695 196 nips-2010-Online Markov Decision Processes under Bandit Feedback
16 0.75715786 222 nips-2010-Random Walk Approach to Regret Minimization
17 0.75529838 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
18 0.75350374 30 nips-2010-An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA
19 0.75026882 182 nips-2010-New Adaptive Algorithms for Online Classification
20 0.74963742 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models