nips nips2010 nips2010-273 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Margareta Ackerman, Shai Ben-David, David Loker
Abstract: Clustering is a basic data mining task with a wide variety of applications. Not surprisingly, there exist many clustering algorithms. However, clustering is an ill defined problem - given a data set, it is not clear what a “correct” clustering for that set is. Indeed, different algorithms may yield dramatically different outputs for the same input sets. Faced with a concrete clustering task, a user needs to choose an appropriate clustering algorithm. Currently, such decisions are often made in a very ad hoc, if not completely random, manner. Given the crucial effect of the choice of a clustering algorithm on the resulting clustering, this state of affairs is truly regrettable. In this paper we address the major research challenge of developing tools for helping users make more informed decisions when they come to pick a clustering tool for their data. This is, of course, a very ambitious endeavor, and in this paper, we make some first steps towards this goal. We propose to address this problem by distilling abstract properties of the input-output behavior of different clustering paradigms. In this paper, we demonstrate how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms. On top of addressing deterministic clustering algorithms, we also propose similar properties for randomized algorithms and use them to highlight functional differences between different common implementations of k-means clustering. We also study relationships between the properties, independent of any particular algorithm. In particular, we strengthen Kleinberg’s famous impossibility result, while providing a simpler proof. 1
Reference: text
sentIndex sentText sentNum sentScore
1 However, clustering is an ill defined problem - given a data set, it is not clear what a “correct” clustering for that set is. [sent-8, score-1.192]
2 Faced with a concrete clustering task, a user needs to choose an appropriate clustering algorithm. [sent-10, score-1.192]
3 Given the crucial effect of the choice of a clustering algorithm on the resulting clustering, this state of affairs is truly regrettable. [sent-12, score-0.596]
4 In this paper we address the major research challenge of developing tools for helping users make more informed decisions when they come to pick a clustering tool for their data. [sent-13, score-0.64]
5 We propose to address this problem by distilling abstract properties of the input-output behavior of different clustering paradigms. [sent-15, score-0.706]
6 In this paper, we demonstrate how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms. [sent-16, score-1.351]
7 On top of addressing deterministic clustering algorithms, we also propose similar properties for randomized algorithms and use them to highlight functional differences between different common implementations of k-means clustering. [sent-17, score-0.761]
8 In particular, we strengthen Kleinberg’s famous impossibility result, while providing a simpler proof. [sent-19, score-0.256]
9 1 Introduction In spite of the wide use of clustering in many practical applications, there currently exists no principled method to guide the selection of a clustering algorithm. [sent-20, score-1.247]
10 Of course, users are aware of the costs involved in employing different clustering algorithms (software purchasing costs, running times, memory requirements, needs for data preprocessing, etc.). [sent-21, score-0.659]
11 We focus on that aspect - the input-output properties of different clustering algorithms. [sent-23, score-0.684]
12 The choice of an appropriate clustering should, of course, be task dependent. [sent-24, score-0.596]
13 A clustering that works well for one task may be unsuitable for another. [sent-25, score-0.596]
14 While some domain knowledge is embedded in the choice of similarity between domain elements (or the embedding of these elements into some Euclidean space), there is still a large variance in the behavior of different clustering paradigms over a fixed similarity measure. [sent-27, score-0.874]
15 For some clustering tasks, there is a natural clustering objective function that one may wish to optimize (like k-means for vector quantization coding tasks), but very often the task does not readily translate into a corresponding objective function. [sent-28, score-1.302]
16 Many (if not most) common clustering paradigms do not optimize any clearly defined objective utility, either because no such objective is defined (like in the case of, say, single linkage clustering) or because optimizing the most relevant objective is computationally infeasible. [sent-30, score-0.854]
17 To overcome computational infeasibility, such algorithms end up carrying out a heuristic whose outcome may be quite different from the actual objective-based optimum (that is the case with the k-means algorithm as well as with spectral clustering algorithms). [sent-31, score-0.64]
18 What seems to be missing is a clear understanding of the differences in clustering outputs in terms of intuitive and usable properties. [sent-32, score-0.668]
19 Our vision is that, ultimately, there would be a sufficiently rich set of properties providing a detailed, property-based taxonomy of clustering methods that could, in turn, be used as guidelines for a wide variety of clustering applications. [sent-35, score-1.476]
20 This paper takes a step towards that goal by using natural properties to examine some popular clustering approaches. [sent-37, score-0.684]
21 We present a taxonomy for common deterministic clustering functions with respect to the properties that we propose. [sent-38, score-0.888]
22 We also show how to extend this framework to the randomized clustering algorithms, and use these properties to distinguish between two k-means heuristics. [sent-39, score-0.747]
23 In particular, we strengthen Kleinberg’s impossibility result [8] using a relaxation of one of the properties that he proposed. [sent-41, score-0.315]
24 1 Previous work Our work follows a theoretical study of clustering that began with Kleinberg’s impossibility result [8], in which he proposes three candidate axioms of clustering and shows that no clustering function can simultaneously satisfy these three axioms. [sent-43, score-2.276]
25 Ackerman and Ben-David [1] subsequently showed these axioms to be consistent in the setting of clustering quality measures. [sent-44, score-0.875]
26 There are previous results that provide property-based characterizations of specific clustering algorithms. [sent-47, score-0.642]
27 Last year, Bosagh Zadeh and Ben-David [3] characterized single-linkage within Kleinberg’s framework of clustering functions using a special invariance property (“path distance coherence”). [sent-49, score-0.871]
28 Very recently, Ackerman, Ben-David and Loker provided a characterization of the family of linkage-based clustering in terms of a few natural properties [2]. [sent-50, score-0.705]
29 Some heuristics have been proposed as a means of distinguishing between the output of clustering algorithms on specific data. [sent-51, score-0.766]
30 In particular, validity criteria can be used to evaluate the output of clustering algorithms. [sent-53, score-0.662]
31 These measures can be used to select a clustering algorithm by choosing the one that yields the highest quality clustering [10]. [sent-54, score-1.217]
32 For most of this paper, we focus on a basic subdomain where the (only) input to the clustering function is a finite set of points endowed with a between-points distance (or similarity) function, and the output is a partition of that domain. [sent-57, score-0.773]
33 A clustering of X is a k-clustering of X for some 1 ≤ k ≤ |X|. [sent-65, score-0.596]
34 For a clustering C, let |C| denote the number of clusters in C and |Ci| denote the number of points in a cluster Ci. [sent-66, score-0.788]
35 For x, y ∈ X and a clustering C of X, we write x ∼C y if x and y belong to the same cluster in C, and x ≁C y otherwise. [sent-67, score-0.679]
36 A general clustering function is a function that takes as input a pair (X, d) and outputs a clustering of the domain X. [sent-79, score-1.341]
37 The second type consists of clustering functions that require the number of clusters to be provided as part of the input. [sent-80, score-0.724]
38 1 Properties of Clustering Functions A key component in our approach is a set of properties of clustering functions that address the input-output behavior of these functions. [sent-84, score-0.75]
39 However, all the properties, with the exception of locality¹ and refinement-confined, also apply to general clustering functions. [sent-86, score-0.596]
40 Isomorphism invariance: The following invariance property, proposed in [2] under the name “representation independence”, seems to be an essential part of our understanding of what clustering is. [sent-87, score-0.753]
41 A k-clustering function F is isomorphism invariant if whenever (X, d) ∼ (X′, d′), then, for every k, F(X, d, k) and F(X′, d′, k) are isomorphic clusterings. [sent-89, score-0.308]
42 Scale invariance: Scale invariance, proposed by Kleinberg [8], requires that the output of a clustering function be invariant to uniform scaling of the data. [sent-90, score-0.71]
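As an illustration of how such input-output properties can be probed empirically, the sketch below tests scale invariance on small random data. It uses SciPy's single-linkage routine as a stand-in k-clustering function; this choice, and the helper names, are assumptions made for illustration only, since the paper does not tie the property to any particular implementation.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def single_linkage(d, k):
    # d: symmetric n x n distance matrix with zero diagonal; returns one cluster label per point.
    return fcluster(linkage(squareform(d, checks=False), method='single'), t=k, criterion='maxclust')

def same_partition(a, b):
    # Two label vectors describe the same clustering iff they co-cluster exactly the same pairs.
    a, b = np.asarray(a), np.asarray(b)
    return np.array_equal(a[:, None] == a[None, :], b[:, None] == b[None, :])

rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))
d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)

for scale in (0.5, 3.0, 100.0):
    assert same_partition(single_linkage(d, 3), single_linkage(scale * d, 3))

Any function returning one label per point can be substituted for single_linkage, and the same harness can be reused for isomorphism invariance or order invariance by permuting the points, or applying an order-preserving transformation to d, instead of scaling it.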
43 Order invariance: Order invariance, proposed by Jardine and Sibson [6], describes clustering functions that are based on the ordering of pairwise distances. [sent-92, score-0.64]
44 A k-clustering function F is order invariant if whenever a distance function d′ over X is an order invariant modification of d, F(X, d, k) = F(X, d′, k) for all k. [sent-94, score-0.287]
45 1 Locality can also be reformulated for general clustering functions; however, we do not discuss this in this work. [sent-95, score-0.616]
46 Locality: Intuitively, a k-clustering function is local if its behavior on a union of clusters depends only on distances between elements of that union, and is independent of the rest of the domain set. [sent-96, score-0.277]
47 A k-clustering function F is local if for any clustering C output by F and every subset of clusters C′ ⊆ C, F(∪C′, d, |C′|) = C′. [sent-98, score-0.701]
48 In other words, for every domain (X, d) and number of clusters k, if X′ is the union of k′ clusters in F(X, d, k) for some k′ ≤ k, then applying F to (X′, d) and asking for a k′-clustering will yield the same clusters that we started with. [sent-99, score-0.302]
49 Given a clustering C of some domain (X, d), we say that a distance function d′ over X is (C, d)-consistent if d′(x, y) ≤ d(x, y) whenever x ∼C y, and d′(x, y) ≥ d(x, y) whenever x ≁C y. [sent-102, score-0.863]
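The definition above translates directly into a small predicate; the sketch below is illustrative (names are not from the paper) and checks whether a proposed distance function d_prime is a (C, d)-consistent variant. Consistency of F then asks that F(X, d′, k) = F(X, d, k) for every such d′; restricting the test to between-cluster increases only, or to within-cluster decreases only, gives the outer- and inner-consistency variants discussed next.

def is_consistent_variant(d, d_prime, labels):
    # d, d_prime: n x n distance matrices; labels[i] gives the cluster of point i under C.
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                if d_prime[i][j] > d[i][j]:   # within-cluster distances may only shrink
                    return False
            elif d_prime[i][j] < d[i][j]:     # between-cluster distances may only grow
                return False
    return True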
50 While this property may sound desirable and natural, it turns out that many common clustering paradigms fail to satisfy it. [sent-104, score-0.866]
51 Inner and Outer consistency: Outer consistency represents the preference for well separated clusters, by requiring that the output of a k-clustering function not change if clusters are moved away from each other. [sent-107, score-0.34]
52 Inner consistency represents the preference for placing points that are close together within the same cluster, by requiring that the output of a k-clustering function not change if elements of the same cluster are moved closer to each other. [sent-110, score-0.364]
53 This property is based on Kleinberg’s [8] richness axiom, requiring that for any sets X1 , X2 , . [sent-115, score-0.517]
54 A k-clustering function F satisfies k-richness if for any sets X1, X2, . . . [sent-122, score-0.463]
55 Given k sets, a k-clustering function satisfies outer richness if there is some way of setting the between-set distances, without modifying distances within the sets, so that F outputs each of these data sets as a cluster. [sent-131, score-0.74]
56 A clustering function F is outer-rich if for every set of domains, {(X1 , d1 ), . [sent-133, score-0.658]
57 Threshold-richness: Fundamentally, the goal of clustering is to group points that are close to each other, and to separate points that are far apart. [sent-140, score-0.646]
58 Axioms of clustering need to represent these objectives and no set of axioms of clustering can be complete without integrating such requirements. [sent-141, score-1.418]
59 However, consistency has some counterintuitive implications (see Section 3 in [1]), and is not satisfied by many common clustering functions. [sent-146, score-0.733]
60 A k-clustering function F is threshold-rich if for every clustering C of X, there exist real numbers a < b so that for every distance function d over X where d(x, y) ≤ a for all x ∼C y, and d(x, y) ≥ b for all x ≁C y, we have that F(X, d, |C|) = C. [sent-147, score-0.775]
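The two thresholds in this definition are easy to state operationally; the sketch below is illustrative (not from the paper) and checks whether a given d satisfies the (a, b) condition for a clustering C, i.e. the condition under which threshold richness requires F(X, d, |C|) = C.

def satisfies_thresholds(d, labels, a, b):
    # d: n x n distance matrix; labels encodes C; assumes a < b.
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                if d[i][j] > a:      # within-cluster pairs must be at most a
                    return False
            elif d[i][j] < b:        # between-cluster pairs must be at least b
                return False
    return True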
61 Inner richness: Complementary to outer richness, inner richness requires that there be a way of setting distances within sets, without modifying distances between the sets, so that F outputs each set as a cluster. [sent-149, score-0.787]
62 A k-clustering function F satisfies inner richness if for every data set (X, d) and partition {X1, X2, . . . [sent-151, score-0.579]
63 A clustering C′ of X is a refinement of a clustering C of X if every cluster in C′ is a subset of some cluster in C, or, equivalently, if every cluster of C is a union of clusters of C′. [sent-159, score-1.636]
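Viewed through label vectors, the refinement relation is a containment of equivalence relations; the following small predicate (illustrative naming, not from the paper) captures it.

def is_refinement(labels_fine, labels_coarse):
    # labels_fine refines labels_coarse iff any two points co-clustered by the fine
    # clustering are also co-clustered by the coarse one.
    n = len(labels_fine)
    return all(labels_coarse[i] == labels_coarse[j]
               for i in range(n) for j in range(i + 1, n)
               if labels_fine[i] == labels_fine[j])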
64 The taxonomy in Figure 1 illustrates how clustering algorithms differ from one another. [sent-163, score-0.775]
65 Unlike all the other algorithms discussed, the spectral clustering functions are not local. [sent-166, score-0.684]
66 1 Axioms of clustering Our taxonomy reveals that some intuitive properties, which may be expected of all k-clustering functions, are not satisfied by some common k-clustering functions. [sent-170, score-0.783]
67 For example, locality is not satisfied by the spectral clustering functions ratio-cut and normalized-cut. [sent-171, score-0.763]
68 Also, most functions fail inner consistency, and therefore do not satisfy consistency, even though the latter was previously proposed as an axiom of k-clustering functions [8]. [sent-172, score-0.298]
69 On the other hand, isomorphism invariance, scale invariance, and all richness properties (in the setting where the number of clusters, k, is part of the input), are satisfied by all the clustering functions considered. [sent-173, score-1.219]
70 Threshold richness is the only one that is both satisfied by all k-clustering functions considered and reflects the main objective of clustering: to group points that are close together and to separate points that are far apart. [sent-175, score-0.544]
71 It is easy to see that threshold richness implies k-richness. [sent-176, score-0.457]
72 It can be shown that when threshold richness is combined with scale invariance, it also implies outer-richness and inner-richness. [sent-177, score-0.457]
73 Therefore, we propose that scale-invariance, isomorphism-invariance, and threshold richness can be used as clustering axioms. [sent-178, score-1.053]
74 Let P(F(X, d, k) = C) denote the probability of obtaining clustering C under the probability distribution F(X, d, k). [sent-182, score-0.596]
75 Such properties are translated into the probabilistic setting by requiring that when data sets (X, d) and (X′, d′) satisfy some similarity requirements, then F(X, d, k) = F(X′, d′, k) for all k. [sent-187, score-0.263]
76 Every such property has some notion of a “(C, d)-nice” variant that specifies how the underlying distance function can be modified to better flesh out clustering C. [sent-189, score-0.716]
77 Richness properties: Richness properties require that any desired clustering can be obtained under certain constraints. [sent-191, score-0.684]
78 We say that a clustering of X specifies how to cluster a subset X′ ⊆ X if every cluster that overlaps with X′ is contained within X′. [sent-198, score-0.828]
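This condition is also mechanical to check; the sketch below (illustrative naming, not from the paper) tests whether a clustering, given as a label vector, specifies how to cluster a subset X′ of the point indices.

def specifies_how_to_cluster(labels, subset):
    # subset: a set of point indices forming X'; every cluster touching X' must lie inside X'.
    touching = {labels[i] for i in subset}
    return all(i in subset for i, lab in enumerate(labels) if lab in touching)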
79 [Figure 2, flattened: rows are Optimal k-means, Random Centroids Lloyd, and Furthest Centroids Lloyd; columns include the properties outer consistent and local and the axioms isomorphism invariant, scale invariant, and threshold rich.] Figure 2: An analysis of the k-means clustering function and k-means heuristics. [sent-201, score-0.976]
80 The two leftmost properties distinguish the k-means clustering function: they are satisfied by k-means but fail for other reasonable k-clustering functions. [sent-202, score-0.858]
81 The next three are proposed axioms of clustering, and the last two properties follow from the axioms. [sent-203, score-0.314]
82 In the probabilistic setting, we require that the probability of obtaining a specific clustering of X′ ⊆ X is determined by the probability of obtaining that clustering as a subset of F(X, d, k), given that the output of F on (X, d) specifies how to cluster X′. [sent-204, score-1.361]
83 1 Properties Distinguishing K-means Heuristics k-means and k-means heuristics One of the most popular clustering algorithms is the Lloyd method, which aims to find clusterings with low k-means loss. [sent-218, score-0.788]
84 That is, find the clustering C of X so that x ∼C y if and only if argminc∈S ‖c − x‖ = argminc∈S ‖c − y‖. [sent-226, score-0.596]
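To make the two heuristics concrete, the following numpy sketch implements Lloyd's method under both initializations compared in Figure 2, Random Centroids and Furthest Centroids. The farthest-first rule and the empty-cluster re-seeding are standard choices assumed here for illustration; the paper's exact variants may differ.

import numpy as np

def nearest_center_partition(X, centers):
    # The clustering induced by a set of centers S: x ~_C y iff x and y share a nearest center.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def random_centroids(X, k, rng):
    # "Random Centroids": pick k distinct data points uniformly at random as initial centers.
    return X[rng.choice(len(X), size=k, replace=False)]

def furthest_centroids(X, k, rng):
    # "Furthest Centroids": after a random first center, repeatedly add the point
    # furthest from the centers chosen so far (farthest-first traversal).
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        gap = np.min(np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=-1), axis=1)
        centers.append(X[gap.argmax()])
    return np.array(centers)

def lloyd(X, k, init, rng, iters=100):
    centers = init(X, k, rng)
    for _ in range(iters):
        labels = nearest_center_partition(X, centers)
        # Recompute each center as its cluster mean; re-seed empty clusters at a random point.
        centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                            else X[rng.integers(len(X))] for c in range(k)])
    return nearest_center_partition(X, centers)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(30, 2)) for m in ([0, 0], [4, 0], [0, 4])])
print(lloyd(X, 3, random_centroids, rng))
print(lloyd(X, 3, furthest_centroids, rng))

The nearest_center_partition helper is exactly the center-induced clustering described above: two points share a cluster iff they share a nearest center in S.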
85 2 Distinguishing heuristics by properties An analysis of the k-means clustering function and the two k-means heuristics discussed above is shown in Figure 2. [sent-238, score-0.874]
86 There are two properties that are satisfied by the k-means clustering function and fail for other reasonable k-clustering functions: outer-consistency and locality. [sent-241, score-0.756]
87 Note that unlike k-clustering functions that optimize common clustering objective functions, heuristics that aim to find clusterings with low loss for these objective functions do not necessarily make meaningful k-clustering functions. [sent-243, score-0.917]
88 Therefore, such a heuristic’s failure to satisfy certain properties does not preclude these properties from being axioms of clustering, but rather illustrates a weakness of the heuristic. [sent-244, score-0.518]
89 It is interesting that the Lloyd method with the Furthest Centroids initialization technique satisfies our proposed axioms of clustering, while Lloyd with Random Centroids initialization fails threshold richness. [sent-245, score-0.891]
90 6 Impossibility Results In this final section, we strengthen Kleinberg’s famous impossibility result [8] for general clustering functions, yielding a simpler proof of the original result. [sent-249, score-0.852]
91 1, [8]) was that no general clustering function can simultaneously satisfy scale-invariance, richness, and consistency. [sent-251, score-0.696]
92 In Section 1, we showed that many natural clustering functions fail inner-consistency (see footnote 3), which implies that there are many general clustering functions that fail consistency. [sent-253, score-1.386]
93 We strengthen Kleinberg’s impossibility result by relaxing consistency to outer-consistency. [sent-255, score-0.336]
94 No general clustering function can simultaneously satisfy outer-consistency, scale-invariance, and richness. [sent-257, score-0.696]
95 Let F be any general clustering function that satisfies outer-consistency, scale-invariance and richness. [sent-259, score-0.615]
96 By richness, there exist distance functions d1 and d2 such that F(X, d1) = {{x} : x ∈ X} (every domain point is a cluster on its own) and F(X, d2) is some different clustering, C = {C1, . . . [sent-261, score-0.248]
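The extracted proof breaks off here. For completeness, the LaTeX fragment below sketches how the argument can be finished from these two distance functions; it is a reconstruction under the properties assumed in the theorem, not the paper's verbatim text.

% Reconstruction (not verbatim from the paper) of the remaining steps of the proof.
Write $\hat{C} = \{\{x\} : x \in X\}$, so that $F(X, d_1) = \hat{C}$ and $F(X, d_2) = C \neq \hat{C}$.
Pick $\alpha > 0$ large enough that $\alpha\, d_2(x, y) \ge d_1(x, y)$ for all $x, y \in X$.
Every pair of distinct points lies in different clusters of $\hat{C}$, so $\alpha d_2$ only
increases between-cluster distances relative to $d_1$; by outer-consistency,
$F(X, \alpha d_2) = F(X, d_1) = \hat{C}$. By scale-invariance, $F(X, \alpha d_2) = F(X, d_2) = C$.
Since $C \neq \hat{C}$, this is a contradiction. \qed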
97 No general clustering function can simultaneously satisfy inner-consistency, scale-invariance, and richness. [sent-273, score-0.696]
98 Kleinberg’s impossibility result illustrates property trade-offs for general clustering functions. [sent-275, score-0.832]
99 The good news is that these results do not apply when the number of clusters is part of the input, as is illustrated in our taxonomy; single linkage satisfies scale-invariance, consistency and richness. [sent-276, score-0.254]
100 3 Note that a clustering function and its corresponding general clustering function satisfy the same set of consistency properties. [sent-277, score-1.4]
wordName wordTfidf (topN-words)
[('clustering', 0.596), ('richness', 0.421), ('axioms', 0.226), ('lloyd', 0.21), ('kleinberg', 0.201), ('impossibility', 0.162), ('taxonomy', 0.132), ('invariance', 0.13), ('outer', 0.111), ('consistency', 0.109), ('ackerman', 0.106), ('locality', 0.098), ('dx', 0.09), ('furthest', 0.088), ('properties', 0.088), ('ci', 0.085), ('clusters', 0.084), ('cluster', 0.083), ('paradigms', 0.082), ('clusterings', 0.074), ('heuristics', 0.073), ('invariant', 0.071), ('isomorphism', 0.07), ('domain', 0.066), ('strengthen', 0.065), ('inner', 0.061), ('linkage', 0.061), ('satisfy', 0.061), ('distances', 0.061), ('sibson', 0.06), ('centroids', 0.055), ('distance', 0.055), ('xk', 0.053), ('fail', 0.053), ('jardine', 0.053), ('isomorphic', 0.053), ('nement', 0.053), ('whenever', 0.052), ('ck', 0.051), ('satis', 0.05), ('centers', 0.047), ('property', 0.046), ('outputs', 0.045), ('rich', 0.044), ('users', 0.044), ('functions', 0.044), ('probabilistic', 0.043), ('every', 0.043), ('output', 0.043), ('argminc', 0.04), ('katsavounidis', 0.04), ('kuo', 0.04), ('loker', 0.04), ('scaleinvariance', 0.04), ('threshold', 0.036), ('exists', 0.035), ('partition', 0.035), ('preference', 0.035), ('ambitious', 0.035), ('axiom', 0.035), ('bosagh', 0.035), ('zadeh', 0.035), ('distinguishing', 0.035), ('distinguish', 0.033), ('initialization', 0.033), ('translate', 0.033), ('randomized', 0.03), ('objective', 0.029), ('famous', 0.029), ('consistent', 0.028), ('common', 0.028), ('illustrates', 0.028), ('bijection', 0.027), ('weakness', 0.027), ('requiring', 0.027), ('name', 0.027), ('modifying', 0.027), ('intuitive', 0.027), ('re', 0.027), ('aims', 0.026), ('quality', 0.025), ('points', 0.025), ('union', 0.025), ('spectral', 0.025), ('sets', 0.023), ('moved', 0.023), ('shai', 0.023), ('say', 0.023), ('validity', 0.023), ('behavior', 0.022), ('characterization', 0.021), ('similarity', 0.021), ('formalize', 0.02), ('distinction', 0.02), ('reformulated', 0.02), ('simultaneously', 0.02), ('wide', 0.02), ('es', 0.02), ('algorithms', 0.019), ('function', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 273 nips-2010-Towards Property-Based Classification of Clustering Paradigms
Author: Margareta Ackerman, Shai Ben-David, David Loker
Abstract: Clustering is a basic data mining task with a wide variety of applications. Not surprisingly, there exist many clustering algorithms. However, clustering is an ill defined problem - given a data set, it is not clear what a “correct” clustering for that set is. Indeed, different algorithms may yield dramatically different outputs for the same input sets. Faced with a concrete clustering task, a user needs to choose an appropriate clustering algorithm. Currently, such decisions are often made in a very ad hoc, if not completely random, manner. Given the crucial effect of the choice of a clustering algorithm on the resulting clustering, this state of affairs is truly regrettable. In this paper we address the major research challenge of developing tools for helping users make more informed decisions when they come to pick a clustering tool for their data. This is, of course, a very ambitious endeavor, and in this paper, we make some first steps towards this goal. We propose to address this problem by distilling abstract properties of the input-output behavior of different clustering paradigms. In this paper, we demonstrate how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms. On top of addressing deterministic clustering algorithms, we also propose similar properties for randomized algorithms and use them to highlight functional differences between different common implementations of k-means clustering. We also study relationships between the properties, independent of any particular algorithm. In particular, we strengthen Kleinberg’s famous impossibility result, while providing a simpler proof. 1
2 0.2600269 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
Author: Hairong Liu, Longin J. Latecki, Shuicheng Yan
Abstract: In this paper, we regard clustering as ensembles of k-ary affinity relations and clusters correspond to subsets of objects with maximal average affinity relations. The average affinity relation of a cluster is relaxed and well approximated by a constrained homogenous function. We present an efficient procedure to solve this optimization problem, and show that the underlying clusters can be robustly revealed by using priors systematically constructed from the data. Our method can automatically select some points to form clusters, leaving other points un-grouped; thus it is inherently robust to large numbers of outliers, which has seriously limited the applicability of classical methods. Our method also provides a unified solution to clustering from k-ary affinity relations with k ≥ 2, that is, it applies to both graph-based and hypergraph-based clustering problems. Both theoretical analysis and experimental results show the superiority of our method over classical solutions to the clustering problem, especially when there exists a large number of outliers.
3 0.23737752 261 nips-2010-Supervised Clustering
Author: Pranjal Awasthi, Reza B. Zadeh
Abstract: Despite the ubiquity of clustering as a tool in unsupervised learning, there is not yet a consensus on a formal theory, and the vast majority of work in this direction has focused on unsupervised clustering. We study a recently proposed framework for supervised clustering where there is access to a teacher. We give an improved generic algorithm to cluster any concept class in that model. Our algorithm is query-efficient in the sense that it involves only a small amount of interaction with the teacher. We also present and study two natural generalizations of the model. The model assumes that the teacher response to the algorithm is perfect. We eliminate this limitation by proposing a noisy model and give an algorithm for clustering the class of intervals in this noisy model. We also propose a dynamic model where the teacher sees a random subset of the points. Finally, for datasets satisfying a spectrum of weak to strong properties, we give query bounds, and show that a class of clustering functions containing Single-Linkage will find the target clustering under the strongest property.
4 0.15458065 166 nips-2010-Minimum Average Cost Clustering
Author: Kiyohito Nagano, Yoshinobu Kawahara, Satoru Iwata
Abstract: A number of objective functions in clustering problems can be described with submodular functions. In this paper, we introduce the minimum average cost criterion, and show that the theory of intersecting submodular functions can be used for clustering with submodular objective functions. The proposed algorithm does not require the number of clusters in advance, and it will be determined by the property of a given set of data points. The minimum average cost clustering problem is parameterized with a real variable, and surprisingly, we show that all information about optimal clusterings for all parameters can be computed in polynomial time in total. Additionally, we evaluate the performance of the proposed algorithm through computational experiments. 1
5 0.13695034 223 nips-2010-Rates of convergence for the cluster tree
Author: Kamalika Chaudhuri, Sanjoy Dasgupta
Abstract: For a density f on Rd , a high-density cluster is any connected component of {x : f (x) ≥ λ}, for some λ > 0. The set of all high-density clusters form a hierarchy called the cluster tree of f . We present a procedure for estimating the cluster tree given samples from f . We give finite-sample convergence rates for our algorithm, as well as lower bounds on the sample complexity of this estimation problem. 1
6 0.13277674 221 nips-2010-Random Projections for $k$-means Clustering
7 0.11895198 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
8 0.11872406 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
10 0.089757331 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
11 0.066160716 67 nips-2010-Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
12 0.06136322 116 nips-2010-Identifying Patients at Risk of Major Adverse Cardiovascular Events Using Symbolic Mismatch
13 0.058332555 155 nips-2010-Learning the context of a category
14 0.054657601 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
15 0.052574039 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
16 0.048778508 251 nips-2010-Sphere Embedding: An Application to Part-of-Speech Induction
17 0.046579096 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
18 0.046389181 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification
19 0.046078727 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
20 0.044349492 29 nips-2010-An Approximate Inference Approach to Temporal Optimization in Optimal Control
topicId topicWeight
[(0, 0.143), (1, 0.043), (2, 0.087), (3, 0.025), (4, -0.031), (5, -0.062), (6, 0.001), (7, -0.041), (8, 0.203), (9, -0.005), (10, 0.13), (11, -0.083), (12, 0.106), (13, -0.222), (14, 0.319), (15, -0.118), (16, 0.116), (17, 0.132), (18, -0.01), (19, -0.065), (20, 0.153), (21, 0.01), (22, -0.002), (23, 0.093), (24, -0.118), (25, -0.02), (26, 0.014), (27, -0.025), (28, 0.065), (29, 0.001), (30, 0.05), (31, -0.024), (32, 0.014), (33, 0.025), (34, -0.068), (35, 0.037), (36, 0.018), (37, -0.017), (38, 0.024), (39, -0.097), (40, 0.046), (41, 0.029), (42, -0.04), (43, -0.037), (44, 0.016), (45, -0.05), (46, -0.13), (47, 0.056), (48, 0.058), (49, 0.064)]
simIndex simValue paperId paperTitle
same-paper 1 0.98824161 273 nips-2010-Towards Property-Based Classification of Clustering Paradigms
Author: Margareta Ackerman, Shai Ben-David, David Loker
Abstract: Clustering is a basic data mining task with a wide variety of applications. Not surprisingly, there exist many clustering algorithms. However, clustering is an ill defined problem - given a data set, it is not clear what a “correct” clustering for that set is. Indeed, different algorithms may yield dramatically different outputs for the same input sets. Faced with a concrete clustering task, a user needs to choose an appropriate clustering algorithm. Currently, such decisions are often made in a very ad hoc, if not completely random, manner. Given the crucial effect of the choice of a clustering algorithm on the resulting clustering, this state of affairs is truly regrettable. In this paper we address the major research challenge of developing tools for helping users make more informed decisions when they come to pick a clustering tool for their data. This is, of course, a very ambitious endeavor, and in this paper, we make some first steps towards this goal. We propose to address this problem by distilling abstract properties of the input-output behavior of different clustering paradigms. In this paper, we demonstrate how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms. On top of addressing deterministic clustering algorithms, we also propose similar properties for randomized algorithms and use them to highlight functional differences between different common implementations of k-means clustering. We also study relationships between the properties, independent of any particular algorithm. In particular, we strengthen Kleinberg’s famous impossibility result, while providing a simpler proof. 1
2 0.89942521 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
Author: Hairong Liu, Longin J. Latecki, Shuicheng Yan
Abstract: In this paper, we regard clustering as ensembles of k-ary affinity relations and clusters correspond to subsets of objects with maximal average affinity relations. The average affinity relation of a cluster is relaxed and well approximated by a constrained homogenous function. We present an efficient procedure to solve this optimization problem, and show that the underlying clusters can be robustly revealed by using priors systematically constructed from the data. Our method can automatically select some points to form clusters, leaving other points un-grouped; thus it is inherently robust to large numbers of outliers, which has seriously limited the applicability of classical methods. Our method also provides a unified solution to clustering from k-ary affinity relations with k ≥ 2, that is, it applies to both graph-based and hypergraph-based clustering problems. Both theoretical analysis and experimental results show the superiority of our method over classical solutions to the clustering problem, especially when there exists a large number of outliers.
3 0.8567307 261 nips-2010-Supervised Clustering
Author: Pranjal Awasthi, Reza B. Zadeh
Abstract: Despite the ubiquity of clustering as a tool in unsupervised learning, there is not yet a consensus on a formal theory, and the vast majority of work in this direction has focused on unsupervised clustering. We study a recently proposed framework for supervised clustering where there is access to a teacher. We give an improved generic algorithm to cluster any concept class in that model. Our algorithm is query-efficient in the sense that it involves only a small amount of interaction with the teacher. We also present and study two natural generalizations of the model. The model assumes that the teacher response to the algorithm is perfect. We eliminate this limitation by proposing a noisy model and give an algorithm for clustering the class of intervals in this noisy model. We also propose a dynamic model where the teacher sees a random subset of the points. Finally, for datasets satisfying a spectrum of weak to strong properties, we give query bounds, and show that a class of clustering functions containing Single-Linkage will find the target clustering under the strongest property.
Author: Matthias Hein, Thomas Bühler
Abstract: Many problems in machine learning and statistics can be formulated as (generalized) eigenproblems. In terms of the associated optimization problem, computing linear eigenvectors amounts to finding critical points of a quadratic function subject to quadratic constraints. In this paper we show that a certain class of constrained optimization problems with nonquadratic objective and constraints can be understood as nonlinear eigenproblems. We derive a generalization of the inverse power method which is guaranteed to converge to a nonlinear eigenvector. We apply the inverse power method to 1-spectral clustering and sparse PCA which can naturally be formulated as nonlinear eigenproblems. In both applications we achieve state-of-the-art results in terms of solution quality and runtime. Moving beyond the standard eigenproblem should be useful also in many other applications and our inverse power method can be easily adapted to new problems. 1
5 0.60485232 223 nips-2010-Rates of convergence for the cluster tree
Author: Kamalika Chaudhuri, Sanjoy Dasgupta
Abstract: For a density f on Rd , a high-density cluster is any connected component of {x : f (x) ≥ λ}, for some λ > 0. The set of all high-density clusters form a hierarchy called the cluster tree of f . We present a procedure for estimating the cluster tree given samples from f . We give finite-sample convergence rates for our algorithm, as well as lower bounds on the sample complexity of this estimation problem. 1
6 0.60379058 221 nips-2010-Random Projections for $k$-means Clustering
7 0.56744218 166 nips-2010-Minimum Average Cost Clustering
8 0.54946321 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
9 0.52669144 67 nips-2010-Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
10 0.5031091 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
11 0.4130618 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
12 0.39858666 155 nips-2010-Learning the context of a category
13 0.33985606 116 nips-2010-Identifying Patients at Risk of Major Adverse Cardiovascular Events Using Symbolic Mismatch
14 0.33620617 251 nips-2010-Sphere Embedding: An Application to Part-of-Speech Induction
15 0.32102785 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
16 0.29034108 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification
17 0.28947756 105 nips-2010-Getting lost in space: Large sample analysis of the resistance distance
18 0.26822281 289 nips-2010-b-Bit Minwise Hashing for Estimating Three-Way Similarities
19 0.26085901 90 nips-2010-Fast Large-scale Mixture Modeling with Component-specific Data Partitions
20 0.24950776 71 nips-2010-Efficient Relational Learning with Hidden Variable Detection
topicId topicWeight
[(13, 0.057), (27, 0.026), (30, 0.071), (35, 0.018), (45, 0.197), (50, 0.028), (52, 0.029), (60, 0.076), (77, 0.065), (78, 0.015), (90, 0.042), (95, 0.289)]
simIndex simValue paperId paperTitle
1 0.75276697 256 nips-2010-Structural epitome: a way to summarize one’s visual experience
Author: Nebojsa Jojic, Alessandro Perina, Vittorio Murino
Abstract: In order to study the properties of total visual input in humans, a single subject wore a camera for two weeks capturing, on average, an image every 20 seconds. The resulting new dataset contains a mix of indoor and outdoor scenes as well as numerous foreground objects. Our first goal is to create a visual summary of the subject’s two weeks of life using unsupervised algorithms that would automatically discover recurrent scenes, familiar faces or common actions. Direct application of existing algorithms, such as panoramic stitching (e.g., Photosynth) or appearance-based clustering models (e.g., the epitome), is impractical due to either the large dataset size or the dramatic variations in the lighting conditions. As a remedy to these problems, we introduce a novel image representation, the ”structural element (stel) epitome,” and an associated efficient learning algorithm. In our model, each image or image patch is characterized by a hidden mapping T which, as in previous epitome models, defines a mapping between the image coordinates and the coordinates in the large ”all-I-have-seen” epitome matrix. The limited epitome real-estate forces the mappings of different images to overlap which indicates image similarity. However, the image similarity no longer depends on direct pixel-to-pixel intensity/color/feature comparisons as in previous epitome models, but on spatial configuration of scene or object parts, as the model is based on the palette-invariant stel models. As a result, stel epitomes capture structure that is invariant to non-structural changes, such as illumination changes, that tend to uniformly affect pixels belonging to a single scene or object part. 1
same-paper 2 0.74960285 273 nips-2010-Towards Property-Based Classification of Clustering Paradigms
Author: Margareta Ackerman, Shai Ben-David, David Loker
Abstract: Clustering is a basic data mining task with a wide variety of applications. Not surprisingly, there exist many clustering algorithms. However, clustering is an ill defined problem - given a data set, it is not clear what a “correct” clustering for that set is. Indeed, different algorithms may yield dramatically different outputs for the same input sets. Faced with a concrete clustering task, a user needs to choose an appropriate clustering algorithm. Currently, such decisions are often made in a very ad hoc, if not completely random, manner. Given the crucial effect of the choice of a clustering algorithm on the resulting clustering, this state of affairs is truly regrettable. In this paper we address the major research challenge of developing tools for helping users make more informed decisions when they come to pick a clustering tool for their data. This is, of course, a very ambitious endeavor, and in this paper, we make some first steps towards this goal. We propose to address this problem by distilling abstract properties of the input-output behavior of different clustering paradigms. In this paper, we demonstrate how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms. On top of addressing deterministic clustering algorithms, we also propose similar properties for randomized algorithms and use them to highlight functional differences between different common implementations of k-means clustering. We also study relationships between the properties, independent of any particular algorithm. In particular, we strengthen Kleinberg’s famous impossibility result, while providing a simpler proof. 1
3 0.7457093 168 nips-2010-Monte-Carlo Planning in Large POMDPs
Author: David Silver, Joel Veness
Abstract: This paper introduces a Monte-Carlo algorithm for online planning in large POMDPs. The algorithm combines a Monte-Carlo update of the agent’s belief state with a Monte-Carlo tree search from the current belief state. The new algorithm, POMCP, has two important properties. First, MonteCarlo sampling is used to break the curse of dimensionality both during belief state updates and during planning. Second, only a black box simulator of the POMDP is required, rather than explicit probability distributions. These properties enable POMCP to plan effectively in significantly larger POMDPs than has previously been possible. We demonstrate its effectiveness in three large POMDPs. We scale up a well-known benchmark problem, rocksample, by several orders of magnitude. We also introduce two challenging new POMDPs: 10 × 10 battleship and partially observable PacMan, with approximately 1018 and 1056 states respectively. Our MonteCarlo planning algorithm achieved a high level of performance with no prior knowledge, and was also able to exploit simple domain knowledge to achieve better results with less search. POMCP is the first general purpose planner to achieve high performance in such large and unfactored POMDPs. 1
4 0.62834358 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
Author: Alekh Agarwal, Sahand Negahban, Martin J. Wainwright
Abstract: Many statistical M -estimators are based on convex optimization problems formed by the weighted sum of a loss function with a norm-based regularizer. We analyze the convergence rates of first-order gradient methods for solving such problems within a high-dimensional framework that allows the data dimension d to grow with (and possibly exceed) the sample size n. This high-dimensional structure precludes the usual global assumptions— namely, strong convexity and smoothness conditions—that underlie classical optimization analysis. We define appropriately restricted versions of these conditions, and show that they are satisfied with high probability for various statistical models. Under these conditions, our theory guarantees that Nesterov’s first-order method [12] has a globally geometric rate of convergence up to the statistical precision of the model, meaning the typical Euclidean distance between the true unknown parameter θ∗ and the optimal solution θ. This globally linear rate is substantially faster than previous analyses of global convergence for specific methods that yielded only sublinear rates. Our analysis applies to a wide range of M -estimators and statistical models, including sparse linear regression using Lasso ( 1 regularized regression), group Lasso, block sparsity, and low-rank matrix recovery using nuclear norm regularization. Overall, this result reveals an interesting connection between statistical precision and computational efficiency in high-dimensional estimation. 1
5 0.62780041 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
Author: Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari
Abstract: We develop a theory of online learning by defining several complexity measures. Among them are analogues of Rademacher complexity, covering numbers and fatshattering dimension from statistical learning theory. Relationship among these complexity measures, their connection to online learning, and tools for bounding them are provided. We apply these results to various learning problems. We provide a complete characterization of online learnability in the supervised setting. 1
6 0.6276449 265 nips-2010-The LASSO risk: asymptotic results and real world examples
7 0.62690216 63 nips-2010-Distributed Dual Averaging In Networks
8 0.62676549 243 nips-2010-Smoothness, Low Noise and Fast Rates
9 0.62659824 196 nips-2010-Online Markov Decision Processes under Bandit Feedback
10 0.62607318 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
11 0.62443423 222 nips-2010-Random Walk Approach to Regret Minimization
12 0.62389696 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
13 0.62291116 148 nips-2010-Learning Networks of Stochastic Differential Equations
14 0.62241668 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
15 0.62233067 117 nips-2010-Identifying graph-structured activation patterns in networks
16 0.62210006 134 nips-2010-LSTD with Random Projections
17 0.62209594 27 nips-2010-Agnostic Active Learning Without Constraints
18 0.62077278 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
19 0.62046975 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
20 0.6190185 219 nips-2010-Random Conic Pursuit for Semidefinite Programming