nips nips2009 nips2009-234 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nir Ailon, Ragesh Jaiswal, Claire Monteleoni
Abstract: We provide a clustering algorithm that approximately optimizes the k-means objective, in the one-pass streaming setting. We make no assumptions about the data, and our algorithm is very light-weight in terms of memory, and computation. This setting is applicable to unsupervised learning on massive data sets, or resource-constrained devices. The two main ingredients of our theoretical work are: a derivation of an extremely simple pseudo-approximation batch algorithm for k-means (based on the recent k-means++), in which the algorithm is allowed to output more than k centers, and a streaming clustering algorithm in which batch clustering algorithms are performed on small inputs (fitting in memory) and combined in a hierarchical manner. Empirical evaluations on real and simulated data reveal the practical utility of our method. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We provide a clustering algorithm that approximately optimizes the k-means objective, in the one-pass streaming setting. [sent-5, score-0.674]
2 1 Introduction. As commercial, social, and scientific data sources continue to grow at an unprecedented rate, it is increasingly important that algorithms to process and analyze this data operate in online, or one-pass streaming, settings. [sent-10, score-0.474]
3 The problem with many heuristics designed to implement some notion of clustering is that their outputs can be hard to evaluate. [sent-13, score-0.244]
4 The k-means objective is a simple, intuitive, and widely-cited clustering objective for data in Euclidean space. [sent-15, score-0.275]
5 However, although many clustering algorithms have been designed with the k-means objective in mind, very few have approximation guarantees with respect to this objective. [sent-16, score-0.377]
6 In this work, we give a one-pass streaming algorithm for the k-means problem. [sent-17, score-0.461]
7 We are not aware of previous approximation guarantees with respect to the k-means objective that have been shown for simple clustering algorithms that operate in either online or streaming settings. [sent-18, score-0.835]
8 We extend the work of Arthur and Vassilvitskii [AV07] to provide a bi-criterion approximation algorithm for k-means in the batch setting. [sent-19, score-0.295]
9 They define a seeding procedure which chooses a subset of k points from a batch of points, and they show that this subset gives an expected O(log k)-approximation to the k-means objective. [sent-20, score-0.474]
10 This seeding procedure is followed by Lloyd’s algorithm, which works very well in practice with the seeding. [sent-21, score-0.178]
11 We modify k-means++ to obtain a new algorithm, k-means#, which chooses a subset of O(k log k) points, and we show that this chosen subset gives a constant-factor approximation to the k-means objective. [sent-23, score-0.176]
12 Lloyd’s algorithm is popularly known as the k-means algorithm. Since the approximation guarantee is proven based on the seeding procedure alone, for the purposes of this exposition we denote the seeding procedure as k-means++. [sent-25, score-0.628]
13 Apart from giving us a bi-criterion approximation algorithm, our modified seeding procedure is very simple to analyze. [sent-27, score-0.263]
14 [GMMM+03] defines a divide-and-conquer strategy to combine multiple bi-criterion approximation algorithms for the k-medoid problem to yield a one-pass streaming approximation algorithm for k-median. [sent-28, score-0.717]
15 1 Related work There is much literature on both clustering algorithms [Gon85, Ind99, VW02, GMMM+03, KMNP+04, ORSS06, AV07, CR08, BBG09, AL09], and streaming algorithms [Ind99, GMMM+03, M05, McG07]. [sent-32, score-0.707]
16 There has also been work on combining these settings: designing clustering algorithms that operate in the streaming setting [Ind99, GMMM+03, CCP03]. [sent-33, score-0.687]
17 k-means++, the seeding procedure in [AV07], had previously been analyzed by [ORSS06], under special assumptions on the input data. [sent-36, score-0.178]
18 As future work, we propose analyzing variants of these algorithms in our streaming clustering algorithm, with the goal of yielding a streaming clustering algorithm with a constant approximation to the k-means objective. [sent-43, score-1.418]
19 Finally, it is important to make a distinction from some lines of clustering research which involve assumptions on the data to be clustered. [sent-44, score-0.213]
20 [BL08], and data that admits a clustering with well-separated means, e.g. [sent-50, score-0.213]
21 Recent work [BBG09] assumes a “target” clustering for the specific application and data set that is close to any constant approximation of the clustering objective. [sent-53, score-0.511]
22 The subset C is alternatively called a clustering of X and φC is called the potential function corresponding to the clustering. [sent-61, score-0.273]
23 For a comprehensive survey of streaming results and literature, refer to [M05]. [sent-63, score-0.398]
24 In recent, independent work, Aggarwal, Deshpande, and Kannan [ADK09] extend the seeding procedure of k-means++ to obtain a constant factor approximation algorithm which outputs O(k) centers. [sent-64, score-0.357]
25 They use similar techniques to ours, but reduce the number of centers by using a stronger concentration property. [sent-65, score-0.264]
26 Given an algorithm B for the k-means problem, let φ_C be the potential of the clustering C returned by B (on some input set which is implicit) and let φ_{C_OPT} denote the potential of the optimal clustering C_OPT. [sent-70, score-0.543]
27 The previous definition might be too strong for an approximation algorithm for some purposes. [sent-73, score-0.148]
28 For example, a clustering algorithm may perform poorly when it is constrained to output k centers, but it might become competitive when it is allowed to output more centers. [sent-74, score-0.577]
29 We call an algorithm B an (a, b)-approximation for the k-means problem if it outputs a clustering C with ak centers and potential φ_C such that φ_C / φ_{C_OPT} ≤ b in the worst case. [sent-77, score-0.676]
30 2 k-means#: The advantages of careful and liberal seeding. The k-means++ algorithm is an expected Θ(log k)-approximation algorithm. [sent-80, score-0.241]
31 (w.r.t. the subset of points chosen in the previous rounds.) Algorithm 1: k-means++. In the original definition of k-means++ in [AV07], the above algorithm is followed by Lloyd’s algorithm. [sent-91, score-0.16]
32 The above algorithm is used as a seeding step for Lloyd’s algorithm which is known to give the best results in practice. [sent-92, score-0.304]
33 On the other hand, the theoretical guarantee of k-means++ comes from analyzing this seeding step and not Lloyd’s algorithm. [sent-93, score-0.209]
34 So, for our analysis we focus on this seeding step. [sent-94, score-0.178]
35 In the above algorithm X denotes the set of given points and for any point x, D(x) denotes the distance of this point from the nearest center among the centers chosen in the previous rounds. [sent-96, score-0.503]
36 Given a clustering C, its potential with respect to some set A is denoted by φ_C(A) and is defined as φ_C(A) = Σ_{x∈A} D(x)², where D(x) is the distance of the point x from the nearest point in C. [sent-105, score-0.24]
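The potential just defined is simply the sum of squared distances from each point to its nearest chosen center. A minimal Python/NumPy sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def potential(X, C):
    """phi_C(X): sum over x in X of the squared distance to the nearest center in C.

    X: (n, d) array of points; C: (m, d) array of centers.
    """
    # (n, m) matrix of squared point-to-center distances, then min over centers, then sum.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()
```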
37 Let A be an arbitrary cluster in C_OPT, and let C be the clustering with just one center, chosen uniformly at random from A. [sent-109, score-0.311]
38 Let A be an arbitrary cluster in C_OPT, and let C be the clustering with just one center, which is chosen uniformly at random from A. [sent-113, score-0.311]
39 Choose 3 · log k centers independently and uniformly at random from X . [sent-127, score-0.342]
40 Choose 3 · log k centers independently and with probability D(x)² / Σ_{x∈X} D(x)². [sent-131, score-0.342]
41 (w.r.t. the subset of points chosen in the previous rounds.) Algorithm 2: k-means#. Note that the algorithm is almost the same as the k-means++ algorithm, except that in each round of choosing centers we pick O(log k) centers rather than a single center. [sent-136, score-0.569]
42 The running time of the above algorithm is clearly O(ndk log k). [sent-137, score-0.174]
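A minimal sketch of the shared D²-sampling seeding, parameterized by the number of centers added per round: centers_per_round = 1 recovers the k-means++ seeding, while centers_per_round ≈ 3·log k gives the k-means#-style seeding described above. This is an illustrative reading of Algorithms 1 and 2, not the authors' code; all names are hypothetical.

```python
import numpy as np

def d2_seeding(X, k, centers_per_round=1, rng=None):
    """D^2-sampling seeding sketch for k-means++ / k-means#.

    Runs k rounds.  In the first round, centers are drawn uniformly from X;
    in each later round they are drawn independently with probability
    proportional to D(x)^2, where D(x) is the distance from x to the
    nearest center chosen in the previous rounds.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    chosen = []
    for rnd in range(k):
        if rnd == 0:
            probs = np.full(n, 1.0 / n)            # uniform in the first round
        else:
            C = X[chosen]
            d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
            if d2.sum() == 0:                      # every point is already a center
                break
            probs = d2 / d2.sum()
        idx = rng.choice(n, size=centers_per_round, replace=True, p=probs)
        chosen.extend(idx.tolist())
    return X[sorted(set(chosen))]                  # duplicates removed

# k-means++-style seeding:  C = d2_seeding(X, k)
# k-means#-style seeding:   C = d2_seeding(X, k, centers_per_round=3 * int(np.ceil(np.log(k))))
```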
43 Let {A1, . . . , Ak} denote the set of clusters in the optimal clustering C_OPT. [sent-141, score-0.261]
44 Let C^i denote the clustering after the i-th round of choosing centers. [sent-142, score-0.295]
45 The following simple lemma shows that with constant probability, step (1) of k-means# picks a center such that at least one of the clusters gets covered, or in other words, |A_c^1| ≥ 1. [sent-146, score-0.163]
46 In the next lemma, we show that the probability of covering an uncovered cluster in the (i + 1)-th round is large. [sent-157, score-0.222]
47 In the latter case, we will show that the current set of centers is already competitive with a constant approximation ratio. [sent-158, score-0.386]
48 Note that in the (i + 1)-th round, the probability that a center is chosen from an uncovered cluster is at least φ_{C^i}(X_u^i) / (φ_{C^i}(X_c^i) + φ_{C^i}(X_u^i)) ≥ 1/2 (here X_c^i and X_u^i denote the points belonging to covered and uncovered clusters, respectively). [sent-169, score-0.158]
49 Conditioned on this event, with probability at least 3/4, any of the centers x chosen in round (i + 1) satisfies φ_{C^i ∪ {x}}(A) ≤ 32 · φ_{C_OPT}(A) for some uncovered cluster A ∈ A_u^i. [sent-170, score-0.518]
50 This means that with probability at least 3/8, any of the chosen centers x in round (i + 1) satisfies φ_{C^i ∪ {x}}(A) ≤ 32 · φ_{C_OPT}(A) for some uncovered cluster A ∈ A_u^i. [sent-171, score-0.518]
51 This further implies that with probability at least (1 − 1/k), at least one of the chosen centers x in round (i + 1) satisfies φ_{C^i ∪ {x}}(A) ≤ 32 · φ_{C_OPT}(A) for some uncovered cluster A ∈ A_u^i. [sent-172, score-0.518]
52 We have shown that k-means# is a randomized algorithm for clustering which with probability at least 1/4 gives a clustering with competitive ratio 64. [sent-192, score-0.557]
53 3 A single pass streaming algorithm for k-means. In this section, we will provide a single pass streaming algorithm. [sent-193, score-1.007]
54 The basic ingredient of the algorithm is a divide-and-conquer strategy defined by [GMMM+03], which uses bi-criterion approximation algorithms in the batch setting. [sent-194, score-0.657]
55 We will use k-means++, which is a (1, O(log k))-approximation algorithm, and k-means#, which is an (O(log k), O(1))-approximation algorithm, to construct a single-pass streaming O(log k)-approximation algorithm for the k-means problem. [sent-195, score-0.598]
56 3.1 A streaming (a, b)-approximation for k-means. We will show that a simple streaming divide-and-conquer scheme, analyzed by [GMMM+03] with respect to the k-medoid objective, can be used to approximate the k-means objective. [sent-198, score-0.796]
57 Partition the input into pieces S1, . . . , Sℓ; run A on each Si to get ≤ ak centers Ti = {ti1, ti2, . . .}. [sent-218, score-0.325]
58 The algorithm above outputs a clustering that is an (a′, 2b + 4b′(b + 1))-approximation to the k-means objective. [sent-225, score-0.307]
59 The a′ approximation of the desired number of centers follows directly from the approximation property of A′, with respect to the number of centers, since A′ is the last algorithm to be run. [sent-226, score-0.497]
60 Cluster centers are chosen from R^d for the k-means problem, so in various parts of the proof we save an approximation factor of 2 from the k-medoid problem, in which cluster centers must be chosen from the input data. [sent-234, score-0.743]
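A minimal sketch of this one-level divide-and-conquer scheme. The two batch routines are supplied by the caller (e.g. the seedings sketched above); `batch_a_prime` is assumed to accept per-point weights, since each chunk representative must be weighted by the number of points it stands in for. Names are illustrative.

```python
import numpy as np

def one_pass_divide_and_conquer(stream_chunks, k, batch_a, batch_a_prime):
    """One-level divide-and-conquer sketch in the spirit of [GMMM+03].

    stream_chunks : iterable of (m_i, d) arrays, each small enough to fit in memory.
    batch_a       : batch (a, b)-approximation,   batch_a(X, k)    -> centers.
    batch_a_prime : weighted (a', b')-approximation, batch_a_prime(X, w, k) -> centers.
    """
    reps, weights = [], []
    for S in stream_chunks:
        T = batch_a(S, k)                               # <= a*k centers for this chunk
        d2 = ((S[:, None, :] - T[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)                      # nearest chunk center per point
        for j in range(T.shape[0]):
            reps.append(T[j])
            weights.append(float((assign == j).sum()))  # weight = number of points represented
    reps = np.asarray(reps)
    weights = np.asarray(weights)
    # cluster the weighted representatives once more to get the final <= a'*k centers
    return batch_a_prime(reps, weights, k)
```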
61 3.2 Using k-means++ and k-means# in the divide-and-conquer strategy. In the previous subsection, we saw how an (a, b)-approximation algorithm A and an (a′, b′)-approximation algorithm A′ can be used to get a single-pass (a′, 2b + 4b′(b + 1))-approximation streaming algorithm. [sent-236, score-0.67]
62 We now have two randomized algorithms: k-means#, which with probability at least 1/4 is a (3 log k, 64)-approximation algorithm, and k-means++, which is a (1, O(log k))-approximation algorithm (the approximation factor being in expectation). [sent-237, score-0.32]
63 We can now use these two algorithms in the divide-and-conquer strategy to obtain a single pass streaming algorithm. [sent-238, score-0.558]
64 We use the following algorithms as A and A′ in the divide-and-conquer strategy (Algorithm 3). A: “Run k-means# on the data 3 log n times independently, and pick the clustering with the smallest cost.” [sent-239, score-0.377]
65 With probability at least 1 − (3/4)^{3 log n} ≥ 1 − 1/n, algorithm A is a (3 log k, 64)-approximation algorithm. [sent-251, score-0.219]
66 Since each batch is of size √(nk), the number of batches is √(n/k), and the probability that A is a (3 log k, 64)-approximation algorithm for all of these √(n/k) batches is at least 1 − √(n/k) · (1/n) ≥ 1/2. [sent-254, score-0.46]
67 Moreover, the algorithm has running time O(dnk log n log k). [sent-257, score-0.252]
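A sketch of algorithm A above, reusing the d2_seeding and potential sketches from earlier; the 3·log n repetition count follows the text, and all names are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def boosted_kmeans_sharp(X, k, n_total, rng=None):
    """Run the k-means#-style seeding ~3*log(n_total) times independently
    and keep the lowest-cost run, as in algorithm A above."""
    rng = np.random.default_rng(rng)
    repetitions = max(1, int(np.ceil(3 * np.log(n_total))))
    per_round = 3 * max(1, int(np.ceil(np.log(k))))
    best_C, best_cost = None, np.inf
    for _ in range(repetitions):
        C = d2_seeding(X, k, centers_per_round=per_round, rng=rng)
        cost = potential(X, C)
        if cost < best_cost:
            best_C, best_cost = C, cost
    return best_C
```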
68 This immediately implies a tradeoff between the memory requirement (growing like a), the number of centers output (which is a′k), and the approximation to the potential (which is c · b · b′) with respect to the optimal solution using k centers. [sent-261, score-0.613]
69 Assume we have subroutines for performing (a_i, b_i)-approximation for k-means in batch mode, for i = 1, . . . , r. [sent-266, score-0.181]
70 In the topmost level, we will divide the input into equal blocks of size M1 , and run our (a1 , b1 )-approximation algorithm on each block. [sent-280, score-0.194]
71 Buffer B1 will be repeatedly reused for this task, and after each application of the approximation algorithm, the outputted set of (at most) ka1 centers will be added to B2 . [sent-281, score-0.401]
72 When B2 is filled, we will run the (a2 , b2 )-approximation algorithm on the data and add the ka2 outputted centers to B3 . [sent-282, score-0.412]
73 This will continue until buffer Br fills, and the (ar , br )-approximation algorithm outputs the final ar k centers. [sent-283, score-0.385]
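A minimal sketch of this r-level buffering scheme. Each level's batch routine is assumed to take weighted points and return weighted centers (the weight of a center being the total weight of the points assigned to it), and buffer Br is assumed, as in the sizing discussed next, to be large enough to be clustered only once at the end. All names are illustrative.

```python
import numpy as np

def multilevel_stream_kmeans(stream, k, buffer_sizes, level_algs):
    """r-level streaming sketch with buffers B1, ..., Br.

    stream       : iterator over d-dimensional points (1-D arrays).
    buffer_sizes : [M1, ..., Mr], buffer capacities in (weighted) points.
    level_algs   : [A1, ..., Ar]; Ai(points, weights, k) -> (centers, weights)
                   is the batch (ai, bi)-approximation used at level i.
    Returns the final weighted centers produced by Ar.
    """
    r = len(buffer_sizes)
    buffers = [([], []) for _ in range(r)]              # (points, weights) per level

    def run_level(level):
        pts, wts = buffers[level]
        buffers[level] = ([], [])
        return level_algs[level](np.asarray(pts), np.asarray(wts, dtype=float), k)

    def push(level, pts, wts):
        for p, w in zip(pts, wts):
            buffers[level][0].append(p)
            buffers[level][1].append(w)
            # intermediate buffers are clustered and flushed upward when full;
            # the topmost buffer is only clustered once, at the very end.
            if level + 1 < r and len(buffers[level][0]) >= buffer_sizes[level]:
                C, W = run_level(level)
                push(level + 1, C, W)

    for x in stream:
        push(0, [np.asarray(x)], [1.0])
    for level in range(r - 1):                          # flush whatever is left below
        if buffers[level][0]:
            C, W = run_level(level)
            push(level + 1, C, W)
    return run_level(r - 1)
```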
74 In order to minimize the total memory Σ_i M_i under the last constraint, using standard arguments in multivariate analysis we must have M_1 = · · · = M_r, or in other words M_i = (n k^{r−1} a_1 · · · a_{r−1})^{1/r} ≤ n^{1/r} k (a_1 · · · a_{r−1})^{1/r} for all i. [sent-291, score-0.17]
75 The resulting one-pass algorithm will have an approximation guarantee of (a_r, c^{r−1} b_1 · · · b_r) (using a straightforward extension of the result in the previous section) and a memory requirement of at most r · n^{1/r} k (a_1 · · · a_{r−1})^{1/r}. [sent-292, score-0.384]
76 For the last level we use (a_r, b_r) = (1, O(log k)) (we are interested in outputting exactly k centers as the final solution). [sent-301, score-0.353]
77 By the above discussion (with q denoting the number of levels that use the repeated k-means#), the memory is used optimally if M = r · n^{1/r} k (3 log k)^{q/r}, in which case the final approximation guarantee will be c^{r−1} (log k)^{r−q}, for some global c > 0. [sent-306, score-0.31]
78 This implies that the final approximation guarantee is best if q = r − 1; in other words, we choose the repeated k-means# for levels 1 through r − 1. [sent-309, score-0.194]
79 The resulting algorithm is a randomized one-pass streaming approximation to k-means, with an approximation ratio of O(c̃^{r−1} log k), for some global c̃ > 0. [sent-315, score-0.662]
80 The running time of the algorithm is O(dnk² log n log k). [sent-316, score-0.252]
81 We should compare the above multi-level streaming algorithm with the state of the art in terms of the memory vs. approximation tradeoff. [sent-317, score-0.577]
82 Charikar, O’Callaghan, and Panigrahy [CCP03] give a one-pass streaming algorithm for the k-median problem which achieves a constant-factor approximation and uses O(k · polylog(n)) memory. [sent-319, score-0.546]
83 (Data items can be temporarily stored in a memory buffer and quickly processed before the next memory buffer is filled.) [sent-325, score-0.404]
84 So, even if [CCP03] can be extended to the k-means setting, streaming algorithms based on the divide-and-conquer strategy would be more interesting from a practical point of view. [sent-326, score-0.446]
85 Standard Lloyd’s algorithm operates in the batch setting, which is an easier problem than the one-pass streaming setting, so we ran experiments with this algorithm to form a baseline. [sent-339, score-0.671]
86 We also compare to an online version of Lloyd’s algorithm; however, its performance is worse than the batch version, and than our methods, for all problems. (Testing clustering algorithms on this simulation distribution was inspired by [AV07].) [sent-340, score-0.44]
87 Table 1: Columns 2-5 have the clustering cost and columns 6-9 have the running time in seconds. [sent-461, score-0.272]
88 The memory size decreases as the number of levels of the hierarchy increases. [sent-477, score-0.168]
89 (0 levels means running batch k-means++ on the data.) [sent-478, score-0.232]
90 Tables 1(a)-(c) show the average k-means cost (over 10 random restarts for the randomized algorithms: all but Online Lloyd’s) for these algorithms: (1) BL: Batch Lloyd’s, initialized with random centers in the input data, and run to convergence. [sent-480, score-0.387]
91 (3) DC-1: The simple one-stage divide-and-conquer algorithm of Section 3. [sent-482, score-0.339]
92 (4) DC-2: The simple one-stage divide-and-conquer algorithm of Section 3. [sent-484, score-0.339]
93 In our context, k-means++ and k-means# are only the seeding step, not followed by Lloyd’s algorithm. [sent-487, score-0.178]
94 In all problems, our streaming methods achieve much lower cost than Online Lloyd’s, for all settings of k, and lower cost than Batch Lloyd’s for most settings of k (including the correct k = 25, in norm25). [sent-488, score-0.516]
95 The gains with respect to batch are noteworthy, since the batch problem is less constrained than the one-pass streaming problem. [sent-489, score-0.692]
96 Although our worst-case theoretical results imply an exponential clustering cost as a function of the number of levels, our results show a far more optimistic outcome in which adding levels (and limiting memory) actually improves the outcome. [sent-493, score-0.324]
97 We conjecture that our data contains enough information for clustering even on chunks that fit in small buffers, and therefore the results may reflect the benefit of the hierarchical implementation. [sent-494, score-0.213]
98 We thank Sanjoy Dasgupta for suggesting the study of approximation algorithms for k-means in the streaming setting, for excellent lecture notes, and for helpful discussions. [sent-496, score-0.531]
99 von Luxburg: Relating clustering stability to properties of cluster boundaries. [sent-530, score-0.279]
100 COLT, 2008 [CCP03] Moses Charikar and Liadan O’Callaghan and Rina Panigrahy: Better streaming algorithms for clustering problems. [sent-531, score-0.659]
wordName wordTfidf (topN-words)
[('cop', 0.415), ('streaming', 0.398), ('centers', 0.264), ('gmmm', 0.257), ('clustering', 0.213), ('lloyd', 0.208), ('seeding', 0.178), ('conquer', 0.178), ('batch', 0.147), ('xc', 0.141), ('ar', 0.116), ('memory', 0.116), ('ol', 0.104), ('bl', 0.104), ('divide', 0.098), ('ai', 0.097), ('km', 0.091), ('br', 0.089), ('buffer', 0.086), ('approximation', 0.085), ('round', 0.082), ('lloyds', 0.079), ('vassilvitskii', 0.079), ('log', 0.078), ('pass', 0.074), ('cloud', 0.074), ('uncovered', 0.074), ('guha', 0.069), ('charikar', 0.069), ('mr', 0.067), ('cluster', 0.066), ('focs', 0.063), ('algorithm', 0.063), ('center', 0.06), ('batches', 0.059), ('callaghan', 0.059), ('kmnp', 0.059), ('cost', 0.059), ('lemma', 0.055), ('nk', 0.054), ('levels', 0.052), ('facility', 0.052), ('spambase', 0.052), ('outputted', 0.052), ('arthur', 0.051), ('kmeans', 0.051), ('algorithms', 0.048), ('clusters', 0.048), ('xu', 0.044), ('ti', 0.044), ('mi', 0.042), ('agkm', 0.04), ('deshpande', 0.04), ('dnk', 0.04), ('liadan', 0.04), ('meyerson', 0.04), ('panigrahy', 0.04), ('sergei', 0.04), ('strategy', 0.038), ('competitive', 0.037), ('event', 0.036), ('subsection', 0.035), ('cbb', 0.035), ('ailon', 0.035), ('sudipto', 0.035), ('aggarwal', 0.035), ('moses', 0.035), ('buffers', 0.035), ('get', 0.034), ('tradeoff', 0.034), ('bi', 0.034), ('run', 0.033), ('running', 0.033), ('subset', 0.033), ('points', 0.032), ('chosen', 0.032), ('online', 0.032), ('sanjoy', 0.032), ('guarantee', 0.031), ('randomized', 0.031), ('objective', 0.031), ('outputs', 0.031), ('covered', 0.03), ('kannan', 0.03), ('popularly', 0.03), ('item', 0.028), ('rounds', 0.028), ('operate', 0.028), ('nir', 0.028), ('pr', 0.027), ('ak', 0.027), ('rd', 0.027), ('potential', 0.027), ('streams', 0.027), ('distances', 0.027), ('choose', 0.026), ('lemmas', 0.026), ('denotes', 0.026), ('spam', 0.026), ('sw', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 234 nips-2009-Streaming k-means approximation
Author: Nir Ailon, Ragesh Jaiswal, Claire Monteleoni
Abstract: We provide a clustering algorithm that approximately optimizes the k-means objective, in the one-pass streaming setting. We make no assumptions about the data, and our algorithm is very light-weight in terms of memory, and computation. This setting is applicable to unsupervised learning on massive data sets, or resource-constrained devices. The two main ingredients of our theoretical work are: a derivation of an extremely simple pseudo-approximation batch algorithm for k-means (based on the recent k-means++), in which the algorithm is allowed to output more than k centers, and a streaming clustering algorithm in which batch clustering algorithms are performed on small inputs (fitting in memory) and combined in a hierarchical manner. Empirical evaluations on real and simulated data reveal the practical utility of our method. 1
2 0.10295461 126 nips-2009-Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering
Author: Lei Wu, Rong Jin, Steven C. Hoi, Jianke Zhu, Nenghai Yu
Abstract: Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high dimensional data because the size of the metric is in the square of dimensionality; (ii) they assume a fixed metric for the entire input space and therefore are unable to handle heterogeneous data. In this paper, we propose a novel scheme that learns nonlinear Bregman distance functions from side information using a nonparametric approach that is similar to support vector machines. The proposed scheme avoids the assumption of fixed metric by implicitly deriving a local distance from the Hessian matrix of a convex function that is used to generate the Bregman distance function. We also present an efficient learning algorithm for the proposed scheme for distance function learning. The extensive experiments with semi-supervised clustering show the proposed technique (i) outperforms the state-of-the-art approaches for distance function learning, and (ii) is computationally efficient for high dimensional data. 1
3 0.086623393 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering
Author: Samuel R. Bulò, Marcello Pelillo
Abstract: Hypergraph clustering refers to the process of extracting maximally coherent groups from a set of objects using high-order (rather than pairwise) similarities. Traditional approaches to this problem are based on the idea of partitioning the input data into a user-defined number of classes, thereby obtaining the clusters as a by-product of the partitioning process. In this paper, we provide a radically different perspective to the problem. In contrast to the classical approach, we attempt to provide a meaningful formalization of the very notion of a cluster and we show that game theory offers an attractive and unexplored perspective that serves well our purpose. Specifically, we show that the hypergraph clustering problem can be naturally cast into a non-cooperative multi-player “clustering game”, whereby the notion of a cluster is equivalent to a classical game-theoretic equilibrium concept. From the computational viewpoint, we show that the problem of finding the equilibria of our clustering game is equivalent to locally optimizing a polynomial function over the standard simplex, and we provide a discrete-time dynamics to perform this optimization. Experiments are presented which show the superiority of our approach over state-of-the-art hypergraph clustering techniques.
4 0.079410143 186 nips-2009-Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units
Author: Feng Yan, Ningyi Xu, Yuan Qi
Abstract: The recent emergence of Graphics Processing Units (GPUs) as general-purpose parallel computing devices provides us with new opportunities to develop scalable learning methods for massive data. In this work, we consider the problem of parallelizing two inference methods on GPUs for latent Dirichlet Allocation (LDA) models, collapsed Gibbs sampling (CGS) and collapsed variational Bayesian (CVB). To address limited memory constraints on GPUs, we propose a novel data partitioning scheme that effectively reduces the memory cost. This partitioning scheme also balances the computational cost on each multiprocessor and enables us to easily avoid memory access conflicts. We use data streaming to handle extremely large datasets. Extensive experiments showed that our parallel inference methods consistently produced LDA models with the same predictive power as sequential training methods did but with 26x speedup for CGS and 196x speedup for CVB on a GPU with 30 multiprocessors. The proposed partitioning scheme and data streaming make our approach scalable with more multiprocessors. Furthermore, they can be used as general techniques to parallelize other machine learning models. 1
5 0.072245009 252 nips-2009-Unsupervised Feature Selection for the $k$-means Clustering Problem
Author: Christos Boutsidis, Petros Drineas, Michael W. Mahoney
Abstract: We present a novel feature selection algorithm for the k-means clustering problem. Our algorithm is randomized and, assuming an accuracy parameter ϵ ∈ (0, 1), selects and appropriately rescales in an unsupervised manner Θ(k log(k/ϵ)/ϵ2 ) features from a dataset of arbitrary dimensions. We prove that, if we run any γ-approximate k-means algorithm (γ ≥ 1) on the features selected using our method, we can find a (1 + (1 + ϵ)γ)-approximate partition with high probability. 1
6 0.065004788 233 nips-2009-Streaming Pointwise Mutual Information
7 0.062033847 122 nips-2009-Label Selection on Graphs
8 0.05747129 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing
9 0.054514211 141 nips-2009-Local Rules for Global MAP: When Do They Work ?
10 0.051757213 235 nips-2009-Structural inference affects depth perception in the context of potential occlusion
11 0.051741149 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
12 0.051713817 182 nips-2009-Optimal Scoring for Unsupervised Learning
13 0.051284295 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry
14 0.050808996 129 nips-2009-Learning a Small Mixture of Trees
15 0.046212763 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization
16 0.044285093 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes
17 0.044212669 45 nips-2009-Beyond Convexity: Online Submodular Minimization
18 0.044090364 191 nips-2009-Positive Semidefinite Metric Learning with Boosting
19 0.043506544 139 nips-2009-Linear-time Algorithms for Pairwise Statistical Problems
20 0.042644493 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data
topicId topicWeight
[(0, -0.147), (1, 0.058), (2, 0.012), (3, -0.03), (4, 0.002), (5, 0.003), (6, -0.021), (7, -0.005), (8, 0.01), (9, 0.007), (10, -0.069), (11, -0.06), (12, 0.092), (13, -0.015), (14, 0.004), (15, -0.025), (16, 0.088), (17, -0.086), (18, -0.051), (19, -0.028), (20, -0.097), (21, -0.006), (22, 0.021), (23, -0.054), (24, 0.074), (25, -0.052), (26, -0.07), (27, -0.007), (28, 0.002), (29, -0.022), (30, 0.008), (31, -0.075), (32, 0.034), (33, -0.005), (34, 0.068), (35, 0.013), (36, 0.031), (37, 0.04), (38, 0.063), (39, 0.045), (40, 0.072), (41, 0.032), (42, 0.028), (43, 0.056), (44, 0.012), (45, -0.171), (46, 0.005), (47, 0.001), (48, 0.076), (49, -0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.92084301 234 nips-2009-Streaming k-means approximation
Author: Nir Ailon, Ragesh Jaiswal, Claire Monteleoni
Abstract: We provide a clustering algorithm that approximately optimizes the k-means objective, in the one-pass streaming setting. We make no assumptions about the data, and our algorithm is very light-weight in terms of memory, and computation. This setting is applicable to unsupervised learning on massive data sets, or resource-constrained devices. The two main ingredients of our theoretical work are: a derivation of an extremely simple pseudo-approximation batch algorithm for k-means (based on the recent k-means++), in which the algorithm is allowed to output more than k centers, and a streaming clustering algorithm in which batch clustering algorithms are performed on small inputs (fitting in memory) and combined in a hierarchical manner. Empirical evaluations on real and simulated data reveal the practical utility of our method. 1
2 0.63354069 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering
Author: Samuel R. Bulò, Marcello Pelillo
Abstract: Hypergraph clustering refers to the process of extracting maximally coherent groups from a set of objects using high-order (rather than pairwise) similarities. Traditional approaches to this problem are based on the idea of partitioning the input data into a user-defined number of classes, thereby obtaining the clusters as a by-product of the partitioning process. In this paper, we provide a radically different perspective to the problem. In contrast to the classical approach, we attempt to provide a meaningful formalization of the very notion of a cluster and we show that game theory offers an attractive and unexplored perspective that serves well our purpose. Specifically, we show that the hypergraph clustering problem can be naturally cast into a non-cooperative multi-player “clustering game”, whereby the notion of a cluster is equivalent to a classical game-theoretic equilibrium concept. From the computational viewpoint, we show that the problem of finding the equilibria of our clustering game is equivalent to locally optimizing a polynomial function over the standard simplex, and we provide a discrete-time dynamics to perform this optimization. Experiments are presented which show the superiority of our approach over state-of-the-art hypergraph clustering techniques.
3 0.59757024 182 nips-2009-Optimal Scoring for Unsupervised Learning
Author: Zhihua Zhang, Guang Dai
Abstract: We are often interested in casting classification and clustering problems as a regression framework, because it is feasible to achieve some statistical properties in this framework by imposing some penalty criteria. In this paper we illustrate optimal scoring, which was originally proposed for performing the Fisher linear discriminant analysis by regression, in the application of unsupervised learning. In particular, we devise a novel clustering algorithm that we call optimal discriminant clustering. We associate our algorithm with the existing unsupervised learning algorithms such as spectral clustering, discriminative clustering and sparse principal component analysis. Experimental results on a collection of benchmark datasets validate the effectiveness of the optimal discriminant clustering algorithm.
4 0.58750021 252 nips-2009-Unsupervised Feature Selection for the $k$-means Clustering Problem
Author: Christos Boutsidis, Petros Drineas, Michael W. Mahoney
Abstract: We present a novel feature selection algorithm for the k-means clustering problem. Our algorithm is randomized and, assuming an accuracy parameter ϵ ∈ (0, 1), selects and appropriately rescales in an unsupervised manner Θ(k log(k/ϵ)/ϵ2 ) features from a dataset of arbitrary dimensions. We prove that, if we run any γ-approximate k-means algorithm (γ ≥ 1) on the features selected using our method, we can find a (1 + (1 + ϵ)γ)-approximate partition with high probability. 1
5 0.52859646 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
Author: Jing Gao, Feng Liang, Wei Fan, Yizhou Sun, Jiawei Han
Abstract: Ensemble classifiers such as bagging, boosting and model averaging are known to have improved accuracy and robustness over a single model. Their potential, however, is limited in applications which have no access to raw data but to the meta-level model output. In this paper, we study ensemble learning with output from multiple supervised and unsupervised models, a topic where little work has been done. Although unsupervised models, such as clustering, do not directly generate label prediction for each individual, they provide useful constraints for the joint prediction of a set of related objects. We propose to consolidate a classification solution by maximizing the consensus among both supervised predictions and unsupervised constraints. We cast this ensemble task as an optimization problem on a bipartite graph, where the objective function favors the smoothness of the prediction over the graph, as well as penalizing deviations from the initial labeling provided by supervised models. We solve this problem through iterative propagation of probability estimates among neighboring nodes. Our method can also be interpreted as conducting a constrained embedding in a transformed space, or a ranking on the graph. Experimental results on three real applications demonstrate the benefits of the proposed method over existing alternatives1 . 1
6 0.51737845 233 nips-2009-Streaming Pointwise Mutual Information
7 0.51548642 74 nips-2009-Efficient Bregman Range Search
8 0.4966943 126 nips-2009-Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering
9 0.44840977 122 nips-2009-Label Selection on Graphs
10 0.42476436 249 nips-2009-Tracking Dynamic Sources of Malicious Activity at Internet Scale
11 0.42055851 51 nips-2009-Clustering sequence sets for motif discovery
12 0.41884133 63 nips-2009-DUOL: A Double Updating Approach for Online Learning
13 0.41184312 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions
14 0.39780822 48 nips-2009-Bootstrapping from Game Tree Search
15 0.39604893 239 nips-2009-Submodularity Cuts and Applications
16 0.39424649 191 nips-2009-Positive Semidefinite Metric Learning with Boosting
17 0.37966764 160 nips-2009-Multiple Incremental Decremental Learning of Support Vector Machines
18 0.37246376 193 nips-2009-Potential-Based Agnostic Boosting
19 0.37144101 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs
20 0.36612096 82 nips-2009-Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification
topicId topicWeight
[(24, 0.095), (25, 0.063), (35, 0.031), (36, 0.088), (37, 0.368), (39, 0.058), (58, 0.06), (61, 0.013), (71, 0.048), (81, 0.02), (86, 0.057), (91, 0.017)]
simIndex simValue paperId paperTitle
1 0.84163541 235 nips-2009-Structural inference affects depth perception in the context of potential occlusion
Author: Ian Stevenson, Konrad Koerding
Abstract: In many domains, humans appear to combine perceptual cues in a near-optimal, probabilistic fashion: two noisy pieces of information tend to be combined linearly with weights proportional to the precision of each cue. Here we present a case where structural information plays an important role. The presence of a background cue gives rise to the possibility of occlusion, and places a soft constraint on the location of a target - in effect propelling it forward. We present an ideal observer model of depth estimation for this situation where structural or ordinal information is important and then fit the model to human data from a stereo-matching task. To test whether subjects are truly using ordinal cues in a probabilistic manner we then vary the uncertainty of the task. We find that the model accurately predicts shifts in subject’s behavior. Our results indicate that the nervous system estimates depth ordering in a probabilistic fashion and estimates the structure of the visual scene during depth perception. 1
same-paper 2 0.71836036 234 nips-2009-Streaming k-means approximation
Author: Nir Ailon, Ragesh Jaiswal, Claire Monteleoni
Abstract: We provide a clustering algorithm that approximately optimizes the k-means objective, in the one-pass streaming setting. We make no assumptions about the data, and our algorithm is very light-weight in terms of memory, and computation. This setting is applicable to unsupervised learning on massive data sets, or resource-constrained devices. The two main ingredients of our theoretical work are: a derivation of an extremely simple pseudo-approximation batch algorithm for k-means (based on the recent k-means++), in which the algorithm is allowed to output more than k centers, and a streaming clustering algorithm in which batch clustering algorithms are performed on small inputs (fitting in memory) and combined in a hierarchical manner. Empirical evaluations on real and simulated data reveal the practical utility of our method. 1
3 0.68610531 12 nips-2009-A Generalized Natural Actor-Critic Algorithm
Author: Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Kenji Doya
Abstract: Policy gradient Reinforcement Learning (RL) algorithms have received substantial attention, seeking stochastic policies that maximize the average (or discounted cumulative) reward. In addition, extensions based on the concept of the Natural Gradient (NG) show promising learning efficiency because these regard metrics for the task. Though there are two candidate metrics, Kakade’s Fisher Information Matrix (FIM) for the policy (action) distribution and Morimura’s FIM for the stateaction joint distribution, but all RL algorithms with NG have followed Kakade’s approach. In this paper, we describe a generalized Natural Gradient (gNG) that linearly interpolates the two FIMs and propose an efficient implementation for the gNG learning based on a theory of the estimating function, the generalized Natural Actor-Critic (gNAC) algorithm. The gNAC algorithm involves a near optimal auxiliary function to reduce the variance of the gNG estimates. Interestingly, the gNAC can be regarded as a natural extension of the current state-of-the-art NAC algorithm [1], as long as the interpolating parameter is appropriately selected. Numerical experiments showed that the proposed gNAC algorithm can estimate gNG efficiently and outperformed the NAC algorithm.
4 0.64879656 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing
Author: Ruben Coen-cagli, Peter Dayan, Odelia Schwartz
Abstract: A central hypothesis about early visual processing is that it represents inputs in a coordinate system matched to the statistics of natural scenes. Simple versions of this lead to Gabor–like receptive fields and divisive gain modulation from local surrounds; these have led to influential neural and psychological models of visual processing. However, these accounts are based on an incomplete view of the visual context surrounding each point. Here, we consider an approximate model of linear and non–linear correlations between the responses of spatially distributed Gaborlike receptive fields, which, when trained on an ensemble of natural scenes, unifies a range of spatial context effects. The full model accounts for neural surround data in primary visual cortex (V1), provides a statistical foundation for perceptual phenomena associated with Li’s (2002) hypothesis that V1 builds a saliency map, and fits data on the tilt illusion. 1
5 0.57934964 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning
Author: Chonghai Hu, Weike Pan, James T. Kwok
Abstract: Regularized risk minimization often involves non-smooth optimization, either because of the loss function (e.g., hinge loss) or the regularizer (e.g., ℓ1 -regularizer). Gradient methods, though highly scalable and easy to implement, are known to converge slowly. In this paper, we develop a novel accelerated gradient method for stochastic optimization while still preserving their computational simplicity and scalability. The proposed algorithm, called SAGE (Stochastic Accelerated GradiEnt), exhibits fast convergence rates on stochastic composite optimization with convex or strongly convex objectives. Experimental results show that SAGE is faster than recent (sub)gradient methods including FOLOS, SMIDAS and SCD. Moreover, SAGE can also be extended for online learning, resulting in a simple algorithm but with the best regret bounds currently known for these problems. 1
6 0.4579308 126 nips-2009-Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering
7 0.43745542 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling
8 0.43265215 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization
9 0.4319306 122 nips-2009-Label Selection on Graphs
10 0.42741972 239 nips-2009-Submodularity Cuts and Applications
11 0.42735189 169 nips-2009-Nonlinear Learning using Local Coordinate Coding
12 0.42519593 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations
13 0.42384475 91 nips-2009-Fast, smooth and adaptive regression in metric spaces
14 0.42250302 129 nips-2009-Learning a Small Mixture of Trees
15 0.42184237 71 nips-2009-Distribution-Calibrated Hierarchical Classification
16 0.42116293 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output
17 0.42080888 45 nips-2009-Beyond Convexity: Online Submodular Minimization
18 0.42078814 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering
19 0.42055824 260 nips-2009-Zero-shot Learning with Semantic Output Codes
20 0.4197402 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers