nips nips2013 nips2013-94 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Maria-Florina Balcan, Steven Ehrlich, Yingyu Liang
Abstract: This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by [13], we reduce the problem of finding a clustering with low cost to the problem of finding a coreset of small size. We provide a distributed method for constructing a global coreset which improves over the previous methods by reducing the communication complexity, and which works over general communication topologies. Experimental results on large scale data sets show that this approach outperforms other coreset-based distributed clustering algorithms. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Following a classic approach in clustering by [13], we reduce the problem of finding a clustering with low cost to the problem of finding a coreset of small size. [sent-6, score-1.139]
2 We provide a distributed method for constructing a global coreset which improves over the previous methods by reducing the communication complexity, and which works over general communication topologies. [sent-7, score-1.326]
3 1 Introduction. Most classic clustering algorithms are designed for the centralized setting, but in recent years data has become distributed over different locations, such as distributed databases [21, 5], images and videos over networks [20], surveillance [11] and sensor networks [4, 12]. [sent-9, score-0.458]
4 Additionally, most of these algorithms assume that the distributed nodes can communicate with all other sites or that there is a central coordinator that communicates with all other sites. [sent-16, score-0.364]
5 In this paper, we study the problem of distributed clustering where the data is distributed across nodes whose communication is restricted to the edges of an arbitrary graph. [sent-17, score-0.554]
6 We provide algorithms with small communication cost and provable guarantees on the clustering quality. [sent-18, score-0.473]
7 Our technique for reducing communication in general graphs is based on the construction of a small set of points which act as a proxy for the entire data set. [sent-19, score-0.355]
8 An ε-coreset is a weighted set of points whose cost on any set of centers is approximately the cost of the original data on those same centers up to accuracy ε. [sent-20, score-0.623]
9 Thus an approximate solution for the coreset is also an approximate solution for the original data. [sent-21, score-0.831]
10 Figure 1: (a) the construction of [23], in which each node computes a coreset on the weighted point set formed by its own data and its subtrees’ coresets; (b) our construction. [sent-24, score-0.867]
11 Communicating the approximate cost of a global solution to each node is enough for the local construction, leading to low communication cost overall. [sent-27, score-0.718]
12 More precisely, in Section 3, we propose a distributed coreset construction algorithm based on local approximate solutions. [sent-29, score-1.051]
13 Each node computes an approximate solution for its local data, and then constructs the local portion of a coreset using only its local data and the total cost of each node’s approximation. [sent-30, score-1.306]
14 For ε constant, this builds a coreset of size Õ(kd + nk) for k-median and k-means when the data lies in d dimensions and is distributed over n sites. [sent-31, score-0.896]
15 If there is a central coordinator among the n sites, then clustering can be performed on the coordinator by collecting the local portions of the coreset with a communication cost equal to the coreset size Õ(kd + nk). [sent-32, score-2.337]
16 For distributed clustering over general connected topologies, we propose an algorithm based on the distributed coreset construction and a message-passing approach, whose communication cost improves over previous coreset-based algorithms. [sent-33, score-1.545]
17 For a fixed amount of communication, our algorithm outperforms other coreset construction algorithms. [sent-36, score-0.852]
18 Comparison to Other Coreset Algorithms: Since coresets summarize local information, they are a natural tool to use when trying to reduce communication complexity. [sent-37, score-0.517]
19 If each node constructs an ε-coreset on its local data, then the union of these coresets is clearly an ε-coreset for the entire data set. [sent-38, score-1.217]
20 Unfortunately the size of the coreset in this approach increases greatly with the number of nodes. [sent-39, score-0.785]
21 The main idea of [23] is to approximate the union of local coresets with another coreset. [sent-41, score-0.351]
22 They assume nodes communicate over a rooted tree, with each node passing its coreset to its parent. [sent-42, score-0.94]
23 Because the approximation factor of the constructed coreset depends on the quality of its component coresets, the accuracy a coreset needs (and thus the overall communication complexity) scales with the height of this tree. [sent-43, score-1.798]
24 Although it is possible to find a spanning tree in any communication network, when the graph has large diameter every tree has large height. [sent-44, score-0.352]
25 We show that it is possible to construct a global coreset with low communication overhead. [sent-46, score-1.008]
26 This is done by distributing the coreset construction procedure rather than combining local coresets. [sent-47, score-0.913]
27 The communication needed to construct this coreset is negligible – just a single value from each data set representing the approximate cost of their local optimal clustering. [sent-48, score-1.23]
28 Since the sampled global ε-coreset is the same size as any local ε-coreset, this leads to an improvement of the communication cost over the other approaches. [sent-49, score-0.474]
29 The constructed coreset is smaller by a factor of n in general graphs, and is independent of the communication topology. [sent-51, score-0.979]
30 [9] also merge coresets using coreset construction, but they do so in a model of parallel computation and ignore communication costs. [sent-53, score-1.213]
31 Here we measure the communication cost in terms of the number of points transmitted, and assume for simplicity that there is no latency in the communication. [sent-70, score-0.415]
32 The goal is to find a set of k centers x which optimize cost(P, x) while keeping the computation efficient and the communication cost as low as possible. [sent-72, score-0.475]
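For concreteness, here is a minimal NumPy sketch of the two center-based costs used throughout (cost(P, x) sums, over the points, the distance to the nearest center for k-median and the squared distance for k-means); the helper names are ours and the code is illustrative, not from the paper.

```python
import numpy as np

def kmedian_cost(P, x, weights=None):
    """k-median cost: (weighted) sum of distances to the nearest center in x."""
    d = np.sqrt(((P[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))  # (|P|, k) distances
    w = np.ones(len(P)) if weights is None else weights
    return float((w * d.min(axis=1)).sum())

def kmeans_cost(P, x, weights=None):
    """k-means cost: (weighted) sum of squared distances to the nearest center in x."""
    d2 = ((P[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    w = np.ones(len(P)) if weights is None else weights
    return float((w * d2.min(axis=1)).sum())
```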
33 Our focus is to reduce the communication cost while preserving theoretical guarantees for approximating clustering cost. [sent-73, score-0.473]
34 In the centralized setting, the idea of summarization with respect to the clustering task is captured by the concept of coresets [13, 8]. [sent-76, score-0.399]
35 A coreset is a set of weighted points whose cost approximates the cost of the original data for any set of k centers. [sent-77, score-1.184]
36 An ε-coreset for a set of points P with respect to a center-based cost function is a set of points S and a set of weights w : S → R such that for any set of centers x, we have (1 − ε)·cost(P, x) ≤ Σ_{p∈S} w(p)·cost(p, x) ≤ (1 + ε)·cost(P, x). [sent-79, score-0.373]
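To make Definition 1 concrete, the following sketch (reusing the kmedian_cost helper above) checks the two-sided inequality for one particular candidate center set x; since the definition quantifies over every x, this is only a necessary condition, and the function name is our own.

```python
def satisfies_coreset_bound(P, S, w, x, eps):
    """Check (1 - eps)*cost(P, x) <= sum_{p in S} w(p)*cost(p, x) <= (1 + eps)*cost(P, x)
    for a single center set x (Definition 1 requires this for all x)."""
    full = kmedian_cost(P, x)             # unweighted cost of the original data
    core = kmedian_cost(S, x, weights=w)  # weighted cost of the coreset
    return (1 - eps) * full <= core <= (1 + eps) * full
```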
37 In the centralized setting, many coreset construction algorithms have been proposed for k-median, k-means and some other cost functions. [sent-80, score-1.062]
38 For example, for points in R^d, algorithms in [8] construct coresets of size Õ(kd/ε⁴) for k-means and coresets of size Õ(kd/ε²) for k-median. [sent-81, score-0.513]
39 In the distributed setting, it is natural to ask whether there exists an algorithm that constructs a small coreset for the entire point set but still has low communication cost. [sent-82, score-1.16]
40 Note that the union of coresets for multiple data sets is a coreset for the union of the data sets. [sent-83, score-1.061]
41 The immediate construction of combining the local coresets from each node would produce a global coreset whose size was larger by a factor of n, greatly increasing the communication complexity. [sent-84, score-1.45]
42 We present a distributed algorithm which constructs a global coreset the same size as the centralized construction and only needs a single value1 communicated to each node. [sent-85, score-1.109]
43 3 Distributed Coreset Construction. Here we design a distributed coreset construction algorithm for k-means and k-median. [sent-87, score-0.962]
44 To gain some intuition on the distributed coreset construction algorithm, we briefly review the construction algorithm in [8] in the centralized setting. [sent-89, score-1.091]
45 The coreset is constructed by computing a constant-factor approximation solution for the entire data set, and then sampling points with probability proportional to their contributions to the cost of this solution. [sent-90, score-1.05]
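As a hedged illustration of that centralized recipe (our reading of [8], not the authors' code): sample t points with probability proportional to their cost m_p under a given constant-factor solution, and give each sampled point the inverse-probability weight (Σ_z m_z)/(t·m_q).

```python
import numpy as np

def centralized_coreset_sample(P, approx_centers, t, rng=np.random.default_rng(0)):
    """Sample t points of P proportionally to m_p = cost(p, b_p) under an
    approximate solution, and return them with unbiased weights."""
    d = np.sqrt(((P[:, None, :] - approx_centers[None, :, :]) ** 2).sum(axis=2))
    m = d.min(axis=1)                     # m_p: distance to the closest approximate center
    idx = rng.choice(len(P), size=t, replace=True, p=m / m.sum())
    weights = m.sum() / (t * m[idx])      # w_q = (sum_z m_z) / (t * m_q)
    return P[idx], weights
```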
46 Algorithm 1 (Communication-aware distributed coreset construction). Input: local datasets {P_i, 1 ≤ i ≤ n}, parameter t (number of points to be sampled). [sent-95, score-0.993]
47 We show that a global coreset can be constructed in a distributed fashion by estimating the weight of the entire data set with the sum of local approximations. [sent-104, score-1.002]
48 The size of the coreset is O((1/ε⁴)(kd + log(1/δ)) + nk log(nk/δ)) for k-means, and O((1/ε²)(kd + log(1/δ)) + nk) for k-median. [sent-108, score-0.893]
49 As described below, the distributed coreset construction can be achieved by using Algorithm 1 with appropriate t, namely O((1/ε⁴)(kd + log(1/δ)) + nk log(nk/δ)) for k-means and O((1/ε²)(kd + log(1/δ))) for k-median. [sent-110, score-1.174]
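The following is our runnable reading of Algorithm 1 (a sketch under our own naming and k-median distances, not the authors' implementation): each site i fits a local constant-factor solution B_i, communicates the single value cost(P_i, B_i), the t global samples are split across sites in proportion to those costs, and each local center b keeps the residual weight w_b = |P_b| − Σ_{q∈S∩P_b} w_q.

```python
import numpy as np

def local_cost_and_assignment(P_i, B_i):
    # m_p = cost(p, b_p): distance from each local point to its closest local center
    d = np.sqrt(((P_i[:, None, :] - B_i[None, :, :]) ** 2).sum(axis=2))
    return d.min(axis=1), d.argmin(axis=1)

def distributed_coreset(local_data, local_solutions, t, rng=np.random.default_rng(0)):
    """Sketch of Algorithm 1 (k-median flavor): returns coreset points and weights."""
    stats = [local_cost_and_assignment(P, B) for P, B in zip(local_data, local_solutions)]
    site_costs = np.array([m.sum() for m, _ in stats])  # the one value each site communicates
    total = site_costs.sum()
    t_i = rng.multinomial(t, site_costs / total)        # sample allocation proportional to cost
    pts, wts = [], []
    for P, B, (m, assign), ti in zip(local_data, local_solutions, stats, t_i):
        w_b = np.array([(assign == j).sum() for j in range(len(B))], dtype=float)  # |P_b|
        if ti > 0:
            idx = rng.choice(len(P), size=ti, replace=True, p=m / m.sum())
            w_q = total / (t * m[idx])                  # w_q = (sum_z m_z) / (t * m_q)
            pts.append(P[idx]); wts.append(w_q)
            np.subtract.at(w_b, assign[idx], w_q)       # w_b = |P_b| - sum_{q in S∩P_b} w_q
        pts.append(B); wts.append(w_b)
    return np.vstack(pts), np.concatenate(wts)
```

With a central coordinator, each site would simply send its portion of (pts, wts); over a general topology the portions are shared by message passing as described in Section 4.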
50 Then, when each point of S is drawn with probability proportional to its weight m_z (i.e., Pr[q = p] = m_p/Σ_{z∈P} m_z), for any f ∈ F the expectation of the weighted cost of S equals the cost of the original data P, since E[Σ_{q∈S} w_q f(q)] = Σ_{q∈S} E[w_q f(q)] = Σ_{q∈S} Σ_{p∈P} Pr[q = p]·w_p·f(p) = Σ_{p∈P} f(p). [sent-119, score-0.641]
51 We first consider the centralized setting and review how [8] applied the lemma to construct a coreset for k-median as in Definition 1. [sent-132, score-0.864]
52 A natural approach is to apply this lemma directly to the cost f_x(p) := cost(p, x). [sent-133, score-0.343]
53 Aiming to approximate the error Σ_p [cost(p, x) − cost(b_p, x)] rather than to approximate Σ_p cost(p, x) directly, we define f_x(p) := cost(p, x) − cost(b_p, x) + cost(p, b_p), where cost(p, b_p) is added so that f_x(p) ≥ 0. [sent-137, score-0.662]
54 Since 0 ≤ f_x(p) ≤ 2·cost(p, b_p), we can apply the lemma with m_p = 2·cost(p, b_p). [sent-138, score-0.645]
55 It bounds the difference |Σ_{p∈P} f_x(p) − Σ_{q∈S} w_q f_x(q)| by 2ε·Σ_{p∈P} cost(p, b_p), so we have an O(ε)-approximation. [sent-139, score-0.689]
56 Note that Σ_{p∈P} f_x(p) − Σ_{q∈S} w_q f_x(q) does not equal Σ_{p∈P} cost(p, x) − Σ_{q∈S} w_q cost(q, x). [sent-140, score-0.744]
57 However, it equals the difference between Σ_{p∈P} cost(p, x) and a weighted cost of the sampled points and the centers in the approximation solution. [sent-141, score-0.381]
58 To get a coreset as in Definition 1, we need to add the centers of the approximation solution with specific weights to the coreset. [sent-142, score-0.933]
59 Our key contribution in this paper is to show that in the distributed setting, it suffices to choose b_p from the local approximation solution for the local dataset containing p, rather than from an approximation solution for the global dataset. [sent-144, score-0.538]
60 Furthermore, the sampling and the weighting of the coreset points can be done in a local manner. [sent-145, score-0.891]
61 We have the following lemma for k-median with F = {f_x : f_x(p) = d(p, x) − d(b_p, x) + d(p, b_p), x ∈ (R^d)^k}. [sent-147, score-0.348]
62 2 δ Proof Sketch of Lemma 2: We want to show that for any set of centers x the true cost for using these centers is well approximated by the cost on the weighted coreset. [sent-150, score-0.578]
63 Note that our coreset has two types of points: sampled points q ∈ S = ∪_{i=1}^n S_i with weight w_q := (Σ_{z∈P} m_z)/(|S|·m_q), and local solution centers b ∈ B = ∪_{i=1}^n B_i with weight w_b := |P_b| − Σ_{q∈S∩P_b} w_q. [sent-151, score-1.509]
64 As mentioned above, we construct f_x(p) to be the difference between the cost of p and the cost of b_p so that Lemma 1 can be applied. [sent-154, score-0.641]
65 Note that the centers are weighted such that Σ_{b∈B} w_b·d(b, x) = Σ_{b∈B} |P_b|·d(b, x) − Σ_{b∈B} Σ_{q∈S∩P_b} w_q·d(b, x) = Σ_{p∈P} d(b_p, x) − Σ_{q∈S} w_q·d(b_q, x). [sent-155, score-0.624]
66 Taken together with the fact that Σ_{p∈P} m_p = Σ_{q∈S} w_q·m_q, we can show that Σ_{p∈P} d(p, x) − Σ_{q∈S∪B} w_q·d(q, x) = Σ_{p∈P} f_x(p) − Σ_{q∈S} w_q·f_x(q). [sent-156, score-1.118]
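For readability, here is our transcription of this chain of equalities as displayed math (using Σ_{p∈P} m_p = Σ_{q∈S} w_q m_q with m_p = d(p, b_p); no new claims are added):

```latex
\begin{align*}
\sum_{b\in B} w_b\, d(b,x)
  &= \sum_{b\in B}|P_b|\, d(b,x) - \sum_{b\in B}\sum_{q\in S\cap P_b} w_q\, d(b,x)
   = \sum_{p\in P} d(b_p,x) - \sum_{q\in S} w_q\, d(b_q,x),\\
\sum_{p\in P} d(p,x) - \sum_{q\in S\cup B} w_q\, d(q,x)
  &= \sum_{p\in P}\bigl[d(p,x)-d(b_p,x)\bigr] - \sum_{q\in S} w_q\bigl[d(q,x)-d(b_q,x)\bigr]\\
  &= \sum_{p\in P} f_x(p) - \sum_{q\in S} w_q f_x(q).
\end{align*}
```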
67 Note that 0 ≤ f_x(p) ≤ 2d(p, b_p) by the triangle inequality, and S is sufficiently large and chosen according to weights m_p = d(p, b_p), so the conditions of Lemma 1 are met. [sent-157, score-0.650]
68 Therefore, by Lemma 2, when |S| ≥ O((1/ε²)(kd + log(1/δ))), the weighted cost of S ∪ B approximates the k-median cost of P for any set of centers, and thus (S ∪ B, w) is an ε-coreset for P. [sent-160, score-0.386]
69 The total communication cost is bounded by O(mn), since even in the most general case, where every node only knows its neighbors, we can broadcast the local costs with O(mn) communication (see Algorithm 3). [sent-161, score-0.738]
70 It also bounds (C), noting that E[Σ_{q∈P_b∩S} w_q] = |P_b|, and thus Σ_{q∈P_b∩S} w_q ≤ 2|P_b| when t ≥ O(nk log(nk/δ)). [sent-181, score-0.548]
71 4 Effect of Network Topology on Communication Cost. If there is a central coordinator in the communication graph, then we can run the distributed coreset construction algorithm and send the local portions of the coreset to the coordinator, which can perform the clustering task. [sent-183, score-2.297]
72 The total communication cost is just the size of the coreset. [sent-184, score-0.37]
73 We propose to use a message passing approach for collecting information for coreset construction and sharing the local portions of the coreset. [sent-186, score-1.01]
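As a hedged sketch of the kind of message passing we have in mind (each site repeatedly forwards per-site values it has newly learned, such as local approximation costs or local coreset portions, to its neighbors until every site holds all n of them); the synchronous-round model and the names are our simplification.

```python
def flood_values(adjacency, local_value):
    """adjacency: dict site -> list of neighboring sites (undirected, connected graph);
    local_value: dict site -> the value that site starts with (e.g. its local cost).
    Floods values until every site knows all of them; returns the tables and message count."""
    known = {v: {v: local_value[v]} for v in adjacency}
    sent = 0
    while True:
        updates = {v: {} for v in adjacency}
        for v in adjacency:
            for u in adjacency[v]:
                for src, val in known[v].items():
                    if src not in known[u]:   # forward only what the neighbor has not seen
                        updates[u][src] = val
                        sent += 1
        if not any(updates.values()):
            break
        for u, new in updates.items():
            known[u].update(new)
    return known, sent
```

This naive flooding only illustrates the O(mn) broadcast of the n local costs; sharing the coreset portions themselves can be organized so that each portion crosses any particular edge at most twice, as the next sentence notes.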
74 Since each piece of the coreset is shared at most twice across any particular edge in message passing, we have Theorem 2. [sent-188, score-0.796]
75 The communication cost is O(m((1/ε⁴)(kd + log(1/δ)) + nk log(nk/δ))) for k-means, and O(m((1/ε²)(kd + log(1/δ)) + nk)) for k-median. [sent-190, score-0.596]
76 In contrast, an approach where each node constructs an ε-coreset for k-means and sends it to the other nodes incurs a communication cost of Õ(mnkd/ε⁴). [sent-191, score-0.485]
77 Our algorithm can also be applied on a rooted tree: we can send the coreset portions to the root, which then applies an approximation algorithm. [sent-193, score-0.91]
78 Given an α-approximation algorithm for weighted k-means (k-median respectively) as a subroutine, there exists an algorithm that with probability at least 1 − δ outputs a (1 + ε)α-approximation solution for distributed k-means (k-median respectively) clustering on a rooted tree of height h. [sent-195, score-0.38]
79 The total communication cost is O(h((1/ε⁴)(kd + log(1/δ)) + nk log(nk/δ))) for k-means, and O(h((1/ε²)(kd + log(1/δ)) + nk)) for k-median. [sent-196, score-0.596]
80 Our approach improves on the cost of Õ(nhkd/ε⁴) for k-means and the cost of Õ(nhkd/ε²) for k-median in [23]. [sent-197, score-0.354]
81 The algorithm in [23] builds on each node a coreset for the union of the coresets from its children. (Footnote 2: Their algorithm used coreset construction as a subroutine. [sent-198, score-1.966]
82 The construction algorithm they used builds a coreset of size Õ(nkhd log |P|). [sent-199, score-0.885]
83 Throughout this paper, when we compare to [23] we assume they use the coreset construction technique of [8] to reduce their coreset size and communication cost.) [sent-200, score-1.817]
84 Since the coreset construction subroutine has quadratic dependence on 1/ε for k-median (quartic for k-means), the algorithm then has quadratic dependence on h (quartic for k-means). [sent-202, score-0.874]
85 Our algorithm does not build coresets on top of coresets, resulting in a better dependence on the height of the tree h. [sent-203, score-0.835]
86 5 Experiments. Here we evaluate the effectiveness of our algorithm and compare it to other distributed coreset algorithms. [sent-207, score-0.895]
87 We report the k-means cost of the solution produced by our algorithm at varying communication cost, and compare it to the costs of other algorithms when they use the same amount of communication. [sent-208, score-0.4]
88 Experimental Methodology: We first generate a communication graph connecting local sites, and then partition the data into local data sets. [sent-211, score-0.434]
89 To measure the quality of the coreset generated, we run Lloyd’s algorithm on the coreset and the global data respectively to get two solutions, and compute the ratio between the costs of the two solutions over the global data. [sent-219, score-1.682]
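A sketch of that evaluation loop under our own implementation choices (weighted Lloyd's iterations with random initialization, reusing the kmeans_cost helper sketched earlier); the coreset weights are passed straight into the weighted mean updates.

```python
import numpy as np

def weighted_lloyds(P, w, k, iters=50, rng=np.random.default_rng(0)):
    """Lloyd's algorithm on a weighted point set (coreset points carry weights)."""
    centers = P[rng.choice(len(P), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            mask = assign == j
            if w[mask].sum() > 0:            # keep the previous center if the cluster empties
                centers[j] = np.average(P[mask], axis=0, weights=w[mask])
    return centers

def cost_ratio(global_data, coreset_pts, coreset_w, k):
    """Global k-means cost of the coreset solution divided by that of the full-data solution."""
    x_core = weighted_lloyds(coreset_pts, coreset_w, k)
    x_full = weighted_lloyds(global_data, np.ones(len(global_data)), k)
    return kmeans_cost(global_data, x_core) / kmeans_cost(global_data, x_full)
```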
90 We compare our algorithm with COMBINE, the method of combining coresets from local data sets, and with the algorithm of [23] (Zhang et al.). [sent-221, score-0.337]
91 We observe that the algorithms perform well with much smaller coreset sizes than predicted by the theoretical bounds. [sent-226, score-0.771]
92 1 cost ratio, the coreset size and thus the communication needed is only 0. [sent-228, score-1.141]
93 Our algorithm then saves 10%–20% of the communication cost needed to achieve the same approximation ratio. [sent-237, score-0.409]
94 This is due to the fact that their algorithm needs larger coresets to prevent the accumulation of errors when constructing coresets from component coresets, and thus needs higher communication cost to achieve the same approximation ratio. [sent-241, score-0.898]
95 Figure 2: k-means cost (normalized by baseline) vs. communication cost; panels include (d) grid graph, similarity-based, (e) grid graph, weighted, and (f) preferential graph, degree-based. [sent-301, score-0.563]
99 Figure 3: k-means cost (normalized by baseline) vs. communication cost over the spanning trees of the graphs. [sent-368, score-0.396]
100 An effective coreset compression algorithm for large scale sensor networks. [sent-434, score-0.824]
wordName wordTfidf (topN-words)
[('coreset', 0.771), ('coresets', 0.234), ('wq', 0.222), ('communication', 0.208), ('bp', 0.167), ('cost', 0.162), ('fx', 0.15), ('mp', 0.13), ('kd', 0.111), ('distributed', 0.11), ('centers', 0.105), ('clustering', 0.103), ('pb', 0.095), ('coordinator', 0.088), ('nk', 0.086), ('local', 0.075), ('construction', 0.067), ('centralized', 0.062), ('sites', 0.06), ('pi', 0.058), ('node', 0.052), ('mz', 0.051), ('points', 0.045), ('communicate', 0.045), ('weighted', 0.044), ('graph', 0.044), ('constructs', 0.04), ('sensor', 0.039), ('quartic', 0.038), ('vi', 0.038), ('portions', 0.037), ('bi', 0.036), ('ni', 0.034), ('send', 0.034), ('costs', 0.033), ('partition', 0.032), ('lemma', 0.031), ('preferential', 0.031), ('wb', 0.031), ('rooted', 0.029), ('site', 0.029), ('global', 0.029), ('dim', 0.029), ('union', 0.028), ('grid', 0.027), ('tree', 0.027), ('wp', 0.026), ('portion', 0.026), ('symposium', 0.026), ('spanning', 0.026), ('gx', 0.025), ('approximation', 0.025), ('titles', 0.025), ('ri', 0.025), ('message', 0.025), ('balcan', 0.024), ('nodes', 0.023), ('si', 0.023), ('height', 0.023), ('topologies', 0.022), ('mq', 0.022), ('subroutine', 0.022), ('accumulation', 0.021), ('diameter', 0.02), ('triangle', 0.02), ('sketch', 0.02), ('passing', 0.02), ('feldman', 0.019), ('communicates', 0.019), ('zhang', 0.019), ('central', 0.019), ('combine', 0.019), ('objectives', 0.019), ('round', 0.018), ('algo', 0.018), ('topology', 0.018), ('log', 0.018), ('graphs', 0.018), ('transmitted', 0.018), ('solutions', 0.018), ('networks', 0.017), ('ratio', 0.017), ('kannan', 0.017), ('entire', 0.017), ('acm', 0.016), ('solution', 0.016), ('mn', 0.016), ('communicating', 0.016), ('communicated', 0.016), ('proceedings', 0.016), ('weights', 0.016), ('nh', 0.015), ('builds', 0.015), ('collecting', 0.015), ('algorithm', 0.014), ('approximate', 0.014), ('network', 0.014), ('proportional', 0.014), ('annual', 0.014), ('greatly', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 94 nips-2013-Distributed $k$-means and $k$-median Clustering on General Topologies
Author: Maria-Florina Balcan, Steven Ehrlich, Yingyu Liang
Abstract: This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by [13], we reduce the problem of finding a clustering with low cost to the problem of finding a coreset of small size. We provide a distributed method for constructing a global coreset which improves over the previous methods by reducing the communication complexity, and which works over general communication topologies. Experimental results on large scale data sets show that this approach outperforms other coreset-based distributed clustering algorithms. 1
Author: Yuchen Zhang, John Duchi, Michael Jordan, Martin J. Wainwright
Abstract: We establish lower bounds on minimax risks for distributed statistical estimation under a communication budget. Such lower bounds reveal the minimum amount of communication required by any procedure to achieve the centralized minimax-optimal rates for statistical estimation. We study two classes of protocols: one in which machines send messages independently, and a second allowing for interactive communication. We establish lower bounds for several problems, including various types of location models, as well as for parameter estimation in regression models. 1
3 0.08473888 8 nips-2013-A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles
Author: Jinwoo Shin, Andrew E. Gelfand, Misha Chertkov
Abstract: Max-product ‘belief propagation’ (BP) is a popular distributed heuristic for finding the Maximum A Posteriori (MAP) assignment in a joint probability distribution represented by a Graphical Model (GM). It was recently shown that BP converges to the correct MAP assignment for a class of loopy GMs with the following common feature: the Linear Programming (LP) relaxation to the MAP problem is tight (has no integrality gap). Unfortunately, tightness of the LP relaxation does not, in general, guarantee convergence and correctness of the BP algorithm. The failure of BP in such cases motivates reverse engineering a solution – namely, given a tight LP, can we design a ‘good’ BP algorithm. In this paper, we design a BP algorithm for the Maximum Weight Matching (MWM) problem over general graphs. We prove that the algorithm converges to the correct optimum if the respective LP relaxation, which may include inequalities associated with non-intersecting odd-sized cycles, is tight. The most significant part of our approach is the introduction of a novel graph transformation designed to force convergence of BP. Our theoretical result suggests an efficient BP-based heuristic for the MWM problem, which consists of making sequential, “cutting plane”, modifications to the underlying GM. Our experiments show that this heuristic performs as well as traditional cutting-plane algorithms using LP solvers on MWM problems. 1
4 0.071161285 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits
Author: Jason Chang, John W. Fisher III
Abstract: We present an MCMC sampler for Dirichlet process mixture models that can be parallelized to achieve significant computational gains. We combine a nonergodic, restricted Gibbs iteration with split/merge proposals in a manner that produces an ergodic Markov chain. Each cluster is augmented with two subclusters to construct likely split moves. Unlike some previous parallel samplers, the proposed sampler enforces the correct stationary distribution of the Markov chain without the need for finite approximations. Empirical results illustrate that the new sampler exhibits better convergence properties than current methods. 1
5 0.067817599 197 nips-2013-Moment-based Uniform Deviation Bounds for $k$-means and Friends
Author: Matus Telgarsky, Sanjoy Dasgupta
Abstract: Suppose k centers are fit to m points by heuristically minimizing the k-means cost; what is the corresponding fit over the source distribution? This question is resolved here for distributions with p 4 bounded moments; in particular, the difference between the sample cost and distribution cost decays with m and p as mmin{ 1/4, 1/2+2/p} . The essential technical contribution is a mechanism to uniformly control deviations in the face of unbounded parameter sets, cost functions, and source distributions. To further demonstrate this mechanism, a soft clustering variant of k-means cost is also considered, namely the log likelihood of a Gaussian mixture, subject to the constraint that all covariance matrices have bounded spectrum. Lastly, a rate with refined constants is provided for k-means instances possessing some cluster structure. 1
6 0.063080288 158 nips-2013-Learning Multiple Models via Regularized Weighting
7 0.0605672 95 nips-2013-Distributed Exploration in Multi-Armed Bandits
8 0.060044199 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
9 0.056217127 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
10 0.048854977 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
11 0.047528725 357 nips-2013-k-Prototype Learning for 3D Rigid Structures
12 0.047387797 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
13 0.044026069 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction
14 0.043973129 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning
15 0.043742966 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
16 0.043713946 355 nips-2013-Which Space Partitioning Tree to Use for Search?
17 0.041735619 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel
18 0.039213695 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing
19 0.038760904 63 nips-2013-Cluster Trees on Manifolds
20 0.038141623 52 nips-2013-Bayesian inference as iterated random functions with applications to sequential inference in graphical models
topicId topicWeight
[(0, 0.106), (1, 0.022), (2, 0.01), (3, -0.015), (4, 0.043), (5, 0.07), (6, 0.022), (7, -0.046), (8, -0.005), (9, 0.027), (10, 0.037), (11, 0.001), (12, 0.078), (13, 0.038), (14, -0.014), (15, 0.014), (16, -0.025), (17, 0.02), (18, 0.009), (19, 0.034), (20, -0.002), (21, 0.027), (22, -0.002), (23, -0.016), (24, -0.067), (25, -0.039), (26, -0.05), (27, 0.04), (28, -0.034), (29, -0.107), (30, -0.05), (31, 0.001), (32, 0.108), (33, 0.002), (34, 0.083), (35, 0.033), (36, -0.077), (37, 0.001), (38, -0.028), (39, -0.088), (40, -0.006), (41, -0.051), (42, -0.089), (43, 0.042), (44, -0.0), (45, -0.013), (46, 0.016), (47, -0.058), (48, -0.036), (49, 0.044)]
simIndex simValue paperId paperTitle
same-paper 1 0.92776173 94 nips-2013-Distributed $k$-means and $k$-median Clustering on General Topologies
Author: Maria-Florina Balcan, Steven Ehrlich, Yingyu Liang
Abstract: This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by [13], we reduce the problem of finding a clustering with low cost to the problem of finding a coreset of small size. We provide a distributed method for constructing a global coreset which improves over the previous methods by reducing the communication complexity, and which works over general communication topologies. Experimental results on large scale data sets show that this approach outperforms other coreset-based distributed clustering algorithms. 1
Author: Yuchen Zhang, John Duchi, Michael Jordan, Martin J. Wainwright
Abstract: We establish lower bounds on minimax risks for distributed statistical estimation under a communication budget. Such lower bounds reveal the minimum amount of communication required by any procedure to achieve the centralized minimax-optimal rates for statistical estimation. We study two classes of protocols: one in which machines send messages independently, and a second allowing for interactive communication. We establish lower bounds for several problems, including various types of location models, as well as for parameter estimation in regression models. 1
3 0.60790414 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning
Author: Xinghao Pan, Joseph E. Gonzalez, Stefanie Jegelka, Tamara Broderick, Michael Jordan
Abstract: Research on distributed machine learning algorithms has focused primarily on one of two extremes—algorithms that obey strict concurrency constraints or algorithms that obey few or no such constraints. We consider an intermediate alternative in which algorithms optimistically assume that conflicts are unlikely and if conflicts do arise a conflict-resolution protocol is invoked. We view this “optimistic concurrency control” paradigm as particularly appropriate for large-scale machine learning algorithms, particularly in the unsupervised setting. We demonstrate our approach in three problem areas: clustering, feature learning and online facility location. We evaluate our methods via large-scale experiments in a cluster computing environment. 1
4 0.5820505 63 nips-2013-Cluster Trees on Manifolds
Author: Sivaraman Balakrishnan, Srivatsan Narayanan, Alessandro Rinaldo, Aarti Singh, Larry Wasserman
Abstract: unkown-abstract
5 0.57499373 197 nips-2013-Moment-based Uniform Deviation Bounds for $k$-means and Friends
Author: Matus Telgarsky, Sanjoy Dasgupta
Abstract: Suppose k centers are fit to m points by heuristically minimizing the k-means cost; what is the corresponding fit over the source distribution? This question is resolved here for distributions with p 4 bounded moments; in particular, the difference between the sample cost and distribution cost decays with m and p as mmin{ 1/4, 1/2+2/p} . The essential technical contribution is a mechanism to uniformly control deviations in the face of unbounded parameter sets, cost functions, and source distributions. To further demonstrate this mechanism, a soft clustering variant of k-means cost is also considered, namely the log likelihood of a Gaussian mixture, subject to the constraint that all covariance matrices have bounded spectrum. Lastly, a rate with refined constants is provided for k-means instances possessing some cluster structure. 1
6 0.52515018 357 nips-2013-k-Prototype Learning for 3D Rigid Structures
7 0.52301359 158 nips-2013-Learning Multiple Models via Regularized Weighting
8 0.52009749 8 nips-2013-A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles
9 0.51054692 19 nips-2013-Accelerated Mini-Batch Stochastic Dual Coordinate Ascent
10 0.50366759 112 nips-2013-Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising
11 0.5022198 355 nips-2013-Which Space Partitioning Tree to Use for Search?
12 0.48287034 107 nips-2013-Embed and Project: Discrete Sampling with Universal Hashing
13 0.47630784 189 nips-2013-Message Passing Inference with Chemical Reaction Networks
14 0.47396082 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel
15 0.46659574 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding
16 0.45538831 261 nips-2013-Rapid Distance-Based Outlier Detection via Sampling
17 0.44812194 344 nips-2013-Using multiple samples to learn mixture models
18 0.444682 168 nips-2013-Learning to Pass Expectation Propagation Messages
19 0.43605775 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction
20 0.43530655 340 nips-2013-Understanding variable importances in forests of randomized trees
topicId topicWeight
[(9, 0.22), (10, 0.011), (16, 0.063), (33, 0.085), (34, 0.076), (41, 0.048), (49, 0.029), (56, 0.119), (70, 0.032), (85, 0.053), (89, 0.021), (93, 0.1), (95, 0.029)]
simIndex simValue paperId paperTitle
1 0.80735642 13 nips-2013-A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks
Author: Junming Yin, Qirong Ho, Eric Xing
Abstract: We propose a scalable approach for making inference about latent spaces of large networks. With a succinct representation of networks as a bag of triangular motifs, a parsimonious statistical model, and an efficient stochastic variational inference algorithm, we are able to analyze real networks with over a million vertices and hundreds of latent roles on a single machine in a matter of hours, a setting that is out of reach for many existing methods. When compared to the state-of-the-art probabilistic approaches, our method is several orders of magnitude faster, with competitive or improved accuracy for latent space recovery and link prediction. 1
same-paper 2 0.78704429 94 nips-2013-Distributed $k$-means and $k$-median Clustering on General Topologies
Author: Maria-Florina Balcan, Steven Ehrlich, Yingyu Liang
Abstract: This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by [13], we reduce the problem of finding a clustering with low cost to the problem of finding a coreset of small size. We provide a distributed method for constructing a global coreset which improves over the previous methods by reducing the communication complexity, and which works over general communication topologies. Experimental results on large scale data sets show that this approach outperforms other coreset-based distributed clustering algorithms. 1
3 0.70643401 121 nips-2013-Firing rate predictions in optimal balanced networks
Author: David G. Barrett, Sophie Denève, Christian K. Machens
Abstract: How are firing rates in a spiking network related to neural input, connectivity and network function? This is an important problem because firing rates are a key measure of network activity, in both the study of neural computation and neural network dynamics. However, it is a difficult problem, because the spiking mechanism of individual neurons is highly non-linear, and these individual neurons interact strongly through connectivity. We develop a new technique for calculating firing rates in optimal balanced networks. These are particularly interesting networks because they provide an optimal spike-based signal representation while producing cortex-like spiking activity through a dynamic balance of excitation and inhibition. We can calculate firing rates by treating balanced network dynamics as an algorithm for optimising signal representation. We identify this algorithm and then calculate firing rates by finding the solution to the algorithm. Our firing rate calculation relates network firing rates directly to network input, connectivity and function. This allows us to explain the function and underlying mechanism of tuning curves in a variety of systems. 1
4 0.68935251 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC
Author: Yu-Xiang Wang, Huan Xu, Chenlei Leng
Abstract: Sparse Subspace Clustering (SSC) and Low-Rank Representation (LRR) are both considered as the state-of-the-art methods for subspace clustering. The two methods are fundamentally similar in that both are convex optimizations exploiting the intuition of “Self-Expressiveness”. The main difference is that SSC minimizes the vector 1 norm of the representation matrix to induce sparsity while LRR minimizes nuclear norm (aka trace norm) to promote a low-rank structure. Because the representation matrix is often simultaneously sparse and low-rank, we propose a new algorithm, termed Low-Rank Sparse Subspace Clustering (LRSSC), by combining SSC and LRR, and develops theoretical guarantees of when the algorithm succeeds. The results reveal interesting insights into the strength and weakness of SSC and LRR and demonstrate how LRSSC can take the advantages of both methods in preserving the “Self-Expressiveness Property” and “Graph Connectivity” at the same time. 1
5 0.66436356 215 nips-2013-On Decomposing the Proximal Map
Author: Yao-Liang Yu
Abstract: The proximal map is the key step in gradient-type algorithms, which have become prevalent in large-scale high-dimensional problems. For simple functions this proximal map is available in closed-form while for more complicated functions it can become highly nontrivial. Motivated by the need of combining regularizers to simultaneously induce different types of structures, this paper initiates a systematic investigation of when the proximal map of a sum of functions decomposes into the composition of the proximal maps of the individual summands. We not only unify a few known results scattered in the literature but also discover several new decompositions obtained almost effortlessly from our theory. 1
6 0.64295483 99 nips-2013-Dropout Training as Adaptive Regularization
7 0.63369012 69 nips-2013-Context-sensitive active sensing in humans
8 0.63317525 146 nips-2013-Large Scale Distributed Sparse Precision Estimation
9 0.63285333 249 nips-2013-Polar Operators for Structured Sparse Estimation
10 0.63208395 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning
11 0.63113791 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
12 0.62940103 252 nips-2013-Predictive PAC Learning and Process Decompositions
13 0.62783945 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
14 0.62764704 5 nips-2013-A Deep Architecture for Matching Short Texts
15 0.62530875 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models
16 0.62521458 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
17 0.62451196 19 nips-2013-Accelerated Mini-Batch Stochastic Dual Coordinate Ascent
18 0.62414461 79 nips-2013-DESPOT: Online POMDP Planning with Regularization
19 0.62231195 65 nips-2013-Compressive Feature Learning
20 0.62170798 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning