nips nips2013 nips2013-97 knowledge-graph by maker-knowledge-mining

97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data


Source: pdf

Author: Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, Andreas Krause

Abstract: Many large-scale machine learning problems (such as clustering, non-parametric learning, kernel machines, etc.) require selecting, out of a massive data set, a manageable yet representative subset. Such problems can often be reduced to maximizing a submodular set function subject to cardinality constraints. Classical approaches require centralized access to the full data set; but for truly large-scale problems, rendering the data centrally is often impractical. In this paper, we consider the problem of submodular function maximization in a distributed fashion. We develop a simple, two-stage protocol, G REE D I, that is easily implemented using MapReduce style computations. We theoretically analyze our approach and show that, under certain natural conditions, performance close to the (impractical) centralized approach can be achieved. In our extensive experiments, we demonstrate the effectiveness of our approach on several applications, including sparse Gaussian process inference and exemplar-based clustering, on tens of millions of data points using Hadoop.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Such problems can often be reduced to maximizing a submodular set function subject to cardinality constraints. [sent-3, score-0.546]

2 Classical approaches require centralized access to the full data set; but for truly large-scale problems, rendering the data centrally is often impractical. [sent-4, score-0.253]

3 In this paper, we consider the problem of submodular function maximization in a distributed fashion. [sent-5, score-0.605]

4 We theoretically analyze our approach and show that, under certain natural conditions, performance close to the (impractical) centralized approach can be achieved. [sent-7, score-0.232]

5 Applications range from exemplar-based clustering [1], to active set selection for large-scale kernel machines [2], to corpus subset selection for the purpose of training complex prediction models [3]. [sent-10, score-0.246]

6 Many such problems can be reduced to the problem of maximizing a submodular set function subject to cardinality constraints [4, 5]. [sent-11, score-0.546]

7 This problem generalizes well-known combinatorial problems such as maximum weighted matching and max coverage, and finds numerous applications in machine learning and social networks, such as influence maximization [6], information gathering [7], document summarization [3] and active learning [8, 9]. [sent-15, score-0.132]

8 [10] states that a simple greedy algorithm produces solutions competitive with the optimal (intractable) solution. [sent-17, score-0.189]

9 The greedy algorithms that work well for centralized submodular optimization, however, are unfortunately sequential in nature; therefore they are poorly suited for parallel architectures. [sent-21, score-0.882]

10 In this paper, we develop a simple, parallel protocol called G REE D I for distributed submodular maximization. [sent-23, score-0.624]

11 We theoretically characterize its performance, and show that under some natural conditions, for large data sets the quality of the obtained solution is competitive with the best centralized solution. [sent-25, score-0.342]

12 Our experimental results demonstrate the effectiveness of our approach on a variety of submodular maximization problems. [sent-26, score-0.506]

13 We show that for problems such as exemplar-based clustering and active set selection, our approach leads to parallel solutions that are very competitive with those obtained via centralized methods (98% in exemplar based clustering and 97% in active set selection). [sent-27, score-0.617]

14 The problem of centralized maximization of submodular functions has received much interest, starting with the seminal work of [10]. [sent-35, score-0.799]

15 The work in [16] considers an algorithm for online distributed submodular maximization with an application to sensor selection. [sent-38, score-0.605]

16 The authors in [4] consider the problem of submodular maximization in a streaming model; however, their approach is not applicable to the general distributed setting. [sent-40, score-0.605]

17 There have also been new improvements in the running time of the greedy solution for solving SET-COVER when the data is large and disk-resident [17]. [sent-41, score-0.221]

18 Recently, specific instances of distributed submodular maximization have been studied. [sent-43, score-0.605]

19 [18] address the MAX-COVER problem and provide a (1 − 1/e − ε) approximation to the centralized algorithm, albeit at the cost of passing over the data set many times. [sent-46, score-0.232]

20 In contrast, we provide a more general framework, and analyze in which settings performance competitive with the centralized setting can be obtained. [sent-52, score-0.258]

21 Further, f is submodular iff for all A ⊆ B ⊆ V and e ∈ V \ B the following diminishing returns condition holds: f(e|A) ≥ f(e|B). [sent-62, score-0.429]
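For reference, the quantities the surrounding sentences refer to by number can be written out as below; this is a hedged reconstruction, and the labels (1)–(3) are assumed to match the paper's equation numbering.

```latex
\begin{align*}
  f(e \mid A) &:= f(A \cup \{e\}) - f(A) \tag{1}\\
  f(e \mid A) &\ge f(e \mid B)
     \quad \text{for all } A \subseteq B \subseteq V,\; e \in V \setminus B \tag{2}\\
  &\max_{S \subseteq V,\ |S| \le k} \; f(S) \tag{3}
\end{align*}
```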

22 Throughout this paper, we focus on such monotone submodular functions. [sent-63, score-0.546]

23 The focus of this paper is on maximizing a monotone submodular function (subject to some constraint) in a distributed manner. [sent-71, score-0.686]

24 Unfortunately, problem (3) is NP-hard for many classes of submodular functions [12]. [sent-79, score-0.46]

25 [10] shows that a simple greedy algorithm provides a (1 − 1/e) approximation to (3). [sent-81, score-0.163]

26 This greedy algorithm starts with the empty set S0, and at each iteration i, it chooses an element e ∈ V that maximizes (1), i.e., Si = Si−1 ∪ {argmax_{e∈V\Si−1} f(e|Si−1)}. [sent-82, score-0.2]
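The selection rule just described can be sketched in a few lines. This is an illustrative implementation, not the authors' code; the names greedy_max, V, f and k are hypothetical.

```python
def greedy_max(V, f, k):
    """Standard greedy for max f(S) s.t. |S| <= k, with f monotone submodular.

    V: iterable of hashable elements; f: set function accepting a Python set;
    k: cardinality budget.  Returns the selected elements in pick order.
    """
    S, picked = set(), []
    for _ in range(k):
        base = f(S)
        best, best_gain = None, 0.0
        for e in V:
            if e in S:
                continue
            gain = f(S | {e}) - base          # marginal gain f(e | S)
            if best is None or gain > best_gain:
                best, best_gain = e, gain
        if best is None:
            break
        S.add(best)
        picked.append(best)
    return picked
```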

27 For several classes of monotone submodular functions, it is known that (1 − 1/e) is the best approximation guarantee that one can hope for [11, 12, 21]. [sent-86, score-0.567]

28 Moreover, the greedy algorithm can be accelerated using lazy evaluations [22]. [sent-87, score-0.232]
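A hedged sketch of the lazy-evaluation idea from [22] (often called lazy or accelerated greedy): cached marginal gains are kept in a max-heap and only re-evaluated when they reach the top, which is sound because submodularity guarantees gains can only shrink as the solution grows. All names are illustrative.

```python
import heapq

def lazy_greedy_max(V, f, k):
    """Lazy (accelerated) greedy: cached marginal gains sit in a max-heap and
    are refreshed only when they reach the top; by submodularity the cached
    values are valid upper bounds on the true gains."""
    V = list(V)
    S, fS, picked = set(), f(set()), []
    # heap entries: (-cached_gain, tie_breaker, element, round when gain was cached)
    heap = [(-(f({e}) - fS), i, e, 0) for i, e in enumerate(V)]
    heapq.heapify(heap)
    for rnd in range(1, k + 1):
        while heap:
            neg_gain, tie, e, stamp = heapq.heappop(heap)
            if stamp == rnd:                  # gain is fresh w.r.t. the current S
                S.add(e)
                fS += -neg_gain
                picked.append(e)
                break
            gain = f(S | {e}) - fS            # refresh the stale cached gain
            heapq.heappush(heap, (-gain, tie, e, rnd))
    return picked
```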

29 When the data set is so massive that it cannot be stored on a single computer, running a standard greedy algorithm or its variants (e.g., lazy greedy) in a centralized manner is infeasible. [sent-90, score-0.163]

30 From the algorithmic point of view, however, the above greedy method is in general difficult to parallelize, since at each step, only the object with the highest marginal gain is chosen and every subsequent step depends on the preceding ones. [sent-96, score-0.163]

31 For example, when the elements of the ground set V are embedded in a Euclidean space, a commonly used kernel in practice is the squared exponential kernel Kei,ej = exp(−|ei − ej|²/h²). [sent-120, score-0.17]
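A small sketch of this kernel, plus the log-determinant utility commonly used for GP active set selection [2]; that the paper uses exactly this normalization of the utility is an assumption, and all names are illustrative.

```python
import numpy as np

def squared_exponential_kernel(X, h=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / h^2) for the rows x_i of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / h ** 2)

def active_set_utility(K, S, noise_var=1.0):
    """Information-gain style utility f(S) = 0.5 * logdet(I + K[S,S] / noise_var),
    a standard monotone submodular objective for GP active set selection."""
    idx = sorted(S)
    if not idx:
        return 0.0
    KSS = K[np.ix_(idx, idx)]
    _, logdet = np.linalg.slogdet(np.eye(len(idx)) + KSS / noise_var)
    return 0.5 * logdet
```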

32 It can be shown that this choice of f is monotone submodular [21]. [sent-126, score-0.546]

33 For medium-scale problems, the standard greedy algorithms provide good solutions. [sent-127, score-0.163]

34 One approach for finding such exemplars is solving the k-medoid problem [23], which aims to minimize the sum of pairwise dissimilarities between exemplars and elements of the dataset. [sent-130, score-0.295]

35 By introducing an auxiliary (phantom) exemplar e0 (e.g., e0 = 0), we can turn L into a monotone submodular function: f(S) = L({e0}) − L(S ∪ {e0}). [sent-135, score-0.546]
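A minimal sketch of this construction, assuming squared Euclidean dissimilarities and an all-zero phantom exemplar e0 (both are assumptions); function names are illustrative.

```python
import numpy as np

def loss_with_phantom(X, S, e0):
    """L(S ∪ {e0}): mean dissimilarity (squared distance) from each point in X
    to its closest exemplar, with the phantom exemplar e0 always included."""
    idx = sorted(S)
    centers = np.vstack([X[idx], e0[None, :]]) if idx else e0[None, :]
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).mean()

def clustering_utility(X, S, e0=None):
    """f(S) = L({e0}) - L(S ∪ {e0}), the monotone submodular surrogate of the
    k-medoid loss described above.  e0 defaults to the all-zero vector."""
    if e0 is None:
        e0 = np.zeros(X.shape[1])
    return loss_with_phantom(X, set(), e0) - loss_with_phantom(X, set(S), e0)
```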

36 Hence, the standard greedy algorithm provides a very good solution. [sent-138, score-0.163]

37 Our distributed solution G REE D I addresses this challenge. [sent-140, score-0.157]

38 Naive approaches towards distributed submodular maximization: one way of implementing the greedy algorithm in parallel would be the following. [sent-142, score-0.221]

39 In each round, all machines – in parallel – compute the marginal gains of all elements in their sets Vi . [sent-144, score-0.221]
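A sketch of this naive round-per-element scheme; each pass over the partitions stands in for one parallel map step followed by a synchronization, and all names are illustrative.

```python
def naive_parallel_greedy(partitions, f, k):
    """Naive distributed greedy: one synchronization round per selected element.
    In each round every machine scans only its partition V_i and proposes its
    locally best element; a coordinator keeps the globally best proposal."""
    S = set()
    for _ in range(k):                             # k rounds of synchronization
        base = f(S)
        proposals = []
        for Vi in partitions:                      # "map": one machine per V_i
            best = max((e for e in Vi if e not in S),
                       key=lambda e: f(S | {e}) - base, default=None)
            if best is not None:
                proposals.append(best)
        if not proposals:
            break
        winner = max(proposals, key=lambda e: f(S | {e}) - base)   # "reduce"
        S.add(winner)
    return S
```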

40 This requires one synchronization per selected element, i.e., k rounds in total, which is prohibitive for large k (e.g., tens of thousands or more), rendering this approach impractical for MapReduce style computations. [sent-150, score-0.126]

41 An alternative approach for large k would be to – on each machine – greedily select k/m elements independently (without synchronization), and then merge them to obtain a solution of size k. [sent-151, score-0.268]

42 Unfortunately, many machines may select redundant elements, and the merged solution may suffer from diminishing returns. [sent-155, score-0.202]

43 In Section 4, we introduce an alternative protocol G REE D I, which requires little communication, while at the same time yielding a solution competitive with the centralized one, under certain natural additional assumptions. [sent-156, score-0.354]

44 We first provide our distributed solution G REE D I for maximizing submodular functions under cardinality constraints. [sent-158, score-0.734]

45 As the optimum centralized solution Ac [k] achieves the maximum value of the submodular function, it is clear that f (Ac [k]) ≥ f (Ad [m, k]). [sent-169, score-0.746]

46 In general, however, there is a gap between the distributed and the centralized solution. [sent-171, score-0.331]

47 Let f be a monotone submodular function and let k > 0. [sent-176, score-0.546]

48 In contrast, for any value of m and k, there is a data partition and a monotone submodular function f such that f(Ac[k]) = min(m, k) · f(Ad[m, k]). [sent-178, score-0.572]

49 Algorithm 1, step 2: in each partition Vi find the optimum set Ac_i[k] of cardinality k. Algorithm 2, step 2: run the standard greedy algorithm on each set Vi. [sent-196, score-0.292]

50 Algorithm 2, step 4: run the standard greedy algorithm on the merged set B until the budget is reached. Algorithm 1: output this solution Ad[m, k]. [sent-202, score-0.221]

51 The above theorem fully characterizes the performance of two-round distributed algorithms in terms of the best centralized solution. [sent-206, score-0.331]

52 It parallels the intractable Algorithm 1, but replaces the selection of optimal subsets by a greedy algorithm. [sent-215, score-0.238]

53 Due to the approximate nature of the greedy algorithm, we allow the algorithms to pick sets slightly larger than k. [sent-216, score-0.189]

54 In particular, G REE D I is a two-round algorithm that takes the ground set V , the number of partitions m, and the cardinality constraints l (final solution) and κ (intermediate outputs). [sent-217, score-0.176]

55 Then each machine separately runs the standard greedy algorithm, namely, it sequentially finds an element e ∈ Vi that maximizes the discrete derivative shown in (1). [sent-219, score-0.2]

56 Each machine i – in parallel – continues adding elements to the set Agc_i[·] until it reaches κ elements. [sent-220, score-0.141]

57 Then the solutions are merged, B = ∪_{i=1}^{m} Agc_i[κ], and another round of greedy selection is performed over B, which this time selects l elements. [sent-221, score-0.256]

58 We denote this solution by Agd [m, κ, l]: the greedy solution for parameters m, κ and l. [sent-222, score-0.279]
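Putting the two rounds together, a hedged sketch of the G REE D I protocol, reusing the greedy_max sketch given earlier. The basic variant with a single global f is shown, and random partitioning is assumed as in the analysis below; names are illustrative.

```python
import random

def greedi(V, f, m, kappa, l, seed=0):
    """Two-round G REE D I sketch.  Round 1: elements are assigned to m machines
    uniformly at random and each machine runs standard greedy on its own data,
    stopping at kappa elements.  Round 2: the partial solutions are merged into
    B and a final greedy pass over B selects l elements."""
    rng = random.Random(seed)
    parts = [[] for _ in range(m)]
    for e in V:
        parts[rng.randrange(m)].append(e)                    # random partitioning
    partial = [greedy_max(Vi, f, kappa) for Vi in parts]     # round 1 (parallel in practice)
    B = [e for Ai in partial for e in Ai]                    # merge: B = union_i Agc_i[kappa]
    return greedy_max(B, f, l)                               # round 2 on the merged set
```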

59 Let f be a monotone submodular function and let l, κ, k > 0. [sent-227, score-0.546]

60 From Theorem 4.1, it is clear that in general one cannot hope to eliminate the dependency of the distributed solution on min(k, m). [sent-232, score-0.178]

61 The following theorem says that if the α-neighborhoods are sufficiently dense and f is a λ-Lipschitz function, then this method can produce a solution close to the centralized solution. [sent-262, score-0.29]

62 If for each ei ∈ Ac[k], |Nα(ei)| ≥ km log(k/δ^{1/m}), and algorithm G REE D I assigns elements uniformly at random to m processors, then with probability at least (1 − δ), f(Agd[m, κ, l]) ≥ (1 − e^{−κ/k})(1 − e^{−l/κ})(f(Ac[k]) − λαk). [sent-264, score-0.152]

63 Let Ac [k] be an optimal solution in the infinite set such that around each ei ∈ Ac [k], there is a neighborhood of radius at least α∗ , where the probability density is at least β at all points, for some constants α∗ and β. [sent-267, score-0.127]

64 This implies that the solution consists of elements coming from reasonably dense and therefore representative regions of the data set. [sent-268, score-0.173]

65 This means, for ei ∈ Ac [k] the probability of a random element being in Nα (ei ) is at least βg(α) and the expected number of α neighbors of ei is at least E[|Nα (ei )|] = nβg(α). [sent-271, score-0.175]

66 In these circumstances, the following theorem guarantees that if the data set V is sufficiently large and f is a λ-lipschitz function, then G REE D I produces a solution close to the centralized solution. [sent-275, score-0.29]

67 More precisely, we call a monotone submodular function f decomposable if it can be written as a sum of (non-negative) monotone submodular functions as follows: f(S) = (1/|V|) Σ_{i∈V} fi(S). [sent-284, score-1.21]

68 In other words, there is a separate monotone submodular function associated with every data point i ∈ V. [sent-285, score-0.546]

69 Note that the exemplar based clustering application we discussed in Section 3 is of this decomposable form. [sent-287, score-0.136]

70 Then, in the remainder of this section, our goal is to show that assigning each element of the data set randomly to a machine and running G REE D I will provide a solution that is with high probability close to the optimum solution. [sent-290, score-0.122]

71 The above result demonstrates why G REE D I performs well on decomposable submodular functions with massive data even when they are evaluated locally on each machine. [sent-302, score-0.57]
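One natural reading of "evaluated locally" for a decomposable f(S) = (1/|V|) Σ_{i∈V} fi(S) is that each machine averages only the components it holds; that this is precisely the paper's local objective is an assumption, and the names below are illustrative.

```python
def local_objective(component_fns, Vi):
    """Local surrogate of a decomposable f(S) = (1/|V|) * sum_{i in V} f_i(S):
    the machine holding the indices in Vi averages only its own components,
    f_hat(S) = (1/|Vi|) * sum_{i in Vi} f_i(S)."""
    Vi = list(Vi)
    def f_hat(S):
        return sum(component_fns[i](S) for i in Vi) / max(len(Vi), 1)
    return f_hat
```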

72 Experiments: in our experimental evaluation we wish to address the following questions: 1) how well does G REE D I perform compared to a centralized solution, and 2) how good is the performance of G REE D I when using decomposable objective functions (see Section 4). [sent-304, score-0.316]

73 To this end, we run G REE D I on two scenarios: exemplar based clustering and active set selection in GPs. [sent-306, score-0.221]

74 b) random/greedy: each machine outputs k randomly chosen elements from its local data points, then the standard greedy algorithm is run over mk elements to find a solution of size k. [sent-309, score-0.414]

75 c) greedy/merge: in the first round k/m elements are chosen greedily from each machine and in the second round they are merged to output a solution of size k. [sent-310, score-0.393]

76 d) greedy/max: in the first round each machine greedily finds a solution of size k and in the second round the solution with the maximum value is reported. [sent-311, score-0.278]
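Minimal sketches of the three distributed baselines b)–d) just described, reusing the greedy_max sketch from above; tie-breaking and exact candidate-set sizes are assumptions, and the function names are illustrative.

```python
import random

def random_then_greedy(partitions, f, k, seed=0):
    """b) random/greedy: every machine reports k random local elements, then a
    single central greedy pass over the ~m*k candidates picks the final k."""
    rng = random.Random(seed)
    pool = [e for Vi in partitions
              for e in rng.sample(list(Vi), min(k, len(Vi)))]
    return greedy_max(pool, f, k)

def greedy_then_merge(partitions, f, k):
    """c) greedy/merge: every machine greedily picks about k/m elements and the
    union of all picks is reported as the solution."""
    per_machine = max(k // len(partitions), 1)
    return [e for Vi in partitions for e in greedy_max(Vi, f, per_machine)]

def greedy_then_max(partitions, f, k):
    """d) greedy/max: every machine greedily builds a size-k local solution and
    the single best one, measured by f, is reported."""
    candidates = [greedy_max(Vi, f, k) for Vi in partitions]
    return max(candidates, key=lambda S: f(set(S)))
```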

77 For data sets where we are able to find the centralized solution, we report the ratio of f (Adist [k])/f (Agc [k]), where Adist [k] is the distributed solution (in particular Agd [m, αk, k] = Adist [k] for G REE D I). [sent-312, score-0.415]

78 Our exemplar based clustering experiment involves G REE D I applied to the clustering utility f(S) (see Sec. 3). [sent-314, score-0.216]

79 Fig. 1a compares the performance of our approach to the benchmarks with the number of exemplars set to k = 50, and varying number of partitions m. [sent-321, score-0.236]

80 It can be seen that G REE D I significantly outperforms the benchmarks and provides a solution that is very close to the centralized one. [sent-322, score-0.334]

81 Since the exemplar based clustering utility function is decomposable, we repeated the experiment for the more realistic case where the function evaluation in each machine was restricted to the local elements of the dataset in that particular machine (rather than the entire dataset). [sent-324, score-0.244]

82 Each reducer separately performed the lazy greedy algorithm on its own set of 10,000 images (≈123MB) to extract 64 images with the highest marginal gains w.r.t. its local data. [sent-331, score-0.402]

83 We then merged the results and performed another round of lazy greedy selection on the merged results to extract the final 64 exemplars. [sent-335, score-0.536]

84 As Fig. 1c shows, G REE D I significantly outperforms the other distributed benchmarks and can scale well to very large data sets. [sent-342, score-0.166]

85 Fig. 1d shows a set of cluster exemplars discovered by G REE D I. [sent-344, score-0.128]

86 Each column in Fig. 1h shows the 8 images nearest to exemplars 9 and 16 of Fig. 1d (shown with red borders). [sent-345, score-0.16]

87 a) and b) show the ratio of the distributed to the centralized solution for global and local objective functions with budget k = 50 and varying the number m of partitions, for a set of 10,000 Tiny Images. [sent-385, score-0.35]

88 c) shows the distributed solution with m = 8000 and varying k for local objective functions on the whole dataset of 80,000,000 Tiny Images. [sent-386, score-0.217]

89 e) shows the ratio of the distributed to the centralized solution with m = 10 and varying k for Parkinsons Telemonitoring. [sent-388, score-0.319]

90 f) shows the same ratio with k = 50 and varying m on the same dataset, and g) shows the distributed solution for m = 32 with varying budget k on Yahoo! [sent-389, score-0.215]

91 d) shows a set of cluster exemplars discovered by G REE D I, and each column in h) shows 8 images nearest to exemplars 9 and 16 in d). [sent-391, score-0.288]

92 Fig. 1f compares the performance of G REE D I to the benchmarks with fixed k = 50 and varying number of partitions m. [sent-395, score-0.13]

93 Each reducer performed the lazy greedy algorithm on its own set of 1,431,621 vectors (≈34MB) in order to extract 128 elements with the highest marginal gains w.r.t. its local data. [sent-404, score-0.377]

94 We then merged the results and performed another round of lazy greedy selection on the merged results to extract the final active set of size 128. [sent-407, score-0.591]

95 We note again that G REE D I significantly outperforms the other distributed benchmarks and can scale well to very large datasets. [sent-412, score-0.166]

96 Conclusion: we have developed an efficient distributed protocol, G REE D I, for maximizing a submodular function subject to cardinality constraints. [sent-413, score-0.683]

97 We have theoretically analyzed the performance of our method and showed that, under certain natural conditions, it performs very close to the centralized (albeit impractical for massive data sets) greedy solution. [sent-414, score-0.477]

98 We believe our results provide an important step towards solving submodular optimization problems in very large scale, real applications. [sent-416, score-0.429]

99 An analysis of approximations for maximizing submodular set functions - I. [sent-460, score-0.501]

100 Best algorithms for approximating the maximum of a submodular set function. [sent-467, score-0.429]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ree', 0.619), ('submodular', 0.429), ('centralized', 0.232), ('ac', 0.17), ('greedy', 0.163), ('mapreduce', 0.145), ('agd', 0.137), ('monotone', 0.117), ('exemplars', 0.106), ('distributed', 0.099), ('greedi', 0.092), ('merge', 0.091), ('merged', 0.09), ('agc', 0.086), ('elements', 0.083), ('exemplar', 0.081), ('maximization', 0.077), ('cardinality', 0.076), ('tiny', 0.072), ('lazy', 0.069), ('ei', 0.069), ('round', 0.063), ('ad', 0.062), ('telemonitoring', 0.061), ('parallel', 0.058), ('solution', 0.058), ('massive', 0.057), ('partitions', 0.057), ('parkinsons', 0.056), ('active', 0.055), ('clustering', 0.055), ('images', 0.054), ('machines', 0.054), ('decomposable', 0.053), ('adist', 0.052), ('style', 0.051), ('vm', 0.051), ('yahoo', 0.049), ('nemhauser', 0.048), ('andreas', 0.048), ('eth', 0.045), ('benchmarks', 0.044), ('ek', 0.043), ('ground', 0.043), ('hadoop', 0.042), ('maximizing', 0.041), ('submodularity', 0.041), ('xv', 0.041), ('vi', 0.04), ('protocol', 0.038), ('zurich', 0.038), ('element', 0.037), ('greedily', 0.036), ('krause', 0.036), ('blelloch', 0.035), ('kei', 0.035), ('lattanzi', 0.035), ('reducers', 0.035), ('spaa', 0.035), ('communication', 0.034), ('fi', 0.034), ('representative', 0.032), ('functions', 0.031), ('xs', 0.031), ('front', 0.031), ('extract', 0.031), ('chierichetti', 0.031), ('hereby', 0.031), ('reducer', 0.031), ('selection', 0.03), ('seminal', 0.03), ('varying', 0.029), ('tens', 0.029), ('today', 0.028), ('parkinson', 0.028), ('optimum', 0.027), ('mk', 0.027), ('golovin', 0.026), ('distributes', 0.026), ('competitive', 0.026), ('partition', 0.026), ('sets', 0.026), ('growth', 0.025), ('utility', 0.025), ('impractical', 0.025), ('fig', 0.024), ('parallels', 0.024), ('user', 0.024), ('module', 0.023), ('scale', 0.023), ('cluster', 0.022), ('kernel', 0.022), ('ya', 0.022), ('manageable', 0.022), ('concrete', 0.022), ('selecting', 0.022), ('nds', 0.021), ('hope', 0.021), ('intractable', 0.021), ('rendering', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data

Author: Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, Andreas Krause

Abstract: Many large-scale machine learning problems (such as clustering, non-parametric learning, kernel machines, etc.) require selecting, out of a massive data set, a manageable yet representative subset. Such problems can often be reduced to maximizing a submodular set function subject to cardinality constraints. Classical approaches require centralized access to the full data set; but for truly large-scale problems, rendering the data centrally is often impractical. In this paper, we consider the problem of submodular function maximization in a distributed fashion. We develop a simple, two-stage protocol G REE D I, that is easily implemented using MapReduce style computations. We theoretically analyze our approach, and show, that under certain natural conditions, performance close to the (impractical) centralized approach can be achieved. In our extensive experiments, we demonstrate the effectiveness of our approach on several applications, including sparse Gaussian process inference and exemplar-based clustering, on tens of millions of data points using Hadoop. 1

2 0.3716996 78 nips-2013-Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions

Author: Rishabh K. Iyer, Stefanie Jegelka, Jeff A. Bilmes

Abstract: We investigate three related and important problems connected to machine learning: approximating a submodular function everywhere, learning a submodular function (in a PAC-like setting [28]), and constrained minimization of submodular functions. We show that the complexity of all three problems depends on the “curvature” of the submodular function, and provide lower and upper bounds that refine and improve previous results [2, 6, 8, 27]. Our proof techniques are fairly generic. We either use a black-box transformation of the function (for approximation and learning), or a transformation of algorithms to use an appropriate surrogate function (for minimization). Curiously, curvature has been known to influence approximations for submodular maximization [3, 29], but its effect on minimization, approximation and learning has hitherto been open. We complete this picture, and also support our theoretical claims by empirical results. 1

3 0.35006893 319 nips-2013-Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints

Author: Rishabh K. Iyer, Jeff A. Bilmes

Abstract: We investigate two new optimization problems — minimizing a submodular function subject to a submodular lower bound constraint (submodular cover) and maximizing a submodular function subject to a submodular upper bound constraint (submodular knapsack). We are motivated by a number of real-world applications in machine learning including sensor placement and data subset selection, which require maximizing a certain submodular function (like coverage or diversity) while simultaneously minimizing another (like cooperative cost). These problems are often posed as minimizing the difference between submodular functions [9, 25] which is in the worst case inapproximable. We show, however, that by phrasing these problems as constrained optimization, which is more natural for many applications, we achieve a number of bounded approximation guarantees. We also show that both these problems are closely related and an approximation algorithm solving one can be used to obtain an approximation guarantee for the other. We provide hardness results for both problems thus showing that our approximation factors are tight up to log-factors. Finally, we empirically demonstrate the performance and good scalability properties of our algorithms. 1

4 0.24048412 268 nips-2013-Reflection methods for user-friendly submodular optimization

Author: Stefanie Jegelka, Francis Bach, Suvrit Sra

Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1

5 0.1905095 29 nips-2013-Adaptive Submodular Maximization in Bandit Setting

Author: Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, S. Muthukrishnan

Abstract: Maximization of submodular functions has wide applications in machine learning and artificial intelligence. Adaptive submodular maximization has been traditionally studied under the assumption that the model of the world, the expected gain of choosing an item given previously selected items and their states, is known. In this paper, we study the setting where the expected gain is initially unknown, and it is learned by interacting repeatedly with the optimized function. We propose an efficient algorithm for solving our problem and prove that its expected cumulative regret increases logarithmically with time. Our regret bound captures the inherent property of submodular maximization, earlier mistakes are more costly than later ones. We refer to our approach as Optimistic Adaptive Submodular Maximization (OASM) because it trades off exploration and exploitation based on the optimism in the face of uncertainty principle. We evaluate our method on a preference elicitation problem and show that non-trivial K-step policies can be learned from just a few hundred interactions with the problem. 1

6 0.10554302 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints

7 0.10219528 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

8 0.09201695 19 nips-2013-Accelerated Mini-Batch Stochastic Dual Coordinate Ascent

9 0.07400392 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

10 0.072081923 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic

11 0.070393629 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances

12 0.065035962 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking

13 0.060450561 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables

14 0.056884401 149 nips-2013-Latent Structured Active Learning

15 0.056217127 94 nips-2013-Distributed $k$-means and $k$-median Clustering on General Topologies

16 0.055477973 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks

17 0.048144709 201 nips-2013-Multi-Task Bayesian Optimization

18 0.047439415 202 nips-2013-Multiclass Total Variation Clustering

19 0.046168003 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

20 0.046115864 309 nips-2013-Statistical Active Learning Algorithms


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.153), (1, 0.012), (2, 0.017), (3, 0.014), (4, 0.094), (5, 0.147), (6, -0.24), (7, -0.399), (8, 0.112), (9, -0.158), (10, -0.126), (11, -0.017), (12, -0.108), (13, 0.069), (14, -0.025), (15, 0.041), (16, 0.047), (17, -0.023), (18, 0.018), (19, -0.044), (20, 0.015), (21, 0.021), (22, -0.044), (23, -0.022), (24, -0.008), (25, 0.016), (26, -0.076), (27, -0.027), (28, -0.028), (29, -0.073), (30, 0.086), (31, -0.017), (32, 0.047), (33, 0.018), (34, 0.043), (35, 0.026), (36, 0.0), (37, -0.001), (38, -0.027), (39, -0.015), (40, -0.031), (41, -0.048), (42, 0.018), (43, 0.036), (44, 0.004), (45, -0.026), (46, 0.029), (47, -0.002), (48, 0.031), (49, -0.031)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94597739 319 nips-2013-Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints

Author: Rishabh K. Iyer, Jeff A. Bilmes

Abstract: We investigate two new optimization problems — minimizing a submodular function subject to a submodular lower bound constraint (submodular cover) and maximizing a submodular function subject to a submodular upper bound constraint (submodular knapsack). We are motivated by a number of real-world applications in machine learning including sensor placement and data subset selection, which require maximizing a certain submodular function (like coverage or diversity) while simultaneously minimizing another (like cooperative cost). These problems are often posed as minimizing the difference between submodular functions [9, 25] which is in the worst case inapproximable. We show, however, that by phrasing these problems as constrained optimization, which is more natural for many applications, we achieve a number of bounded approximation guarantees. We also show that both these problems are closely related and an approximation algorithm solving one can be used to obtain an approximation guarantee for the other. We provide hardness results for both problems thus showing that our approximation factors are tight up to log-factors. Finally, we empirically demonstrate the performance and good scalability properties of our algorithms. 1

2 0.93344587 78 nips-2013-Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions

Author: Rishabh K. Iyer, Stefanie Jegelka, Jeff A. Bilmes

Abstract: We investigate three related and important problems connected to machine learning: approximating a submodular function everywhere, learning a submodular function (in a PAC-like setting [28]), and constrained minimization of submodular functions. We show that the complexity of all three problems depends on the “curvature” of the submodular function, and provide lower and upper bounds that refine and improve previous results [2, 6, 8, 27]. Our proof techniques are fairly generic. We either use a black-box transformation of the function (for approximation and learning), or a transformation of algorithms to use an appropriate surrogate function (for minimization). Curiously, curvature has been known to influence approximations for submodular maximization [3, 29], but its effect on minimization, approximation and learning has hitherto been open. We complete this picture, and also support our theoretical claims by empirical results. 1

same-paper 3 0.92580754 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data

Author: Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, Andreas Krause

Abstract: Many large-scale machine learning problems (such as clustering, non-parametric learning, kernel machines, etc.) require selecting, out of a massive data set, a manageable yet representative subset. Such problems can often be reduced to maximizing a submodular set function subject to cardinality constraints. Classical approaches require centralized access to the full data set; but for truly large-scale problems, rendering the data centrally is often impractical. In this paper, we consider the problem of submodular function maximization in a distributed fashion. We develop a simple, two-stage protocol G REE D I, that is easily implemented using MapReduce style computations. We theoretically analyze our approach, and show, that under certain natural conditions, performance close to the (impractical) centralized approach can be achieved. In our extensive experiments, we demonstrate the effectiveness of our approach on several applications, including sparse Gaussian process inference and exemplar-based clustering, on tens of millions of data points using Hadoop. 1

4 0.78073406 268 nips-2013-Reflection methods for user-friendly submodular optimization

Author: Stefanie Jegelka, Francis Bach, Suvrit Sra

Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1

5 0.55303544 29 nips-2013-Adaptive Submodular Maximization in Bandit Setting

Author: Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, S. Muthukrishnan

Abstract: Maximization of submodular functions has wide applications in machine learning and artificial intelligence. Adaptive submodular maximization has been traditionally studied under the assumption that the model of the world, the expected gain of choosing an item given previously selected items and their states, is known. In this paper, we study the setting where the expected gain is initially unknown, and it is learned by interacting repeatedly with the optimized function. We propose an efficient algorithm for solving our problem and prove that its expected cumulative regret increases logarithmically with time. Our regret bound captures the inherent property of submodular maximization, earlier mistakes are more costly than later ones. We refer to our approach as Optimistic Adaptive Submodular Maximization (OASM) because it trades off exploration and exploitation based on the optimism in the face of uncertainty principle. We evaluate our method on a preference elicitation problem and show that non-trivial K-step policies can be learned from just a few hundred interactions with the problem. 1

6 0.38994092 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic

7 0.3555465 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

8 0.32336178 94 nips-2013-Distributed $k$-means and $k$-median Clustering on General Topologies

9 0.31150115 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning

10 0.28839397 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation

11 0.28097236 19 nips-2013-Accelerated Mini-Batch Stochastic Dual Coordinate Ascent

12 0.27433386 112 nips-2013-Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising

13 0.27348042 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints

14 0.2676079 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances

15 0.26194584 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding

16 0.25898212 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization

17 0.25611836 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent

18 0.25486088 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking

19 0.2542221 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited

20 0.25309902 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.014), (16, 0.062), (33, 0.13), (34, 0.112), (41, 0.026), (49, 0.02), (56, 0.09), (62, 0.2), (70, 0.064), (76, 0.012), (85, 0.032), (89, 0.028), (93, 0.056), (95, 0.068)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80347347 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data

Author: Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, Andreas Krause

Abstract: Many large-scale machine learning problems (such as clustering, non-parametric learning, kernel machines, etc.) require selecting, out of a massive data set, a manageable yet representative subset. Such problems can often be reduced to maximizing a submodular set function subject to cardinality constraints. Classical approaches require centralized access to the full data set; but for truly large-scale problems, rendering the data centrally is often impractical. In this paper, we consider the problem of submodular function maximization in a distributed fashion. We develop a simple, two-stage protocol G REE D I, that is easily implemented using MapReduce style computations. We theoretically analyze our approach, and show, that under certain natural conditions, performance close to the (impractical) centralized approach can be achieved. In our extensive experiments, we demonstrate the effectiveness of our approach on several applications, including sparse Gaussian process inference and exemplar-based clustering, on tens of millions of data points using Hadoop. 1

2 0.71457541 268 nips-2013-Reflection methods for user-friendly submodular optimization

Author: Stefanie Jegelka, Francis Bach, Suvrit Sra

Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1

3 0.71103895 201 nips-2013-Multi-Task Bayesian Optimization

Author: Kevin Swersky, Jasper Snoek, Ryan P. Adams

Abstract: Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, we propose an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting. We demonstrate the utility of this new acquisition function by leveraging a small dataset to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost. 1

4 0.70931476 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables

Author: Cho-Jui Hsieh, Matyas A. Sustik, Inderjit Dhillon, Pradeep Ravikumar, Russell Poldrack

Abstract: The 1 -regularized Gaussian maximum likelihood estimator (MLE) has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix even under high-dimensional settings. However, it requires solving a difficult non-smooth log-determinant program with number of parameters scaling quadratically with the number of Gaussian variables. State-of-the-art methods thus do not scale to problems with more than 20, 000 variables. In this paper, we develop an algorithm B IG QUIC, which can solve 1 million dimensional 1 regularized Gaussian MLE problems (which would thus have 1000 billion parameters) using a single machine, with bounded memory. In order to do so, we carefully exploit the underlying structure of the problem. Our innovations include a novel block-coordinate descent method with the blocks chosen via a clustering scheme to minimize repeated computations; and allowing for inexact computation of specific components. In spite of these modifications, we are able to theoretically analyze our procedure and show that B IG QUIC can achieve super-linear or even quadratic convergence rates. 1

5 0.70737559 56 nips-2013-Better Approximation and Faster Algorithm Using the Proximal Average

Author: Yao-Liang Yu

Abstract: It is a common practice to approximate “complicated” functions with more friendly ones. In large-scale machine learning applications, nonsmooth losses/regularizers that entail great computational challenges are usually approximated by smooth functions. We re-examine this powerful methodology and point out a nonsmooth approximation which simply pretends the linearity of the proximal map. The new approximation is justified using a recent convex analysis tool— proximal average, and yields a novel proximal gradient algorithm that is strictly better than the one based on smoothing, without incurring any extra overhead. Numerical experiments conducted on two important applications, overlapping group lasso and graph-guided fused lasso, corroborate the theoretical claims. 1

6 0.70530653 350 nips-2013-Wavelets on Graphs via Deep Learning

7 0.70501506 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent

8 0.70450413 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning

9 0.703906 5 nips-2013-A Deep Architecture for Matching Short Texts

10 0.70238787 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs

11 0.69850963 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

12 0.6980539 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding

13 0.6972506 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization

14 0.69683021 173 nips-2013-Least Informative Dimensions

15 0.69677699 186 nips-2013-Matrix factorization with binary components

16 0.69602269 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries

17 0.69573182 121 nips-2013-Firing rate predictions in optimal balanced networks

18 0.69572151 149 nips-2013-Latent Structured Active Learning

19 0.69571668 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

20 0.69567358 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models