nips nips2012 nips2012-6 knowledge-graph by maker-knowledge-mining

6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation


Source: pdf

Author: Aaron Defazio, Tibério S. Caetano

Abstract: A key problem in statistics and machine learning is the determination of network structure from data. We consider the case where the structure of the graph to be reconstructed is known to be scale-free. We show that in such cases it is natural to formulate structured sparsity inducing priors using submodular functions, and we use their Lovász extension to obtain a convex relaxation. For tractable classes such as Gaussian graphical models, this leads to a convex optimization problem that can be efficiently solved. We show that our method results in an improvement in the accuracy of reconstructed networks for synthetic data. We also show how our prior encourages scale-free reconstructions on a bioinformatics dataset.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We show that in such cases it is natural to formulate structured sparsity inducing priors using submodular functions, and we use their Lovász extension to obtain a convex relaxation. [sent-11, score-0.759]

2 For tractable classes such as Gaussian graphical models, this leads to a convex optimization problem that can be efficiently solved. [sent-12, score-0.196]

3 For example, in bioinformatics Gaussian graphical models are fitted to data resulting from micro-array experiments, where the fitted graph can be interpreted as a gene expression network [9]. [sent-17, score-0.34]

4 There is a significant body of literature, more than 30 papers by our count, on various methods of optimizing the L1 regularized covariance selection objective alone (see the recent review by Scheinberg and Ma [17]). [sent-20, score-0.177]

5 Recent research has seen the development of structured sparsity, where more complex prior knowledge about a sparsity pattern can be encoded. [sent-21, score-0.159]

6 We show that scale-free networks can be induced by enforcing submodular priors on the network’s degree distribution, and then using their convex envelope (the Lovász extension) as a convex relaxation [2]. [sent-28, score-1.12]

7 We outline a few options for solving the optimisation problem via proximal operators [3], in particular an efficient dual decomposition method. [sent-30, score-0.757]

8 Experiments on both synthetic data produced by scale-free network models and a real bioinformatics dataset suggest that the convex relaxation is not weak: we can infer scale-free networks with accuracy similar or superior to that in [14]. [sent-31, score-0.292]

9 (Section 1: Combinatorial Objective) Consider an undirected graph with edge set E and node set V, where n is the number of nodes. [sent-32, score-0.277]

10 We denote the degree of node v as d_E(v), and the complete graph with n nodes as K_n. [sent-33, score-0.339]

11 We are concerned with placing priors on the degree distributions of graphs such as (V, E). [sent-34, score-0.264]

12 By degree distribution, we mean the bag of degrees {d_E(v) | v ∈ V}. [sent-35, score-0.159]

13 A natural prior on degree distributions can be formed from the family of exponential random graphs [21]. [sent-36, score-0.361]

14 Exponential random graph (ERG) models assign a probability to each n-node graph using an exponential family model. [sent-37, score-0.312]

15 The probability of each graph depends on a small set of sufficient statistics; in our case we only consider the degree statistics. [sent-38, score-0.257]

16 An ERG distribution with degree parametrization takes the form: p(G = (V, E); h) = (1/Z(h)) exp(−Σ_{v∈V} h(d_E(v)))  (1). The degree weighting function h : Z+ → R encodes the preference for each particular degree. [sent-39, score-0.318]

17 It is instructive to observe that, under the strong assumption that each node’s degree is independent of the rest, h grows logarithmically. [sent-44, score-0.159]

18 We now consider the properties of h that lead to a convex relaxation of F . [sent-50, score-0.188]

19 (Section 2: Submodularity) A set function F : 2^E → R on E is a non-decreasing submodular function if for all A ⊂ B ⊂ E and x ∈ E\B the following conditions hold: F(A ∪ {x}) − F(A) ≥ F(B ∪ {x}) − F(B) and F(A) ≤ F(B). [sent-51, score-0.419]

20 For tractable h, F is a non-decreasing submodular function. [sent-56, score-0.47]

21 First note that the degree function is a set cardinality function, and hence modular. [sent-58, score-0.159]

22 A concave transformation of a modular function is submodular [1], and the sum of submodular functions is submodular. [sent-59, score-0.838]
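
As a concrete check of this argument (illustrative code, not from the paper; the choice h(d) = log(1 + d) is only an assumption for the demonstration), the sketch below builds F(E) = Σ_v h(d_E(v)) and numerically verifies the diminishing-returns inequality on random nested edge sets.

    import math
    import random

    def degree_prior(edges, nodes, h):
        """F(E) = sum_v h(d_E(v)): a concave penalty of each node's degree, summed over nodes."""
        deg = {v: 0 for v in nodes}
        for (u, v) in edges:
            deg[u] += 1
            deg[v] += 1
        return sum(h(d) for d in deg.values())

    h = lambda d: math.log(1 + d)   # illustrative concave, non-decreasing h with h(0) = 0
    nodes = list(range(6))
    all_edges = [(i, j) for i in nodes for j in nodes if i < j]
    random.seed(0)

    # Diminishing returns: F(A + {x}) - F(A) >= F(B + {x}) - F(B) for A a subset of B, x outside B.
    for _ in range(200):
        B = random.sample(all_edges, 8)
        A = random.sample(B, 4)
        x = random.choice([e for e in all_edges if e not in B])
        gain_A = degree_prior(A + [x], nodes, h) - degree_prior(A, nodes, h)
        gain_B = degree_prior(B + [x], nodes, h) - degree_prior(B, nodes, h)
        assert gain_A >= gain_B - 1e-12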

23 As far as we are aware, this is a novel way of mathematically modelling the ‘preferential attachment’ rule [4] that gives rise to scale-free networks: through non-decreasing submodular functions on the degree distribution. [sent-61, score-0.578]

24 A natural convex relaxation of F would be the convex envelope of F (Supp(X)) under some restricted domain. [sent-63, score-0.355]

25 For tractable h, we have by construction that F satisfies the conditions of Proposition 1 in [2], so that the convex envelope of F(Supp(X)) on the L∞ ball is precisely the Lovász extension evaluated on |X|. [sent-64, score-0.301]

26 Let X_{i,(j)} be the weight of the jth edge connected to i, under a decreasing ordering by absolute value (i.e. |X_{i,(1)}| ≥ |X_{i,(2)}| ≥ …). [sent-68, score-0.204]

27 Then the convex envelope of F for tractable h over the L∞ norm unit ball is: Ω(X) = Σ_{i=1}^{n} Σ_{k=1}^{n−1} (h(k + 1) − h(k)) |X_{i,(k)}|. [sent-74, score-0.218]

28 It behaves like an L1 norm with an additional weight on each edge that depends on how the edge ranks with respect to the other edges of its neighbouring nodes. [sent-77, score-0.212]
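
A direct way to see this behaviour is to evaluate the relaxation numerically. The sketch below (illustrative code, not from the paper) computes Ω(X) as in the formula above, sorting each row's off-diagonal entries by absolute value and weighting the k-th largest by h(k + 1) − h(k); the choice h(d) = log(1 + d) is again only an assumption for the demonstration.

    import numpy as np

    def omega(X, h):
        """Degree-prior regularizer: for each row, sort the off-diagonal entries by
        decreasing absolute value and weight the k-th one by h(k + 1) - h(k)."""
        n = X.shape[0]
        weights = np.array([h(k + 1) - h(k) for k in range(1, n)])  # length n - 1
        total = 0.0
        for i in range(n):
            row = np.abs(np.delete(X[i], i))        # off-diagonal entries of row i
            row_sorted = np.sort(row)[::-1]         # decreasing: |X_{i,(1)}| >= |X_{i,(2)}| >= ...
            total += weights @ row_sorted
        return total

    h = lambda d: np.log(1 + d)       # with a linear h the weights are constant and Omega
    X = np.random.randn(5, 5)         # reduces to a plain row-wise L1 norm on the off-diagonal
    X = (X + X.T) / 2
    print(omega(X, h))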

29 (Section 3: Optimization) We are interested in using Ω as a prior, for optimizations of the form minimize_X f(X) = g(X) + αΩ(X), for convex functions g and prior strength parameter α ∈ R+, over symmetric X. [sent-78, score-0.184]

30 The support of X will then be the set of edges in the undirected graphical model together with the node precisions. [sent-81, score-0.221]

31 Bach [2] suggests two approaches to optimizing functions involving submodular relaxation priors: a subgradient approach and a proximal approach. [sent-88, score-1.097]

32 For our objective, a subgradient is simple to evaluate at any point, due to the piecewise continuous nature of Ω(X). [sent-91, score-0.161]

33 Unfortunately (primal) subgradient methods for our problem will not return sparse solutions except in the limit of convergence. [sent-92, score-0.209]

34 An alternative is the use of proximal methods [2]. [sent-94, score-0.383]

35 Proximal methods exhibit superior convergence in comparison to subgradient methods, and produce sparse solutions. [sent-95, score-0.209]

36 Proximal methods rely on solving a simpler optimization problem, known as the proximal operator, at each iteration: arg min_X αΩ(X) + ½‖X − Z‖²₂, where Z is a variable that varies at each iteration. [sent-96, score-0.495]

37 For many problems of interest, the proximal operator can be evaluated using a closed form solution. [sent-97, score-0.495]
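
The textbook example of such a closed form is the proximal operator of the plain L1 norm, which is elementwise soft thresholding (shown below for reference; this is not the operator required for Ω, which is addressed in Section 3.2).

    import numpy as np

    def prox_l1(z, t):
        """argmin_x t*||x||_1 + 0.5*||x - z||^2, solved in closed form by soft thresholding."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    # Entries shrink toward zero by t; anything with |z| <= t becomes exactly zero.
    print(prox_l1(np.array([-2.0, -0.3, 0.1, 1.5]), 0.5))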

38 For non-decreasing submodular relaxations, the proximal operator can be evaluated by solving a submodular minimization on a related (not necessarily non-decreasing) submodular function [2]. [sent-98, score-1.752]

39 Bach [2] considers several example problems where the proximal operator can be evaluated using fast graph cut methods. [sent-99, score-0.593]

40 Generic submodular minimization algorithms could be as slow as O(n^12) for an n-vertex graph, which is clearly impractical [11]. [sent-101, score-0.419]

41 We will instead propose a dual decomposition method for solving this proximal operator problem in Section 3. [sent-102, score-0.809]

42 (Section 3.1: Alternating direction method of multipliers) The alternating direction method of multipliers (ADMM, Boyd et al. [sent-107, score-0.167]

43 [6]) is one approach to optimizing our objective that has a number of advantages over the basic proximal method. [sent-108, score-0.419]

44 Let U be the matrix of dual variables for the decoupled problem: minimize_{X,Y} g(X) + αΩ(Y), s.t. X = Y. [sent-109, score-0.181]

45 The advantage of this form is that both the X and Y updates are a proximal operation. [sent-115, score-0.383]

46 It turns out that the proximal operator for g (i.e. [sent-116, score-0.495]

47 the X^(l+1) update) actually has a simple solution [18] that can be computed by taking an eigenvalue decomposition Q^T Λ Q = ρ(Y − U) − C, where Λ = diag(λ_1, …, λ_n). [sent-118, score-0.171]

48 For our degree prior regularizer, the difficulty is in computing the proximal operator for Ω, as the rest of the algorithm is identical to that presented in Boyd et al. [sent-126, score-0.718]
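
The sketch below shows how these pieces fit together, assuming the Gaussian graphical model case, i.e. g(X) = tr(CX) − log det X for an empirical covariance C (this choice, and the plain soft-thresholding stand-in for Ω's proximal operator, are assumptions made for illustration; the actual Y-update requires the dual decomposition of Section 3.2). The X-update applies the standard eigenvalue-based closed form referred to above.

    import numpy as np

    def soft_threshold(Z, t):
        # Stand-in prox (plain L1 penalty); the paper's Y-update needs the prox of Omega instead.
        return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

    def admm_sketch(C, alpha, prox_regularizer=soft_threshold, rho=1.0, iters=100):
        """Minimal ADMM loop for  min_X  tr(C X) - log det X + alpha * R(X)  over symmetric X,
        where the proximal operator of the regularizer R is passed in as a function."""
        n = C.shape[0]
        Y = np.eye(n)
        U = np.zeros((n, n))
        for _ in range(iters):
            # X-update: eigendecompose rho*(Y - U) - C, then apply the closed form
            # x_i = (lam_i + sqrt(lam_i^2 + 4*rho)) / (2*rho).
            lam, Q = np.linalg.eigh(rho * (Y - U) - C)
            x = (lam + np.sqrt(lam ** 2 + 4.0 * rho)) / (2.0 * rho)
            X = Q @ np.diag(x) @ Q.T
            # Y-update: prox of the regularizer (stand-in here) evaluated at X + U.
            Y = prox_regularizer(X + U, alpha / rho)
            # Scaled dual update.
            U = U + X - Y
        return Y

    C = np.cov(np.random.randn(200, 8), rowvar=False)   # toy empirical covariance
    X_hat = admm_sketch(C, alpha=0.2)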

49 We now show how we solve the problem of computing the proximal operator for Ω. [sent-128, score-0.495]

50 (Section 3.2: Proximal operator using dual decomposition) Here we describe the optimisation algorithm that we effectively use for computing the proximal operator. [sent-130, score-0.869]

51 We can decouple these terms using the dual decomposition technique, by writing the proximal operation for a given Z = Y − U as: minimize_X (α/ρ) Σ_{i=1}^{n} Σ_{k=1}^{n−1} (h(k + 1) − h(k)) |X_{i,(k)}| + ½‖X − Z‖²₂, s.t. X = X^T. [sent-132, score-0.697]

52 Taking the dual gives a formulation where the upper and lower triangle are treated as separate variables. [sent-136, score-0.181]

53 The dual variable matrix V corresponds to the Lagrange multipliers of the symmetry constraint, which for notational convenience we store in an anti-symmetric matrix. [sent-137, score-0.238]

54 The dual decomposition method is given in Algorithm 1. [sent-138, score-0.314]

55 Since this is a dual method, the primal variables X are not feasible (i.e. the symmetry constraint need not hold). [sent-140, score-0.181]

56 Essentially we have decomposed the original problem, so that now we only need to solve the proximal operation for each node in isolation, namely the subproblems: ∀i, minimize over the row X_i the term (α/ρ) Σ_{k=1}^{n−1} (h(k + 1) − h(k)) |X_{i,(k)}| + ½‖X_i − Z̃_i‖²₂ (3), where Z̃_i is the corresponding row of Z adjusted by the dual variables. [sent-143, score-0.465]

57 Note that the dual variable has been integrated into the quadratic term by completing the square. [sent-145, score-0.181]

58 Each subproblem is strongly convex, as it consists of convex terms plus a positive quadratic term. [sent-147, score-0.256]

59 This implies that the dual problem is differentiable (as the subdifferential contains only one subgradient), hence the V update is actually gradient ascent. [sent-148, score-0.215]

60 Since a fixed step size is used, and the dual is Lipschitz continuous, convergence is guaranteed for a sufficiently small step size. [sent-149, score-0.181]
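
A schematic version of this scheme is sketched below. It is a hedged illustration, not a transcription of Algorithm 1: the exact way the multipliers enter each row's subproblem, the step size, and the stopping rule are inferred from the description above and may differ in detail from the paper. solve_row_subproblem stands for a solver of the per-node problem (3); a concrete candidate for it is sketched a few items further down, after the remark on tied absolute values.

    import numpy as np

    def prox_omega_dual_decomp(Z, solve_row_subproblem, step=0.1, iters=200):
        """Dual decomposition sketch for the proximal operator of Omega at Z.
        V holds the antisymmetric multipliers of the symmetry constraint X = X^T;
        rows are solved independently given V, then V takes a gradient-ascent step."""
        n = Z.shape[0]
        V = np.zeros((n, n))                       # antisymmetric: V = -V^T
        X = np.zeros((n, n))
        for _ in range(iters):
            for i in range(n):
                # Each row sees Z shifted by its multipliers and is solved in isolation.
                X[i] = solve_row_subproblem(Z[i] - V[i], i)
            # Dual gradient ascent: the symmetry violation drives the multiplier update,
            # and X - X^T is itself antisymmetric, so V stays antisymmetric.
            V = V + step * (X - X.T)
        # Intermediate iterates need not be feasible (symmetric), so symmetrize at the end.
        return (X + X.T) / 2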

61 This dual decomposition subproblem can also be interpreted as just a step within the ADMM framework. [sent-152, score-0.39]

62 If applied in a standard way, only one dual variable update would be performed before another expensive eigenvalue decomposition step. [sent-153, score-0.352]

63 Since each iteration of the dual decomposition is much faster than the eigenvalue decomposition, it makes more sense to treat it as a separate problem as we propose here. [sent-154, score-0.385]

64 It also ensures that the eigenvalue decomposition is only performed on symmetric matrices. [sent-155, score-0.171]

65 Each subproblem in our decomposition is still a non-trivial problem. [sent-156, score-0.209]

66 If multiple edges connected to the same node have the same absolute value, their subdifferential becomes the same, and they behave as a single point whose weight is the average. [sent-162, score-0.263]
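
The per-node problem (3) has the shape of a proximal operator for a sorted, weighted L1 penalty: the weights (α/ρ)(h(k + 1) − h(k)) are non-increasing because h is concave. A standard way to compute such a prox, consistent with the tie-averaging behaviour just described, is to sort by magnitude, subtract the weights, project onto the non-increasing cone with pool-adjacent-violators (ties get pooled to a common value), clip at zero, and undo the sort. The sketch below is offered as a plausible row solver for the dual-decomposition loop above, not as a transcription of the paper's Algorithm 1; the treatment of the diagonal entry and the exact subproblem details are assumptions.

    import numpy as np

    def project_nonincreasing(d):
        """Pool Adjacent Violators: Euclidean projection onto {x : x_1 >= x_2 >= ... >= x_m}."""
        blocks = []                                   # each block stores [sum, count]
        for val in d:
            blocks.append([float(val), 1])
            # Merge while an earlier block mean is smaller than a later one.
            while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
                s, c = blocks.pop()
                blocks[-1][0] += s
                blocks[-1][1] += c
        return np.concatenate([[s / c] * c for s, c in blocks])

    def prox_sorted_weighted_l1(z, w):
        """Prox of J(x) = sum_k w_k |x|_(k) with non-increasing weights w: sort by magnitude,
        subtract the weights, project onto the non-increasing cone, clip, unsort, restore signs."""
        order = np.argsort(-np.abs(z))
        x_sorted = np.maximum(project_nonincreasing(np.abs(z)[order] - w), 0.0)
        x = np.empty_like(z, dtype=float)
        x[order] = x_sorted
        return np.sign(z) * x

    # Weights suggested by the formula for Omega, with an illustrative h and alpha/rho = 0.5.
    h = lambda d: np.log(1 + d)
    m = 6
    w = 0.5 * np.array([h(k + 1) - h(k) for k in range(1, m + 1)])
    print(prox_sorted_weighted_l1(np.random.randn(m), w))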

67 (Section 4: Alternative degree priors) Under the restrictions on h detailed in Proposition 1, several other choices seem reasonable. [sent-164, score-0.201]

68 Ideally we would choose h so that the expected degree distribution under the ERG model matches the particular form we wish to encourage. [sent-173, score-0.159]

69 Finding such an h for a particular graph size and degree distribution amounts to maximum likelihood parameter learning, which for ERG models is a hard learning problem. [sent-174, score-0.257]

70 (Section 5: Related Work) The covariance selection problem has recently been addressed by Liu and Ihler [14] using reweighted L1 regularization. [sent-177, score-0.347]

71 The regularizer, of the form Σ_{v∈V} log(‖X_{¬v}‖ + ε) + β Σ_{v∈V} X_{vv}, is split into an off-diagonal term, which is designed to encourage sparsity in the edge parameters, and a more traditional diagonal term. [sent-179, score-0.241]

72 The reweighted L1 [7] aspect refers to the method of optimization applied. [sent-183, score-0.206]
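
For comparison, the generic reweighted L1 scheme of [7] alternates between solving a weighted L1-regularized problem and recomputing the weights from the current solution, so that small entries are penalized more heavily on the next pass. The sketch below shows this outer loop only (a schematic of the idea in [7], not Liu and Ihler's specific algorithm); solve_weighted_l1 is an assumed inner solver that returns the minimizer of the smooth loss plus Σ_ij W_ij |X_ij|.

    import numpy as np

    def reweighted_l1(solve_weighted_l1, shape, eps=1e-2, outer_iters=5):
        """Outer loop of a reweighted L1 scheme: solve a weighted L1 problem, then set the
        next weights to 1 / (|X| + eps) so that currently-small entries are penalized more."""
        W = np.ones(shape)
        X = np.zeros(shape)
        for _ in range(outer_iters):
            X = solve_weighted_l1(W)
            W = 1.0 / (np.abs(X) + eps)
        return X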

73 25 Figure 1: ROC curves for BA model (left) and fixed degree distribution model (right) Figure 2: Reconstruction of a gene association network using L1 (left), submodular relaxation (middle), and reweighted L1 (right) methods 6 Experiments Reconstruction of synthetic networks. [sent-211, score-1.017]

74 We performed a comparison against the reweighted L1 method of Liu and Ihler [14], and a standard L1 regularized method, both implemented using ADMM for optimization. [sent-212, score-0.206]

75 Graphs with 60 nodes were generated using both the Barabási-Albert model [4] and a predefined degree distribution model sampled using the method from Bayati et al. [sent-214, score-0.159]

76 The parameters for the degree weight sequences were chosen by grid search on random instances separate from those we tested on. [sent-226, score-0.189]

77 The only case where it gives inferior reconstructions is when it is forced to give a sparser reconstruction than the original graph. [sent-232, score-0.161]

78 A common application of sparse covariance selection is the estimation of gene association networks from experimental data. [sent-234, score-0.344]

79 A covariance matrix of gene co-activations from a number of independent micro-array experiments is typically formed, on which a number of methods, including sparse covariance selection, can be applied. [sent-235, score-0.345]

80 Many biological networks are conjectured to be scale-free, and additionally ERG modelling techniques are known to produce good results on biological networks [16]. [sent-237, score-0.192]

81 We ran our method and the L1 reconstruction method on the first 7 500 genes from the GDS1429 dataset (http://www. [sent-239, score-0.157]

82 While these networks are too small for valid statistical analysis of the degree distribution, the submodular relaxation method produces a network with structure that is commonly seen in scale-free networks. [sent-246, score-0.823]

83 The star subgraph centered around gene 60 is more clearly defined in the submodular relaxation reconstruction, and the tight cluster of genes on the right is less clustered in the L1 reconstruction. [sent-247, score-0.641]

84 The reweighted L1 method produced a quite different reconstruction, with greater clustering. [sent-248, score-0.206]

85 We performed a comparison against two other methods for computing the proximal operator: subgradient descent and the minimum norm point (MNP) algorithm. [sent-250, score-0.544]

86 The MNP algorithm is a submodular minimization method that can be adapted for computing the proximal operator [2]. [sent-251, score-0.495]

87 We took the input parameters from the last invocation of the proximal operator in the BA test, at a prior strength of 0. [sent-252, score-0.477]

88 It is clear, from this and similar tests that we performed, that the subgradient descent method converges too slowly to be of practical applicability for this problem. (Figure 3: Comparison of proximal operators.) [sent-256, score-0.585]

89 Subgradient methods can be a good choice when only a low accuracy solution is required; for convergence of ADMM the error in the proximal operator needs to be smaller than what can be obtained by the subgradient method. [sent-257, score-0.656]

90 The dual decomposition method achieves a much better rate of convergence, converging quickly enough to be of use even for strong accuracy requirements. [sent-259, score-0.314]

91 82ms for dual decomposition and 15ms for the MNP method. [sent-262, score-0.314]

92 The speed difference is small between a subgradient iteration and a dual decomposition iteration as both are dominated by the cost of a sort operation. [sent-263, score-0.613]

93 Overall, it is clear that our dual decomposition method is significantly more efficient. [sent-265, score-0.314]

94 For a sparse reconstruction of a BA model graph with 100 vertices and 200 edges, the average running time per 10^−4 error reconstruction over 10 random graphs was 16 seconds for the reweighted L1 method and 5. [sent-268, score-0.696]

95 (Conclusion) We have presented a new prior for graph reconstruction, which enforces the recovery of scale-free networks. [sent-273, score-0.162]

96 This prior falls within the growing class of structured sparsity methods. [sent-274, score-0.159]

97 Unlike previous approaches to regularizing the degree distribution, our proposed prior is convex, making training tractable and convergence predictable. [sent-275, score-0.274]

98 Our method can be directly applied in contexts where sparse covariance selection is currently used, where it may improve the reconstruction quality. [sent-276, score-0.315]

99 Convex analysis and optimization with submodular functions: a tutorial. [sent-279, score-0.419]

100 Exploring biological network structure using exponential random graph models. [sent-337, score-0.208]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('submodular', 0.419), ('proximal', 0.383), ('reweighted', 0.206), ('erg', 0.183), ('dual', 0.181), ('mnp', 0.178), ('admm', 0.173), ('subgradient', 0.161), ('degree', 0.159), ('decomposition', 0.133), ('reconstruction', 0.126), ('operator', 0.112), ('ihler', 0.108), ('covariance', 0.102), ('graph', 0.098), ('relaxation', 0.098), ('francis', 0.096), ('gene', 0.093), ('convex', 0.09), ('minimizex', 0.089), ('lov', 0.083), ('sz', 0.083), ('node', 0.082), ('envelope', 0.077), ('positives', 0.077), ('subproblem', 0.076), ('rodolphe', 0.072), ('sort', 0.072), ('edge', 0.065), ('guillaume', 0.065), ('submodularity', 0.065), ('prior', 0.064), ('graphs', 0.063), ('sparsity', 0.062), ('networks', 0.062), ('ba', 0.062), ('optimisation', 0.06), ('scheinbert', 0.059), ('solvesubproblem', 0.059), ('multipliers', 0.057), ('jenatton', 0.056), ('graphical', 0.055), ('cone', 0.055), ('alternating', 0.053), ('bayati', 0.052), ('bioinfomatics', 0.052), ('katya', 0.052), ('shiqian', 0.052), ('snijders', 0.052), ('edges', 0.052), ('tractable', 0.051), ('boyd', 0.05), ('liu', 0.048), ('canberra', 0.048), ('julien', 0.048), ('normalizable', 0.048), ('xii', 0.048), ('sparse', 0.048), ('det', 0.046), ('sydney', 0.045), ('ordering', 0.044), ('xij', 0.043), ('free', 0.043), ('priors', 0.042), ('network', 0.042), ('formed', 0.041), ('diagonal', 0.04), ('scheinberg', 0.04), ('selection', 0.039), ('proposition', 0.039), ('eigenvalue', 0.038), ('australian', 0.037), ('joining', 0.037), ('optimizing', 0.036), ('australia', 0.036), ('reconstructions', 0.035), ('qt', 0.034), ('subdifferential', 0.034), ('exponential', 0.034), ('biological', 0.034), ('regularizer', 0.034), ('supp', 0.033), ('connected', 0.033), ('iteration', 0.033), ('bach', 0.033), ('structured', 0.033), ('obozinski', 0.033), ('diminishing', 0.032), ('mairal', 0.032), ('roc', 0.032), ('undirected', 0.032), ('absolute', 0.032), ('genes', 0.031), ('xi', 0.03), ('weight', 0.03), ('reconstructed', 0.03), ('strength', 0.03), ('inducing', 0.03), ('runtime', 0.03), ('seconds', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

Author: Aaron Defazio, Tibério S. Caetano

Abstract: A key problem in statistics and machine learning is the determination of network structure from data. We consider the case where the structure of the graph to be reconstructed is known to be scale-free. We show that in such cases it is natural to formulate structured sparsity inducing priors using submodular functions, and we use their Lovász extension to obtain a convex relaxation. For tractable classes such as Gaussian graphical models, this leads to a convex optimization problem that can be efficiently solved. We show that our method results in an improvement in the accuracy of reconstructed networks for synthetic data. We also show how our prior encourages scale-free reconstructions on a bioinformatics dataset.

2 0.30905637 328 nips-2012-Submodular-Bregman and the Lovász-Bregman Divergences with Applications

Author: Rishabh Iyer, Jeff A. Bilmes

Abstract: We introduce a class of discrete divergences on sets (equivalently binary vectors) that we call the submodular-Bregman divergences. We consider two kinds, defined either from tight modular upper or tight modular lower bounds of a submodular function. We show that the properties of these divergences are analogous to the (standard continuous) Bregman divergence. We demonstrate how they generalize many useful divergences, including the weighted Hamming distance, squared weighted Hamming, weighted precision, recall, conditional mutual information, and a generalized KL-divergence on sets. We also show that the generalized Bregman divergence on the Lovász extension of a submodular function, which we call the Lovász-Bregman divergence, is a continuous extension of a submodular Bregman divergence. We point out a number of applications, and in particular show that a proximal algorithm defined through the submodular Bregman divergence provides a framework for many mirror-descent style algorithms related to submodular function optimization. We also show that a generalization of the k-means algorithm using the Lovász-Bregman divergence is natural in clustering scenarios where ordering is important. A unique property of this algorithm is that computing the mean ordering is extremely efficient unlike other order based distance measures. 1

3 0.23423621 282 nips-2012-Proximal Newton-type methods for convex optimization

Author: Jason Lee, Yuekai Sun, Michael Saunders

Abstract: We seek to solve convex optimization problems in composite form: minimize_{x∈R^n} f(x) := g(x) + h(x), where g is convex and continuously differentiable and h : R^n → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics. 1

4 0.20474271 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

Author: Benjamin Rolfs, Bala Rajaratnam, Dominique Guillot, Ian Wong, Arian Maleki

Abstract: The L1-regularized maximum likelihood estimation problem has recently become a topic of great interest within the machine learning, statistics, and optimization communities as a method for producing sparse inverse covariance estimators. In this paper, a proximal gradient method (G-ISTA) for performing L1-regularized covariance matrix estimation is presented. Although numerous algorithms have been proposed for solving this problem, this simple proximal gradient method is found to have attractive theoretical and numerical properties. G-ISTA has a linear rate of convergence, resulting in an O(log(1/ε)) iteration complexity to reach a tolerance of ε. This paper gives eigenvalue bounds for the G-ISTA iterates, providing a closed-form linear convergence rate. The rate is shown to be closely related to the condition number of the optimal point. Numerical convergence results and timing comparisons for the proposed method are presented. G-ISTA is shown to perform very well, especially when the optimal point is well-conditioned. 1

5 0.2000989 304 nips-2012-Selecting Diverse Features via Spectral Regularization

Author: Abhimanyu Das, Anirban Dasgupta, Ravi Kumar

Abstract: We study the problem of diverse feature selection in linear regression: selecting a small subset of diverse features that can predict a given objective. Diversity is useful for several reasons such as interpretability, robustness to noise, etc. We propose several spectral regularizers that capture a notion of diversity of features and show that these are all submodular set functions. These regularizers, when added to the objective function for linear regression, result in approximately submodular functions, which can then be maximized by efficient greedy and local search algorithms, with provable guarantees. We compare our algorithms to traditional greedy and L1-regularization schemes and show that we obtain a more diverse set of features that result in the regression problem being stable under perturbations. 1

6 0.18348539 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

7 0.17657216 143 nips-2012-Globally Convergent Dual MAP LP Relaxation Solvers using Fenchel-Young Margins

8 0.15867367 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

9 0.14949322 327 nips-2012-Structured Learning of Gaussian Graphical Models

10 0.14112845 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover

11 0.12979746 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

12 0.12967102 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

13 0.11875609 27 nips-2012-A quasi-Newton proximal splitting method

14 0.11821128 84 nips-2012-Convergence Rate Analysis of MAP Coordinate Minimization Algorithms

15 0.11688185 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

16 0.11519808 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

17 0.11293378 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

18 0.099089831 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

19 0.098668456 263 nips-2012-Optimal Regularized Dual Averaging Methods for Stochastic Optimization

20 0.098034933 319 nips-2012-Sparse Prediction with the $k$-Support Norm


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.254), (1, 0.092), (2, 0.147), (3, -0.151), (4, 0.033), (5, 0.121), (6, -0.059), (7, -0.231), (8, -0.05), (9, 0.045), (10, -0.053), (11, 0.021), (12, 0.014), (13, 0.157), (14, 0.141), (15, 0.062), (16, -0.066), (17, -0.06), (18, -0.041), (19, -0.049), (20, 0.191), (21, 0.107), (22, 0.161), (23, -0.016), (24, 0.243), (25, 0.019), (26, -0.112), (27, -0.018), (28, -0.028), (29, -0.075), (30, -0.041), (31, 0.041), (32, -0.052), (33, 0.025), (34, 0.005), (35, 0.008), (36, 0.078), (37, -0.022), (38, 0.027), (39, -0.066), (40, -0.01), (41, -0.011), (42, -0.023), (43, -0.03), (44, -0.003), (45, 0.013), (46, -0.009), (47, 0.026), (48, -0.029), (49, 0.12)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9290722 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

Author: Aaron Defazio, Tibério S. Caetano

Abstract: A key problem in statistics and machine learning is the determination of network structure from data. We consider the case where the structure of the graph to be reconstructed is known to be scale-free. We show that in such cases it is natural to formulate structured sparsity inducing priors using submodular functions, and we use their Lovász extension to obtain a convex relaxation. For tractable classes such as Gaussian graphical models, this leads to a convex optimization problem that can be efficiently solved. We show that our method results in an improvement in the accuracy of reconstructed networks for synthetic data. We also show how our prior encourages scale-free reconstructions on a bioinformatics dataset.

2 0.78993422 328 nips-2012-Submodular-Bregman and the Lovász-Bregman Divergences with Applications

Author: Rishabh Iyer, Jeff A. Bilmes

Abstract: We introduce a class of discrete divergences on sets (equivalently binary vectors) that we call the submodular-Bregman divergences. We consider two kinds, defined either from tight modular upper or tight modular lower bounds of a submodular function. We show that the properties of these divergences are analogous to the (standard continuous) Bregman divergence. We demonstrate how they generalize many useful divergences, including the weighted Hamming distance, squared weighted Hamming, weighted precision, recall, conditional mutual information, and a generalized KL-divergence on sets. We also show that the generalized Bregman divergence on the Lovász extension of a submodular function, which we call the Lovász-Bregman divergence, is a continuous extension of a submodular Bregman divergence. We point out a number of applications, and in particular show that a proximal algorithm defined through the submodular Bregman divergence provides a framework for many mirror-descent style algorithms related to submodular function optimization. We also show that a generalization of the k-means algorithm using the Lovász-Bregman divergence is natural in clustering scenarios where ordering is important. A unique property of this algorithm is that computing the mean ordering is extremely efficient unlike other order based distance measures. 1

3 0.71252948 304 nips-2012-Selecting Diverse Features via Spectral Regularization

Author: Abhimanyu Das, Anirban Dasgupta, Ravi Kumar

Abstract: We study the problem of diverse feature selection in linear regression: selecting a small subset of diverse features that can predict a given objective. Diversity is useful for several reasons such as interpretability, robustness to noise, etc. We propose several spectral regularizers that capture a notion of diversity of features and show that these are all submodular set functions. These regularizers, when added to the objective function for linear regression, result in approximately submodular functions, which can then be maximized by efficient greedy and local search algorithms, with provable guarantees. We compare our algorithms to traditional greedy and L1-regularization schemes and show that we obtain a more diverse set of features that result in the regression problem being stable under perturbations. 1

4 0.6912576 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

Author: Jennifer Gillenwater, Alex Kulesza, Ben Taskar

Abstract: Determinantal point processes (DPPs) have recently been proposed as computationally efficient probabilistic models of diverse sets for a variety of applications, including document summarization, image search, and pose estimation. Many DPP inference operations, including normalization and sampling, are tractable; however, finding the most likely configuration (MAP), which is often required in practice for decoding, is NP-hard, so we must resort to approximate inference. This optimization problem, which also arises in experimental design and sensor placement, involves finding the largest principal minor of a positive semidefinite matrix. Because the objective is log-submodular, greedy algorithms have been used in the past with some empirical success; however, these methods only give approximation guarantees in the special case of monotone objectives, which correspond to a restricted class of DPPs. In this paper we propose a new algorithm for approximating the MAP problem based on continuous techniques for submodular function maximization. Our method involves a novel continuous relaxation of the log-probability function, which, in contrast to the multilinear extension used for general submodular functions, can be evaluated and differentiated exactly and efficiently. We obtain a practical algorithm with a 1/4-approximation guarantee for a more general class of non-monotone DPPs; our algorithm also extends to MAP inference under complex polytope constraints, making it possible to combine DPPs with Markov random fields, weighted matchings, and other models. We demonstrate that our approach outperforms standard and recent methods on both synthetic and real-world data. 1

5 0.66441828 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover

Author: Andrew Delong, Olga Veksler, Anton Osokin, Yuri Boykov

Abstract: Inference in high-order graphical models has become important in recent years. Several approaches are based, for example, on generalized message-passing, or on transformation to a pairwise model with extra ‘auxiliary’ variables. We focus on a special case where a much more efficient transformation is possible. Instead of adding variables, we transform the original problem into a comparatively small instance of submodular vertex-cover. These vertex-cover instances can then be attacked by existing algorithms (e.g. belief propagation, QPBO), where they often run 4–15 times faster and find better solutions than when applied to the original problem. We evaluate our approach on synthetic data, then we show applications within a fast hierarchical clustering and model-fitting framework. 1

6 0.63365299 327 nips-2012-Structured Learning of Gaussian Graphical Models

7 0.5545339 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

8 0.54909825 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

9 0.48887262 346 nips-2012-Topology Constraints in Graphical Models

10 0.48129189 132 nips-2012-Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling

11 0.48101458 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

12 0.47424823 63 nips-2012-CPRL -- An Extension of Compressive Sensing to the Phase Retrieval Problem

13 0.4640885 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

14 0.46155563 27 nips-2012-A quasi-Newton proximal splitting method

15 0.4590885 282 nips-2012-Proximal Newton-type methods for convex optimization

16 0.45255306 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

17 0.45239052 263 nips-2012-Optimal Regularized Dual Averaging Methods for Stochastic Optimization

18 0.45135492 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

19 0.45004314 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

20 0.44781283 143 nips-2012-Globally Convergent Dual MAP LP Relaxation Solvers using Fenchel-Young Margins


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.085), (14, 0.185), (21, 0.017), (27, 0.021), (38, 0.183), (39, 0.013), (42, 0.031), (54, 0.026), (55, 0.021), (74, 0.079), (76, 0.148), (80, 0.07), (92, 0.044)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.88921893 269 nips-2012-Persistent Homology for Learning Densities with Bounded Support

Author: Florian T. Pokorny, Hedvig Kjellström, Danica Kragic, Carl Ek

Abstract: We present a novel method for learning densities with bounded support which enables us to incorporate ‘hard’ topological constraints. In particular, we show how emerging techniques from computational algebraic topology and the notion of persistent homology can be combined with kernel-based methods from machine learning for the purpose of density estimation. The proposed formalism facilitates learning of models with bounded support in a principled way, and – by incorporating persistent homology techniques in our approach – we are able to encode algebraic-topological constraints which are not addressed in current state of the art probabilistic models. We study the behaviour of our method on two synthetic examples for various sample sizes and exemplify the benefits of the proposed approach on a real-world dataset by learning a motion model for a race car. We show how to learn a model which respects the underlying topological structure of the racetrack, constraining the trajectories of the car. 1

2 0.87928873 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

Author: Anima Anandkumar, Yi-kai Liu, Daniel J. Hsu, Dean P. Foster, Sham M. Kakade

Abstract: Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of topic models, including Latent Dirichlet Allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method, called Excess Correlation Analysis, is based on a spectral decomposition of low-order moments via two singular value decompositions (SVDs). Moreover, the algorithm is scalable, since the SVDs are carried out only on k × k matrices, where k is the number of latent factors (topics) and is typically much smaller than the dimension of the observation (word) space. 1

same-paper 3 0.86865675 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

Author: Aaron Defazio, Tibério S. Caetano

Abstract: A key problem in statistics and machine learning is the determination of network structure from data. We consider the case where the structure of the graph to be reconstructed is known to be scale-free. We show that in such cases it is natural to formulate structured sparsity inducing priors using submodular functions, and we use their Lovász extension to obtain a convex relaxation. For tractable classes such as Gaussian graphical models, this leads to a convex optimization problem that can be efficiently solved. We show that our method results in an improvement in the accuracy of reconstructed networks for synthetic data. We also show how our prior encourages scale-free reconstructions on a bioinformatics dataset.

4 0.85134912 254 nips-2012-On the Sample Complexity of Robust PCA

Author: Matthew Coudron, Gilad Lerman

Abstract: We estimate the rate of convergence and sample complexity of a recent robust estimator for a generalized version of the inverse covariance matrix. This estimator is used in a convex algorithm for robust subspace recovery (i.e., robust PCA). Our model assumes a sub-Gaussian underlying distribution and an i.i.d. sample from it. Our main result shows with high probability that the norm of the difference between the generalized inverse covariance of the underlying distribution and its estimator from an i.i.d. sample of size N is of order O(N^{−0.5+ε}) for arbitrarily small ε > 0 (affecting the probabilistic estimate); this rate of convergence is close to the one of direct covariance estimation, i.e., O(N^{−0.5}). Our precise probabilistic estimate implies for some natural settings that the sample complexity of the generalized inverse covariance estimation when using the Frobenius norm is O(D^{2+δ}) for arbitrarily small δ > 0 (whereas the sample complexity of direct covariance estimation with Frobenius norm is O(D^2)). These results provide similar rates of convergence and sample complexity for the corresponding robust subspace recovery algorithm. To the best of our knowledge, this is the only work analyzing the sample complexity of any robust PCA algorithm. 1

5 0.80845129 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

6 0.80518234 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

7 0.80387568 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

8 0.80059868 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

9 0.79865628 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

10 0.79713213 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

11 0.79710811 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

12 0.79594487 260 nips-2012-Online Sum-Product Computation Over Trees

13 0.79590803 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

14 0.79584414 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

15 0.79525775 125 nips-2012-Factoring nonnegative matrices with linear programs

16 0.79480195 304 nips-2012-Selecting Diverse Features via Spectral Regularization

17 0.7925449 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

18 0.79249716 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity

19 0.79094034 27 nips-2012-A quasi-Newton proximal splitting method

20 0.79024553 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models