nips nips2012 nips2012-326 knowledge-graph by maker-knowledge-mining

326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses


Source: pdf

Author: Po-ling Loh, Martin J. Wainwright

Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses Martin J. Wainwright [sent-1, score-0.818]

2 Abstract We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. [sent-5, score-1.033]

3 We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. [sent-6, score-1.203]

4 Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. [sent-7, score-0.902]

5 Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results. [sent-8, score-0.883]

6 In many applications, learning the edge structure of an underlying graphical model is of great importance—for instance, a graphical model may be used to represent friendships between people in a social network, or links between organisms with the propensity to spread an infectious disease [1]. [sent-10, score-0.745]

7 It is well known that zeros in the inverse covariance matrix of a multivariate Gaussian distribution indicate the absence of an edge in the corresponding graphical model. [sent-11, score-1.099]
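
As an illustrative aside (ours, not the authors' code): a minimal numpy sketch of this classical Gaussian fact, building a chain-structured precision matrix and confirming that its zeros mark exactly the non-edges.

    import numpy as np

    p = 4
    Theta = np.eye(p)                            # precision matrix of a Gaussian chain 1-2-3-4
    for s in range(p - 1):
        Theta[s, s + 1] = Theta[s + 1, s] = 0.3  # chain edges
        Theta[s, s] += 0.3                       # diagonal dominance keeps Theta positive definite
    Sigma = np.linalg.inv(Theta)                 # the covariance itself is dense
    print(np.round(np.linalg.inv(Sigma), 6))     # zeros exactly at the non-edges (1,3), (1,4), (2,4)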

8 This fact, combined with techniques in high-dimensional statistical inference, has been leveraged by many authors to recover the structure of a Gaussian graphical model when the edge set is sparse. [sent-12, score-0.533]

9 [6, 7] introduced the notion of a nonparanormal distribution, which generalizes the Gaussian distribution by allowing for univariate monotonic transformations, and argued that the same structural properties of the inverse covariance matrix carry over to the nonparanormal. [sent-16, score-0.658]

10 However, the question of whether a relationship exists between conditional independence and the structure of the inverse covariance matrix in a general graph remains unresolved. [sent-17, score-0.911]

11 In this paper, we focus on discrete graphical models and establish a number of interesting links between covariance matrices and the edge structure of an underlying graph. [sent-18, score-1.036]

12 Instead of only analyzing the standard covariance matrix, we show that it is often fruitful to augment the usual covariance matrix with higher-order interaction terms. [sent-19, score-0.835]

13 Our main result has a striking corollary in the context of tree-structured graphs: for any discrete graphical model, the inverse of a generalized covariance matrix is always (block) graph-structured. [sent-20, score-1.208]

14 In particular, for binary variables, the inverse of the usual covariance matrix corresponds exactly to the edge structure of the tree. [sent-21, score-0.934]

15 Other related work on graphical model selection for discrete graphs includes the classic Chow-Liu algorithm for trees [8]; nodewise logistic regression for discrete models with pairwise interactions [9, 10]; and techniques based on conditional entropy or mutual information [11, 12]. [sent-24, score-0.565]

16 Our main contribution is to present a clean and surprising result on a simple link between the inverse covariance matrix and edge structure of a discrete model, which may be used to derive inference algorithms applicable even to data with systematic corruptions. [sent-25, score-1.005]

17 The remainder of the paper is organized as follows: In Section 2, we provide brief background and notation on graphical models, and describe the classes of augmented covariance matrices we will consider. [sent-26, score-0.754]

18 In Section 3, we state our main results on the relationship between the support of generalized inverse covariance matrices and the edge structure of a discrete graphical model. [sent-27, score-1.288]

19 We relate our population-level results to concrete algorithms that are guaranteed to recover the edge structure of a discrete graph with high probability. [sent-28, score-0.634]

20 (Section 2: Background and problem setup.) In this section, we provide background on graphical models and exponential families. [sent-31, score-0.281]

21 (Section 2.1: Graphical models.) An undirected graph G = (V, E) consists of a collection of vertices V = {1, 2, . . . , p} and a collection of edges E. [sent-34, score-0.308]

22 The distribution factorizes as P(x1, . . . , xp) ∝ ∏_{C∈C} ψ_C(x_C), (1) where C is the set of all cliques (fully-connected subsets of V) and ψ_C(x_C) are the corresponding clique potentials. [sent-48, score-0.387]

23 The factorization (1) may alternatively be represented in terms of an exponential family associated with the clique structure of G. [sent-49, score-0.264]

24 If we associate the function φ_s(x_s) = x_s with clique {s} and the function φ_st(x_s, x_t) = x_s x_t with edge (s, t), the factorization (2) becomes the Ising model P_θ(x1, . . . , xp) ∝ exp( Σ_{s∈V} θ_s x_s + Σ_{(s,t)∈E} θ_st x_s x_t ). [sent-57, score-0.82]
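
As a hedged illustration of this factorization (ours, not from the paper), the Ising distribution can be tabulated by brute force on small graphs; the helper name ising_distribution is our own.

    import itertools
    import numpy as np

    def ising_distribution(p, edges, theta_node, theta_edge):
        """All 2^p binary states and their probabilities under
        P(x) ∝ exp(sum_s theta_s x_s + sum_(s,t) theta_st x_s x_t)."""
        states = np.array(list(itertools.product([0, 1], repeat=p)), dtype=float)
        logp = states @ theta_node
        for k, (s, t) in enumerate(edges):
            logp += theta_edge[k] * states[:, s] * states[:, t]
        probs = np.exp(logp)
        return states, probs / probs.sum()

    # chain 1-2-3-4 (0-indexed nodes) with homogeneous edge weights
    states, probs = ising_distribution(4, [(0, 1), (1, 2), (2, 3)],
                                       np.zeros(4), 0.5 * np.ones(3))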

25 Note in particular that when m = 2, the index set reduces to the singleton containing the vector of all ones, J = {1}^{|C|}, and the sufficient statistics are given by φ_{C;J}(x_C) = ∏_{s∈C} x_s for C ∈ C; i.e., products of variables over cliques. [sent-83, score-0.345]

26 (Section 2.2: Covariance matrices and beyond.) Consider the usual covariance matrix Σ = cov(X1, . . . , Xp). [sent-89, score-0.549]

27 Nonetheless, it is tempting to conjecture that inverse covariance matrices, and more generally, inverses of higher-order moment matrices, might be related to graph structure. [sent-97, score-0.808]

28 For a multivariate Gaussian graphical model defined on G, standard theory predicts that the inverse covariance matrix Γ = Σ^{-1} of the distribution is graph-structured: Γ_st = 0 if and only if (s, t) ∉ E. [sent-103, score-0.902]

29 Surprisingly, this is also the case for the chain graph with binary variables: a little computation shows that Γ takes the form shown in panel (f). [sent-104, score-0.354]

30 However, this statement is not true for the single-cycle graph shown in panel (b), with added edge (1, 4). [sent-105, score-0.551]

31 Indeed, as shown in panel (g), the inverse covariance matrix has no zero entries at all. [sent-106, score-0.858]

32 But for a more complicated graph, say the one in (e), we again observe a graph-structured inverse covariance matrix. [sent-107, score-0.536]
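
A self-contained sketch (ours) reproducing these observations by exact enumeration: the inverse covariance of a chain Ising model is tree-structured, while that of the single-cycle graph has no zeros; the edge weight 0.5 is an arbitrary choice.

    import itertools
    import numpy as np

    def ising_cov(p, edges, w=0.5):
        """Exact covariance of X in {0,1}^p under P(x) ∝ exp(w * sum_(s,t) x_s x_t)."""
        X = np.array(list(itertools.product([0, 1], repeat=p)), dtype=float)
        pr = np.exp(w * sum(X[:, s] * X[:, t] for s, t in edges))
        pr /= pr.sum()
        mu = pr @ X
        return (X * pr[:, None]).T @ X - np.outer(mu, mu)

    chain = [(0, 1), (1, 2), (2, 3)]
    cycle = chain + [(0, 3)]
    print(np.round(np.linalg.inv(ising_cov(4, chain)), 4))  # zeros off the chain
    print(np.round(np.linalg.inv(ising_cov(4, cycle)), 4))  # no zero entries at all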

33 The 5 × 5 inverse Γaug of this generalized covariance matrix takes the form displayed in the paper (the matrix entries were lost in extraction). [sent-140, score-0.268]

34 This matrix safely separates nodes 2 and 4, but the entry corresponding to the phantom edge (1, 3) is not equal to zero. [sent-164, score-0.314]

35 Indeed, we would observe a similar phenomenon if we chose to augment the graph by including the edge (2, 4) rather than (1, 3). [sent-165, score-0.402]

36 This example shows that the usual inverse covariance matrix is not always graph-structured, but computing generalized covariance matrices involving higher-order interaction terms may indicate graph structure. [sent-167, score-1.355]
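
A sketch (ours) reconstructing the 5 × 5 generalized covariance of this example, since the printed matrix was lost in extraction; we assume homogeneous cycle weights (0.5, our choice) and augment the node indicators with the separator statistic X1X3.

    import itertools
    import numpy as np

    edges = [(0, 1), (1, 2), (2, 3), (0, 3)]        # single cycle on 4 nodes (0-indexed)
    X = np.array(list(itertools.product([0, 1], repeat=4)), dtype=float)
    pr = np.exp(0.5 * sum(X[:, s] * X[:, t] for s, t in edges))
    pr /= pr.sum()

    Phi = np.column_stack([X, X[:, 0] * X[:, 2]])   # augment with the separator statistic X1*X3
    mu = pr @ Phi
    Gamma = np.linalg.inv((Phi * pr[:, None]).T @ Phi - np.outer(mu, mu))
    # node pair (2,4) (0-indexed rows/cols 1 and 3), separated by {1,3}, gets a zero entry;
    # the entry for the phantom edge (1,3) need not vanish
    print(np.round(Gamma, 2))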

37 Now let us consider a more general graphical model that adds the 3-clique interaction terms shown in panel (d) to the usual Ising terms. [sent-168, score-0.456]

38 We compute the covariance matrix of the augmented vector Φ(X) = (X1, X2, X3, X4, X1X2, X2X3, X3X4, X1X4, X1X3, X1X2X3, X1X3X4) ∈ {0, 1}^{11}. [sent-169, score-0.536]

39 The goal of this paper is to understand when certain inverse covariances do (and do not) capture the structure of a graphical model. [sent-172, score-0.493]

40 (Section 3: Main results and consequences.) We now state our main results on the structure of generalized inverse covariance matrices and graph structure. [sent-174, score-0.971]

41 (Section 3.1: Population-level results.) Our main result concerns a connection between the inverses of generalized covariance matrices associated with the model (4) and the structure of the graph. [sent-177, score-0.78]

42 Recall that a triangulation of a graph G = (V, E) is an augmented graph G̃ = (V, Ẽ) with no chordless cycles of length four or greater. [sent-179, score-0.75]

43 (For instance, the single cycle in panel (b) is a chordless 4-cycle, whereas panel (c) shows a triangulated graph. [sent-180, score-0.455]

44 The dinosaur graph in panel (e) is also triangulated. [sent-181, score-0.509]

45 The edge set Ẽ corresponds to the original edge set E plus the additional edges added to form the triangulation. [sent-182, score-0.465]

46 We also require some notation for defining generalized covariance matrices. [sent-184, score-0.409]

47 We are now ready to state our main theorem regarding the support of a certain type of generalized inverse covariance matrix. [sent-190, score-0.631]

48 (Theorem 1.) Consider an arbitrary discrete graphical model of the form (4), and let T be the set of maximal cliques in any triangulation of G. [sent-193, score-0.68]

49 Then the inverse Γ of the augmented covariance matrix cov(Φ(X; pow(T ))) is block graph-structured in the following sense: (a) For any two subsets A and B which are not subsets of the same maximal clique, the block Γ(pow(A), pow(B)) is zero. [sent-194, score-1.067]
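
A sketch (ours) of how the augmented statistics Φ(X; pow(T)) of Theorem 1 can be assembled for binary variables; the helper name power_set_statistics is our own.

    import itertools
    import numpy as np

    def power_set_statistics(X, cliques):
        """One product statistic prod_{s in A} X_s per nonempty subset A of each clique."""
        cols, seen = [], set()
        for C in cliques:
            for r in range(1, len(C) + 1):
                for A in itertools.combinations(sorted(C), r):
                    if A not in seen:        # subsets shared by two cliques appear once
                        seen.add(A)
                        cols.append(np.prod(X[:, list(A)], axis=1))
        return np.column_stack(cols)

    X = np.array(list(itertools.product([0, 1], repeat=4)), dtype=float)
    Phi = power_set_statistics(X, [{0, 1, 2}, {0, 2, 3}])   # triangulated 4-cycle
    print(Phi.shape)   # (16, 11): the 11 statistics of Example 1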

50 This correspondence is induced by the Fenchel-Legendre duality between the log partition function A and its dual A∗ , and allows us to relate Γ to the graph structure. [sent-198, score-0.268]

51 Note that when the original graph G is a tree, the graph is already triangulated and the set T in Theorem 1 is equal to the edge set E. [sent-199, score-0.685]

52 Hence, Theorem 1 implies that the inverse Γ of the augmented covariance matrix with sufficient statistics for all vertices and edges is graph-structured, and blocks of nonzeros in Γ correspond to edges in the graph. [sent-200, score-0.984]

53 In particular, the (m − 1)p × (m − 1)p submatrix ΓV,V corresponding to sufficient statistics of vertices is block graph-structured; in the case when m = 2, the submatrix ΓV,V is simply the p × p block corresponding to the vector (X1, . . . , Xp). [sent-201, score-0.308]

54 When G is not triangulated, however, we may need to invert a larger augmented covariance matrix and include sufficient statistics over pairs (s, t) ∉ E, as well. [sent-205, score-0.575]

55 In fact, it is not necessary to take the set of sufficient statistics over all maximal cliques, and we may consider a slightly smaller augmented covariance matrix. [sent-206, score-0.328]

56 Recall that any triangulation T gives rise to a junction tree representation of G, where nodes of the junction tree are subsets of V corresponding to maximal cliques in T , and the edges are intersections of adjacent cliques known as separator sets [15]. [sent-207, score-1.107]
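
A hedged sketch (ours) of this junction-tree construction, assuming networkx is available: enumerate maximal cliques and take a maximum-weight spanning tree of the clique graph, whose edges give the separator sets.

    import itertools
    import networkx as nx

    def cliques_and_separators(G):
        """Maximal cliques of a triangulated G and the separators of a junction tree."""
        cliques = [frozenset(c) for c in nx.find_cliques(G)]
        CG = nx.Graph()
        CG.add_nodes_from(cliques)
        for C1, C2 in itertools.combinations(cliques, 2):
            if C1 & C2:
                CG.add_edge(C1, C2, weight=len(C1 & C2))
        T = nx.maximum_spanning_tree(CG)    # max-weight clique tree is a junction tree
        return cliques, [C1 & C2 for C1, C2 in T.edges()]

    G = nx.cycle_graph(4)
    G.add_edge(0, 2)                        # chord, i.e. edge (1,3) in 1-indexed labels
    print(cliques_and_separators(G))        # cliques {0,1,2}, {0,2,3}; separator {0,2}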

57 The following corollary involves the generalized covariance matrix containing only sufficient statistics for nodes and separator sets of T : Corollary 1. [sent-208, score-0.934]

58 Let S be the set of separator sets in any triangulation of G, and let Γ be the inverse of cov(Φ(X; V ∪ pow(S))). [sent-209, score-0.573]

59 The proof of this corollary is based on applying the block matrix inversion formula [17] to express ΓV,V in terms of the matrix Γ from Theorem 1. [sent-211, score-0.413]

60 Panel (c) of Example 1 and the associated matrix Γaug provide a concrete instance of this corollary in action. [sent-212, score-0.303]

61 In panel (c), the single separator set in the triangulation is {1, 3}, so augmenting the usual covariance matrix with the additional sufficient statistic X1 X3 and taking the inverse should yield a graph-structured matrix. [sent-213, score-1.213]

62 Hence, the generalized covariance matrix of Corollary 1 has a smaller dimension than the generalized covariance matrix of Theorem 1, and is much more tractable for estimation. [sent-216, score-0.992]

63 Although Theorem 1 and Corollary 1 are clean results at the population level, forming the proper augmented covariance matrix requires some prior knowledge of the graph—namely, which edges are involved in a suitable triangulation. [sent-217, score-0.67]

64 In the case of a graph with only singleton separator sets, Corollary 1 specializes to the following useful corollary, which only involves the covariance matrix over indicators of vertices of G: Corollary 2. [sent-218, score-1.021]

65 For any graph with singleton separator sets, the inverse matrix Γ of the ordinary covariance matrix cov(Φ(X; V )) is graph-structured. [sent-219, score-1.264]
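
A sketch (ours) checking this corollary by enumeration on a small graph whose junction tree has only singleton separators: two triangles glued at one node (a "bowtie"); the edge weight 0.4 is arbitrary.

    import itertools
    import numpy as np

    edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]   # triangles share node 2
    X = np.array(list(itertools.product([0, 1], repeat=5)), dtype=float)
    pr = np.exp(0.4 * sum(X[:, s] * X[:, t] for s, t in edges))
    pr /= pr.sum()
    mu = pr @ X
    Gamma = np.linalg.inv((X * pr[:, None]).T @ X - np.outer(mu, mu))
    print(np.round(Gamma, 4))   # zeros exactly at the non-edges, e.g. (0,3), (1,4)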

66 Again, we may relate this corollary to Example 1—the inverse covariance matrices for the tree graph in panel (a) and the dinosaur graph in panel (e) are exactly graph-structured. [sent-221, score-1.721]

67 Indeed, although the dinosaur graph is not a tree, it possesses the nice property that the only separator sets in its junction tree are singletons. [sent-222, score-0.71]

68 Corollary 1 also guarantees that inverse covariances may be partially graph-structured, in the sense that (ΓV,V )st = 0 for any pair of vertices (s, t) separable by a singleton separator set. [sent-223, score-0.577]

69 Indeed, the matrix ΓV,V over singleton vertices is agnostic to which triangulation we choose for the graph. [sent-225, score-0.457]

70 In settings where there exists a junction tree representation of the graph with only singleton separator sets, Corollary 2 has a number of useful implications for the consistency of methods that have traditionally only been applied for edge recovery in Gaussian graphical models. [sent-226, score-1.155]

71 (Section 3.2: Consequences for the graphical Lasso for trees.) Moving beyond the population level, we now establish results concerning the statistical consistency of methods for graph selection in discrete graphical models, based on i.i.d. samples. [sent-229, score-0.954]

72 Given i.i.d. samples of (X1, . . . , Xp) with covariance Σ∗, consider the estimator Θ̂ ∈ arg min_{Θ ⪰ 0} { trace(Σ̂Θ) − log det(Θ) + λn Σ_{s≠t} |Θ_st| }, (6) where Σ̂ is an estimator for Σ∗. [sent-237, score-0.438]

73 For multivariate Gaussian data, this program is an ℓ1-regularized maximum likelihood estimate known as the graphical Lasso, and is a well-studied method for recovering the edge structure in a Gaussian graphical model [18, 19, 20]. [sent-238, score-0.855]

74 Although the program (6) has no relation to the MLE in the case of a discrete graphical model, it is still useful for estimating Θ∗ := (Σ∗ )−1 , and our analysis shows the surprising result that the program is consistent for recovering the structure of any tree-structured Ising model. [sent-239, score-0.529]
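
A hedged sketch (ours, not the authors' implementation) of this estimator built on scikit-learn's sklearn.covariance.graphical_lasso; the data below is placeholder independent bits, so a real experiment would substitute samples drawn from a tree Ising model, and the constants in λn and the threshold are arbitrary.

    import numpy as np
    from sklearn.covariance import graphical_lasso

    rng = np.random.default_rng(0)
    Z = rng.integers(0, 2, size=(5000, 4)).astype(float)   # placeholder binary samples
    emp_cov = np.cov(Z, rowvar=False)
    lam = 2.0 * np.sqrt(np.log(Z.shape[1]) / Z.shape[0])   # λn of order sqrt(log p / n)
    cov_est, Theta_hat = graphical_lasso(emp_cov, alpha=lam)
    edge_mask = np.abs(Theta_hat) > lam                    # crude threshold τn
    np.fill_diagonal(edge_mask, False)                     # off-diagonal entries are edges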

75 We consider a general estimate Σ̂ of the covariance matrix Σ∗ satisfying the deviation condition P( ‖Σ̂ − Σ∗‖_max ≥ ϕ(Σ∗) √(log p / n) ) ≤ c exp(−ψ(n, p)), (7) for functions ϕ and ψ, where ‖·‖_max denotes the elementwise ℓ∞-norm. [sent-240, score-0.477]

76 In addition, we require a certain mutual incoherence condition on the true covariance matrix Σ∗ to control the correlation of non-edge variables with edge variables in the graph. [sent-245, score-0.73]

77 Form a suitable estimate Σ̂ of the true covariance matrix Σ∗. [sent-253, score-0.431]

78 Optimize the graphical Lasso program (6) with parameter λn, denoting the solution by Θ̂. [sent-255, score-0.295]

79 We then have the following consistency result, a straightforward consequence of the graph structure of Θ∗ and concentration properties of Σ̂: Corollary 3. [sent-258, score-0.303]

80 If n ≳ d² log p, then Algorithm 1 with Σ̂ the sample covariance matrix and parameters λn ≥ (c1/α) √(log p / n) and τn = c2 (c1/α) √(log p / n) + λn recovers all edges (s, t) with |Θ∗_st| > τn/2, with probability at least 1 − c exp(−c log p). [sent-260, score-0.502]

81 Hence, if |Θ∗_st| > τn/2 for all edges (s, t) ∈ E, Corollary 3 guarantees that the log-determinant method plus thresholding recovers the full graph exactly. [sent-261, score-0.488]

82 In the case of the standard sample covariance matrix, this method has been implemented by Banerjee et al. [sent-262, score-0.344]

83 For a general discrete model, however, Θ∗ need not be graph-structured (e.g., see panel (b) in Figure 1), so the graphical Lasso (6) is inconsistent in general. [sent-268, score-0.396]

84 On the positive side, if we restrict ourselves to tree-structured graphs, the estimator (6) is attractive, since it relies only on an estimate Σ̂ of the population covariance Σ∗ that satisfies the deviation condition (7). [sent-269, score-0.425]

85 As a concrete example of how we may correct the program (6) to handle corrupted data, consider the case when each entry of xi is missing independently with probability ρ, and the corresponding observations zi are zero-filled for missing entries. [sent-272, score-0.328]

86 A natural estimator is Σ̂ = ( (1/n) Σ_{i=1}^n z_i z_iᵀ ) ÷ M − (1/(1 − ρ)²) z̄ z̄ᵀ, (9) where ÷ denotes elementwise division by the matrix M with diagonal entries (1 − ρ) and off-diagonal entries (1 − ρ)², correcting for the bias in both the mean and second moment terms. [sent-273, score-0.268]
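
A sketch (ours) of this corrected estimator (9), assuming the missing probability ρ is known and observations are zero-filled; the helper name corrected_cov is our own.

    import numpy as np

    def corrected_cov(Z, rho):
        """Bias-corrected covariance (9) from zero-filled observations Z (n x p)."""
        n, p = Z.shape
        M = np.full((p, p), (1 - rho) ** 2)   # off-diagonal: both entries observed
        np.fill_diagonal(M, 1 - rho)          # diagonal: a single entry observed
        second = (Z.T @ Z) / n / M            # unbiased for the second moment of X
        zbar = Z.mean(axis=0)                 # E[zbar] = (1 - rho) E[X]
        return second - np.outer(zbar, zbar) / (1 - rho) ** 2

    rng = np.random.default_rng(1)
    X = rng.integers(0, 2, size=(2000, 4)).astype(float)
    Z = X * (rng.random(X.shape) > 0.2)       # each entry missing w.p. rho = 0.2
    print(np.round(corrected_cov(Z, 0.2), 3))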

87 Generalizing to the case of m-ary discrete graphical models with m > 2, we may easily modify the program (6) by replacing the elementwise ℓ1-penalty by the corresponding group ℓ1-penalty, where the groups are the indicator variables for a given vertex. [sent-280, score-0.484]

88 The five curves show the results of our graphical Lasso method applied to the dinosaur graph in Figure 1. [sent-286, score-0.607]

89 Indeed, since the dinosaur graph has only singleton separators, Corollary 2 ensures that the inverse covariance matrix is exactly graph-structured. [sent-294, score-1.095]

[Figure: success probability versus sample size for the dino graph with missing data; plot content lost in extraction.]

91 Simulation results for our graphical Lasso method on binary Ising models, allowing for missing data in the observations. [sent-306, score-0.338]

92 (Section 5: Discussion.) The correspondence between the inverse covariance matrix and graph structure of a Gauss-Markov random field is a classical fact, with many useful consequences for efficient estimation of Gaussian graphical models. [sent-310, score-1.214]

93 In this paper, we have provided a partial affirmative answer to this question and developed theoretical results extending such relationships to discrete undirected graphical models. [sent-312, score-0.377]

94 As shown by our results, the inverse of the ordinary covariance matrix is graph-structured for special subclasses of graphs with singleton separator sets. [sent-313, score-1.02]

95 More generally, we have shown that it is worthwhile to consider the inverses of generalized covariance matrices, formed by introducing indicator functions for larger subsets of variables. [sent-314, score-0.593]

96 When these subsets are chosen to reflect the structure of an underlying junction tree, the edge structure is reflected in the inverse covariance matrix. [sent-315, score-1.007]

97 Our population-level results have a number of statistical consequences for graphical model selection. [sent-316, score-0.3]

98 We have shown how our results may be used to establish consistency (or inconsistency) of the standard graphical Lasso applied to discrete graphs, even when observations are systematically corrupted by mechanisms such as additive noise and missing data. [sent-317, score-0.618]

99 High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. [sent-350, score-0.344]

100 Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. [sent-419, score-0.751]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('covariance', 0.344), ('pow', 0.311), ('graphical', 0.247), ('graph', 0.205), ('separator', 0.198), ('edge', 0.197), ('xs', 0.194), ('inverse', 0.192), ('triangulation', 0.183), ('corollary', 0.171), ('ising', 0.161), ('dinosaur', 0.155), ('panel', 0.149), ('xc', 0.123), ('xp', 0.114), ('singleton', 0.112), ('st', 0.106), ('augmented', 0.105), ('discrete', 0.102), ('clique', 0.1), ('lasso', 0.099), ('cliques', 0.097), ('loh', 0.091), ('missing', 0.091), ('junction', 0.09), ('matrix', 0.087), ('aug', 0.087), ('rho', 0.087), ('triangulated', 0.078), ('subsets', 0.076), ('vertices', 0.075), ('edges', 0.071), ('block', 0.068), ('cov', 0.067), ('inverses', 0.067), ('incoherence', 0.065), ('generalized', 0.065), ('tree', 0.062), ('usual', 0.06), ('matrices', 0.058), ('structure', 0.054), ('ic', 0.054), ('rescaled', 0.054), ('corrupted', 0.053), ('consequences', 0.053), ('chordless', 0.052), ('wainwright', 0.051), ('maximal', 0.051), ('program', 0.048), ('graphs', 0.048), ('estimator', 0.047), ('systematically', 0.047), ('xt', 0.046), ('elementwise', 0.046), ('ravikumar', 0.046), ('dino', 0.046), ('concrete', 0.045), ('entries', 0.044), ('consistency', 0.044), ('factorization', 0.043), ('suf', 0.042), ('nonzero', 0.042), ('concerning', 0.041), ('indicator', 0.041), ('vertex', 0.04), ('berkeley', 0.04), ('contaminated', 0.04), ('prob', 0.04), ('ordinary', 0.039), ('success', 0.039), ('statistics', 0.039), ('whenever', 0.038), ('mutual', 0.037), ('simulations', 0.036), ('potentials', 0.035), ('leveraged', 0.035), ('nonparanormal', 0.035), ('establish', 0.034), ('population', 0.034), ('exponential', 0.034), ('gaussian', 0.034), ('corollaries', 0.034), ('family', 0.033), ('xa', 0.032), ('multivariate', 0.032), ('correspondence', 0.032), ('relate', 0.031), ('nodes', 0.03), ('recovering', 0.03), ('theorem', 0.03), ('relationship', 0.029), ('submatrix', 0.029), ('clean', 0.029), ('interactions', 0.029), ('families', 0.029), ('semiparametric', 0.029), ('liu', 0.028), ('undirected', 0.028), ('cycle', 0.027), ('statements', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

Author: Po-ling Loh, Martin J. Wainwright

Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results.

2 0.31465679 147 nips-2012-Graphical Models via Generalized Linear Models

Author: Eunho Yang, Genevera Allen, Zhandong Liu, Pradeep K. Ravikumar

Abstract: Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. The popular instances of these models such as Gaussian Markov Random Fields (GMRFs), Ising models, and multinomial discrete models, however do not capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential distribution, such as Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. A major contribution of this paper is the rigorous statistical analysis showing that with high probability, the neighborhood of our graphical models can be recovered exactly. We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models. 1

3 0.20680694 180 nips-2012-Learning Mixtures of Tree Graphical Models

Author: Anima Anandkumar, Furong Huang, Daniel J. Hsu, Sham M. Kakade

Abstract: We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables. We propose a novel method for estimating the mixture components with provable guarantees. Our output is a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. The sample and computational requirements for our method scale as poly(p, r), for an r-component mixture of pvariate graphical models, for a wide class of models which includes tree mixtures and mixtures over bounded degree graphs. Keywords: Graphical models, mixture models, spectral methods, tree approximation.

4 0.19787106 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

Author: Anima Anandkumar, Ragupathyraj Valluvan

Abstract: Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay. We propose an efficient method for graph estimation, and establish its structural consistency −δη(η+1)−2 when the number of samples n scales as n = Ω(θmin log p), where θmin is the minimum edge potential, δ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and η is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements. Keywords: Graphical model selection, latent variables, quartet methods, locally tree-like graphs. 1

5 0.19025955 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

Author: Benjamin Rolfs, Bala Rajaratnam, Dominique Guillot, Ian Wong, Arian Maleki

Abstract: The 1 -regularized maximum likelihood estimation problem has recently become a topic of great interest within the machine learning, statistics, and optimization communities as a method for producing sparse inverse covariance estimators. In this paper, a proximal gradient method (G-ISTA) for performing 1 -regularized covariance matrix estimation is presented. Although numerous algorithms have been proposed for solving this problem, this simple proximal gradient method is found to have attractive theoretical and numerical properties. G-ISTA has a linear rate of convergence, resulting in an O(log ε) iteration complexity to reach a tolerance of ε. This paper gives eigenvalue bounds for the G-ISTA iterates, providing a closed-form linear convergence rate. The rate is shown to be closely related to the condition number of the optimal point. Numerical convergence results and timing comparisons for the proposed method are presented. G-ISTA is shown to perform very well, especially when the optimal point is well-conditioned. 1

6 0.18450566 312 nips-2012-Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression

7 0.15975356 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

8 0.15682855 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

9 0.14937048 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation

10 0.14333223 254 nips-2012-On the Sample Complexity of Robust PCA

11 0.13362944 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

12 0.12979746 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

13 0.12967953 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation

14 0.12499874 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs

15 0.12188089 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

16 0.11434478 69 nips-2012-Clustering Sparse Graphs

17 0.11213608 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

18 0.11131287 352 nips-2012-Transelliptical Graphical Models

19 0.10487586 346 nips-2012-Topology Constraints in Graphical Models

20 0.10045412 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.255), (1, 0.091), (2, 0.145), (3, -0.108), (4, -0.093), (5, 0.038), (6, -0.028), (7, -0.236), (8, -0.27), (9, 0.011), (10, -0.076), (11, -0.09), (12, 0.025), (13, 0.097), (14, -0.007), (15, 0.015), (16, 0.155), (17, 0.016), (18, -0.109), (19, 0.076), (20, 0.072), (21, -0.045), (22, -0.139), (23, -0.012), (24, 0.032), (25, 0.021), (26, 0.124), (27, -0.042), (28, -0.076), (29, -0.051), (30, -0.048), (31, 0.03), (32, -0.005), (33, -0.065), (34, 0.125), (35, -0.007), (36, 0.022), (37, 0.087), (38, 0.009), (39, -0.09), (40, 0.08), (41, 0.025), (42, -0.018), (43, 0.046), (44, -0.082), (45, -0.001), (46, 0.002), (47, 0.042), (48, 0.022), (49, -0.044)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97544247 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

Author: Po-ling Loh, Martin J. Wainwright

Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results. 1

2 0.88062173 147 nips-2012-Graphical Models via Generalized Linear Models

Author: Eunho Yang, Genevera Allen, Zhandong Liu, Pradeep K. Ravikumar

Abstract: Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. The popular instances of these models such as Gaussian Markov Random Fields (GMRFs), Ising models, and multinomial discrete models, however do not capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential distribution, such as Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. A major contribution of this paper is the rigorous statistical analysis showing that with high probability, the neighborhood of our graphical models can be recovered exactly. We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models. 1

3 0.74406117 327 nips-2012-Structured Learning of Gaussian Graphical Models

Author: Karthik Mohan, Mike Chung, Seungyeop Han, Daniela Witten, Su-in Lee, Maryam Fazel

Abstract: We consider estimation of multiple high-dimensional Gaussian graphical models corresponding to a single set of nodes under several distinct conditions. We assume that most aspects of the networks are shared, but that there are some structured differences between them. Specifically, the network differences are generated from node perturbations: a few nodes are perturbed across networks, and most or all edges stemming from such nodes differ between networks. This corresponds to a simple model for the mechanism underlying many cancers, in which the gene regulatory network is disrupted due to the aberrant activity of a few specific genes. We propose to solve this problem using the perturbed-node joint graphical lasso, a convex optimization problem that is based upon the use of a row-column overlap norm penalty. We then solve the convex problem using an alternating directions method of multipliers algorithm. Our proposal is illustrated on synthetic data and on an application to brain cancer gene expression data. 1

4 0.68509471 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation

Author: Tuo Zhao, Kathryn Roeder, Han Liu

Abstract: We introduce a new learning algorithm, named smooth-projected neighborhood pursuit, for estimating high dimensional undirected graphs. In particularly, we focus on the nonparanormal graphical model and provide theoretical guarantees for graph estimation consistency. In addition to new computational and theoretical analysis, we also provide an alternative view to analyze the tradeoff between computational efficiency and statistical error under a smoothing optimization framework. Numerical results on both synthetic and real datasets are provided to support our theory. 1

5 0.67618686 346 nips-2012-Topology Constraints in Graphical Models

Author: Marcelo Fiori, Pablo Musé, Guillermo Sapiro

Abstract: Graphical models are a very useful tool to describe and understand natural phenomena, from gene expression to climate change and social interactions. The topological structure of these graphs/networks is a fundamental part of the analysis, and in many cases the main goal of the study. However, little work has been done on incorporating prior topological knowledge onto the estimation of the underlying graphical models from sample data. In this work we propose extensions to the basic joint regression model for network estimation, which explicitly incorporate graph-topological constraints into the corresponding optimization approach. The first proposed extension includes an eigenvector centrality constraint, thereby promoting this important prior topological property. The second developed extension promotes the formation of certain motifs, triangle-shaped ones in particular, which are known to exist for example in genetic regulatory networks. The presentation of the underlying formulations, which serve as examples of the introduction of topological constraints in network estimation, is complemented with examples in diverse datasets demonstrating the importance of incorporating such critical prior knowledge. 1

6 0.63133687 180 nips-2012-Learning Mixtures of Tree Graphical Models

7 0.62303919 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

8 0.60305619 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation

9 0.58565444 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

10 0.566006 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

11 0.56485546 312 nips-2012-Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression

12 0.55948633 132 nips-2012-Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling

13 0.54710811 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs

14 0.54680228 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

15 0.54249012 254 nips-2012-On the Sample Complexity of Robust PCA

16 0.53516304 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

17 0.52949625 192 nips-2012-Learning the Dependency Structure of Latent Factors

18 0.52593571 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

19 0.51078242 211 nips-2012-Meta-Gaussian Information Bottleneck

20 0.50519133 352 nips-2012-Transelliptical Graphical Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.049), (21, 0.026), (36, 0.187), (38, 0.144), (39, 0.017), (42, 0.039), (54, 0.016), (55, 0.031), (74, 0.084), (76, 0.201), (80, 0.087), (92, 0.04)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.97447222 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting

Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey

Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Electricit´ de France (EDF). e Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy. 1

2 0.95057827 288 nips-2012-Rational inference of relative preferences

Author: Nisheeth Srivastava, Paul R. Schrater

Abstract: Statistical decision theory axiomatically assumes that the relative desirability of different options that humans perceive is well described by assigning them optionspecific scalar utility functions. However, this assumption is refuted by observed human behavior, including studies wherein preferences have been shown to change systematically simply through variation in the set of choice options presented. In this paper, we show that interpreting desirability as a relative comparison between available options at any particular decision instance results in a rational theory of value-inference that explains heretofore intractable violations of rational choice behavior in human subjects. Complementarily, we also characterize the conditions under which a rational agent selecting optimal options indicated by dynamic value inference in our framework will behave identically to one whose preferences are encoded using a static ordinal utility function. 1

3 0.94845337 295 nips-2012-Risk-Aversion in Multi-armed Bandits

Author: Amir Sani, Alessandro Lazaric, Rémi Munos

Abstract: Stochastic multi–armed bandits solve the Exploration–Exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk–aversion where the objective is to compete against the arm with the best risk–return trade–off. This setting proves to be more difficult than the standard multi-arm bandit setting due in part to an exploration risk which introduces a regret associated to the variability of an algorithm. Using variance as a measure of risk, we define two algorithms, investigate their theoretical guarantees, and report preliminary empirical results. 1

same-paper 4 0.89300674 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

Author: Po-ling Loh, Martin J. Wainwright

Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results. 1

5 0.8903926 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

6 0.88408047 61 nips-2012-Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

7 0.83181798 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

8 0.82678217 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

9 0.82230425 188 nips-2012-Learning from Distributions via Support Measure Machines

10 0.82220757 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

11 0.82130748 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

12 0.82029295 294 nips-2012-Repulsive Mixtures

13 0.8192727 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

14 0.81904966 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

15 0.81855971 210 nips-2012-Memorability of Image Regions

16 0.81801116 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

17 0.81787407 261 nips-2012-Online allocation and homogeneous partitioning for piecewise constant mean-approximation

18 0.81706005 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

19 0.81582493 180 nips-2012-Learning Mixtures of Tree Graphical Models

20 0.81553537 264 nips-2012-Optimal kernel choice for large-scale two-sample tests