nips nips2012 nips2012-147 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Eunho Yang, Genevera Allen, Zhandong Liu, Pradeep K. Ravikumar
Abstract: Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. The popular instances of these models such as Gaussian Markov Random Fields (GMRFs), Ising models, and multinomial discrete models, however do not capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential distribution, such as Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. A major contribution of this paper is the rigorous statistical analysis showing that with high probability, the neighborhood of our graphical models can be recovered exactly. We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models. 1
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. [sent-8, score-0.274]
2 We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions arise from exponential families. [sent-10, score-0.681]
3 Our models allow one to estimate multivariate Markov networks given any univariate exponential distribution, such as Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. [sent-11, score-0.502]
4 A major contribution of this paper is the rigorous statistical analysis showing that with high probability, the neighborhood of our graphical models can be recovered exactly. [sent-12, score-0.453]
5 We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models. [sent-13, score-0.572]
6 1 Introduction Undirected graphical models, also known as Markov random fields, are an important class of statistical models that have been extensively used in a wide variety of domains, including statistical physics, natural language processing, image analysis, and medicine. [sent-14, score-0.421]
7 The key idea in this class of models is to represent the joint distribution as a product of clique-wise compatibility functions; given an underlying graph, each of these compatibility functions depends only on a subset of variables within any clique of the underlying graph. [sent-15, score-0.234]
8 Such a factored graphical model distribution can also be related to an exponential family distribution [1], where the unnormalized probability is expressed as the exponential of a weighted linear combination of clique-wise sufficient statistics. [sent-16, score-0.69]
9 Learning a graphical model distribution from data within this exponential family framework can be reduced to learning weights on these sufficient statistics. [sent-17, score-0.529]
10 The multivariate normal distribution imposed by the GMRF, however, is a stringent assumption; the marginal distribution of any variable must also be Gaussian. [sent-22, score-0.152]
11 In this paper, we propose a general class of graphical models beyond the Ising model and the GMRF to encompass variables arising from all exponential family distributions. [sent-23, score-0.643]
12 1 The key idea in these recent methods is to learn the MRF graph structure by estimating nodeneighborhoods, which are estimated by maximizing the likelihood of each node conditioned on the rest of the nodes. [sent-25, score-0.136]
13 Here, we study the general class of models obtained by the following construction: suppose the node-conditional distributions of each node conditioned on the rest of the nodes are Generalized Linear Models (GLMs) [5]. [sent-27, score-0.304]
14 By the Hammersley-Clifford Theorem [6] and some algebra as derived in [7], these node-conditional distributions entail a global distribution that factors according to cliques defined by the graph obtained from the node-neighborhoods. [sent-28, score-0.261]
15 The resulting class of MRFs broadens the class of models available off-the-shelf, from the standard Ising, indicator-discrete, and Gaussian MRFs. [sent-30, score-0.138]
16 Beyond our initial motivation of finding more general graphical model sufficient statistics, a broader class of parametric graphical models are important for a number of reasons. [sent-31, score-0.676]
17 First, our models provide a principled approach to model multivariate distributions and network structures among a large number of variables. [sent-32, score-0.226]
18 For many non-Gaussian exponential families, multivariate distributions typically do not exist in an analytical or computationally tractable form. [sent-33, score-0.251]
19 Graphical model GLMs provide a way to “extend” univariate exponential families of distributions to the multivariate case and model and study relationships between variables for these families of distributions. [sent-34, score-0.453]
20 Second, while some have proposed to extend the GMRF to a non-parametric class of graphical models by first Gaussianizing the data and then fitting a GMRF over the transformed variables [8], the sample complexity of such non-parametric methods is often inferior to parametric methods. [sent-35, score-0.402]
21 Thus for modeling data that closely follows a non-Gaussian distribution, statistical power for network recovery can be gained by directly fitting parametric GLM graphical models. [sent-36, score-0.466]
22 Third, and specifically for multivariate count data, others have suggested combinatorial approaches to fitting graphical models, mostly in the context of contingency tables [6, 9, 1, 10]. [sent-37, score-0.374]
23 Finally, potential applications for our GLM graphical models abound. [sent-39, score-0.326]
24 Networks of call-times, time spent on websites, diffusion processes, and life-cycles can be modeled with exponential graphical models; other skewed multivariate data can be modeled with gamma or chi-squared graphical models. [sent-40, score-0.744]
25 Perhaps the most interesting motivating applications are for multivariate count data such as from website visits, user-ratings, crime and disease incident reports, bibliometrics, and next-generation genomic sequencing technologies. [sent-41, score-0.42]
26 As Gaussian graphical models are widely used to infer genomic regulatory networks from microarray data, Poisson and negative binomial graphical models may be important for inferring genomic networks from the multivariate count data arising from this emerging technology. [sent-43, score-1.481]
27 Beyond next generation sequencing, there has been a recent proliferation of new high-throughput genomic technologies that produce non-Gaussian data. [sent-44, score-0.252]
28 Thus, our more general class of GLM graphical models can be used for inferring genomic networks from these new high-throughput technologies. [sent-45, score-0.639]
29 The construction of our GLM graphical models also suggests a natural method for learning such models: node-wise neighborhood estimation by fitting sparsity constrained GLMs. [sent-46, score-0.454]
30 A main contribution of this paper is to provide a sparsistency analysis for the recovery of the underlying graph structure of this new class of MRFs. [sent-47, score-0.254]
31 We note that this analysis might be of independent interest even outside the context of modeling and recovering graphical models. [sent-51, score-0.274]
32 In recent years, there has been a trend towards unified statistical analyses that provide statistical guarantees for broad classes of models via general theorems [12]. [sent-52, score-0.133]
33 Our result is in this vein and provides structure recovery for the class of sparsity constrained generalized linear models. [sent-53, score-0.189]
34 Suppose G = (V, E) is an undirected graph over p nodes corresponding to the p variables; the corresponding graphical model is a set of distributions that satisfy Markov independence assumptions with respect to the graph. [sent-60, score-0.542]
35 By the Hammersley-Clifford theorem, any such distribution also factors according to the graph in the following way. [sent-61, score-0.148]
36 Let C be a set of cliques (fully-connected subgraphs) of the graph G, and let {φc (Xc ) c ∈ C} be a set of clique-wise sufficient statistics. [sent-62, score-0.135]
37 With this notation, any distribution of X within the graphical model family represented by the graph G takes the form: P (X) ∝ exp θc φc (Xc ) , (1) c∈C where {θc } are weights over the sufficient statistics. [sent-63, score-0.529]
38 With a pairwise graphical model distribution, the set of cliques consists of the set of nodes V and the set of edges E, so that P (X) ∝ exp θs φs (Xs ) + s∈V θst φst (Xs , Xt ) . [sent-64, score-0.415]
39 (2) (s,t)∈E As previously discussed, an important question is how to select the class of sufficient statistics, φ, in particular to obtain as a multivariate extension of specified univariate parametric distributions? [sent-65, score-0.232]
40 We next outline a subclass of graphical models where the node-conditional distributions are exponential family distributions, with an important special case where these node-conditional distributions are generalized linear models (GLMs). [sent-66, score-0.735]
41 Then, in Section 3, we will study how to learn the underlying graph structure, or infer the edge set E, providing an M-estimator and sufficient conditions under which the estimator recovers the graph structure with high probability. [sent-67, score-0.194]
42 In this section, we investigate the class of models that arise from specifying the node-conditional distributions as exponential families. [sent-69, score-0.272]
43 Specifically, suppose we are given a univariate exponential family distribution, P (Z) = exp(θ B(Z) + C(Z) − D(θ)), with sufficient statistics B(Z), base measure C(Z), and D(θ) as the log-normalization constant. [sent-70, score-0.355]
44 , Xp ) be a p-dimensional random vector; and let G = (V, E) be an undirected graph over p nodes corresponding to the p variables. [sent-74, score-0.157]
45 Now suppose the distribution of Xs given the rest of nodes XV \s is given by the above exponential family, but with the canonical exponential family parameter set to a linear combination of k-th order products of univariate functions {B(Xt )}t∈N (s) . [sent-75, score-0.554]
46 By the Hammersley-Clifford theorem, and some elementary calculation, this conditional distribution can be shown to specify the following unique joint distribution P (X1 , . [sent-83, score-0.172]
47 3 s (4) An important question is whether the conditional and joint distributions specified above have the most general form, under just the assumption of exponential family node-conditional distributions? [sent-101, score-0.365]
48 In particular, note that the canonical parameter in the previous proposition is a tensor factorization of the univariate sufficient statistic, with pair-wise and higher-order interactions, which seems a bit stringent. [sent-102, score-0.125]
49 Further, suppose the corresponding joint distribution factors according to the graph G = (V, E), with the factors over cliques of size at most k. [sent-109, score-0.337]
50 Then, the conditional distribution in (5) has the tensor-factorized form in (3), and the corresponding joint distribution has the form in (4). [sent-110, score-0.172]
51 The conditional distribution then is given by: ¯ P (Xs |XV \s ) = exp θs B(Xs ) + θst B(Xs )B(Xt ) + C(Xs ) − D(XV \s ) , (6) t∈N (s) while the joint distribution is given as P (X) = exp θs B(Xs ) + s θst B(Xs )B(Xt ) + s (s,t)∈E C(Xs ) − A(θ) . [sent-113, score-0.262]
52 (9) In the subsequent sections, we will refer to the entire class of models in (7) as GLM graphical models, but focus on the case (9) with linear functions B(Xs ) = Xs . [sent-115, score-0.369]
53 The GLM graphical models provide multivariate or Markov network extensions of univariate exponential family distributions. [sent-117, score-0.743]
54 The popular Gaussian graphical model and Ising model can thus also be represented by (7). [sent-118, score-0.274]
55 The form of the multinomial graphical model, an extension of the Ising model, can also be represented by (7) and has been previously studied in [4] and others. [sent-120, score-0.342]
56 It is instructive to consider the domain of the set of all possible valid parameters in the GLM graphical model (9); namely those that ensure that the density is normalizable, or equivalently, so that the log-partition function satisfies A(θ) < +∞. [sent-121, score-0.274]
57 For other exponential families, with countable discrete or continuous valued variables, the GLM graphical model does impose additional constraints on valid parameters. [sent-123, score-0.396]
58 The Poisson family has sufficient statistic B(X) = X and base measure C(X) = −log(X! [sent-125, score-0.132]
59 Thus, the Poisson graphical model can only capture negative conditional relationships between variables. [sent-128, score-0.353]
60 Consider the exponential distribution with sufficient statistic B(X) = −X, base measure C(X) = 0. [sent-129, score-0.199]
61 Similar constraints on the parameter space are necessary to ensure proper density functions for several other exponential family graphical models as well. [sent-131, score-0.542]
62 3 Statistical Guarantees In this section, we study the problem of learning the graph structure of an underlying GLM graphical n model given iid samples. [sent-132, score-0.351]
63 Specifically, we assume that we are given n samples X1 = {X (i) }n , i=1 from a GLM graphical model: ∗ θst Xs Xt + C(Xs ) − A(θ) . [sent-133, score-0.274]
64 The goal in graphical model structure recovery is to recover the edges E ∗ of the underlying graph G = (V, E ∗ ). [sent-135, score-0.439]
65 Following [3, 4], we will approach this problem via neighborhood estimation, where we estimate the neighborhood of each node individually, and then stitch these together to form the global graph estimate. [sent-136, score-0.338]
66 Specifically, if we have an estimate N (s) for the true neighborhood N ∗ (s), then we can estimate the overall graph structure as: E = ∪s∈V ∪t∈N (s) {(s, t)}. [sent-137, score-0.178]
67 (11) In order to estimate the neighborhood of any node, we consider the sparsity constrained conditional MLE. [sent-138, score-0.18]
68 Given the joint distribution in (10), the conditional distribution of Xs given the rest of the nodes is given by: ∗ ∗ P (Xs |XV \s ) = exp Xs θst Xt + C(Xs ) − D θst Xt . [sent-139, score-0.255]
69 In the rest of the section, we first discuss the assumptions we impose on the GLM graphical model parameters. [sent-145, score-0.33]
70 The first set of assumptions are standard irrepresentable-type conditions imposed for structure recovery in high-dimensional statistical estimators, and in particular, our assumptions mirror those in [3]. [sent-146, score-0.226]
71 The second set of assumptions are key to our generalized analysis of the class of GLM graphical models as a whole. [sent-147, score-0.456]
72 We also use S = {(s, t) : t ∈ N (s)} s to denote the true neighborhood of node s, and S c to denote its complement. [sent-151, score-0.16]
73 The log-partition function D(·) of the node-conditional distribution (12) satisfies: There exist constants κ1 and κ2 (that depend on the exponential family) s. [sent-168, score-0.161]
74 In particular, we can show that the statements of the following propositions hold, which show that the random vectors X following the GLM graphical model in (10) are suitably well-behaved: Proposition 3. [sent-172, score-0.274]
75 Consider a GLM graphical model distribution as specified in (10), with true parameter ∗ θ∗ and associated edge set E ∗ that satisfies Assumptions 1-5. [sent-179, score-0.353]
76 n (Left), and the probability of successful edge recovery vs. [sent-197, score-0.156]
77 Note that if the neighborhood of each node is recovered with high probability, then by a simple union bound, the estimate in (11), E = ∪s∈V ∪t∈N (s) {(s, t)} is equal to the true edge set E ∗ with high-probability. [sent-199, score-0.2]
78 On the other hand, for the binomial, multinomial or log p Gaussian cases studied in [2, 3, 4], we can recover their results with κ2 = 0 since the log-partition function D(·) of these families are upper bounded by some constant for any input. [sent-203, score-0.154]
79 The left panel of Figure 1 shows the probability of n successful edge recovery for different numbers of nodes, p = {64, 100, 169, 225}. [sent-209, score-0.185]
80 Gaussian graphical models learned from microarray data have often been used to study high-throughput genomic regulatory networks. [sent-214, score-0.656]
81 Our GLM graphical models will be important for understanding genomic networks learned from other high-throughput technologies that do not produce approximately Gaussian data. [sent-215, score-0.65]
82 Level III data, breast cancer miRNA expression (next generation sequencing) [13] and copy number variation (aCGH) Glioblastoma data [14], was obtained from the the Cancer Genome Atlas (TCGA) data portal (http://tcga-data. [sent-217, score-0.301]
83 A Poisson graphical model and a multinomial graphical model were fit to the processed miRNA data and aberration data respectively by performing neighborhood selection with the sparsity of the graph determined by stability selection [15]. [sent-222, score-0.914]
84 Our GLM graphical models, Figure 2, reveal results consistent with the cancer genomics literature. [sent-223, score-0.418]
85 The meta-miRNA inhibitory network has three major hubs, two of which, mir-519 and mir-520, are known to be breast cancer tumor suppressors [16, 17]. [sent-224, score-0.487]
86 27 98 96 86 77 65 36 54 67 78 72 69 68 85 9 52 8 88 7 66 79 56 45 in our network, sharing edges with the five largest hubs; this suggests that our model has learned relevant negative associations between tumor suppressors and enhancers. [sent-226, score-0.207]
87 The Glioblastoma copy number aberration network reveals five major modules, color coded on the left panel in Figure 2, and three of these modules have been previously implicated in Glioblastoma: EGFR in the yellow module, PTEN in the purple module, and CDK2A in the blue module [19]. [sent-227, score-0.261]
88 5 Discussion We have introduced a new class of graphical models that arise when we assume that node-wise conditional distributions follow an exponential family distribution. [sent-228, score-0.692]
89 We have also provided simple M-estimators for learning the network by fitting node-wise penalized GLMs that enjoy strong statistical recovery properties. [sent-229, score-0.159]
90 Our work has broadened the class of off-the-shelf graphical models to encompass a wide range of parametric distributions. [sent-230, score-0.433]
91 These classes of graphical models may be of further interest to the statistical community as they provide closed form multivariate densities for several exponential family distributions (e. [sent-231, score-0.697]
92 Our work outlines the general class of graphical models for exponential family distributions, but there are many avenues for future work in studying this model for specific distributional families. [sent-235, score-0.585]
93 A question remains, can these restrictions be relaxed for specific exponential family distributions? [sent-237, score-0.216]
94 Gaussian, Bernoulli, Poisson, exponential, negative binomial); our models can be studied with non-linear sufficient statistics or multi-parameter distributions as well. [sent-240, score-0.134]
95 High-dimensional ising model selection using 1 regularized logistic regression. [sent-272, score-0.131]
96 Comprehensive genomic characterization defines human glioblastoma genes and core pathways. [sent-344, score-0.369]
97 Stability approach to regularization selection (stars) for high dimensional graphical models. [sent-350, score-0.274]
98 Microrna-520/373 family functions as a tumor suppressor u in estrogen receptor negative breast cancer by targeting nf-κb and tgf-β signaling pathways. [sent-379, score-0.476]
99 let-7 regulates self renewal and tumorigenicity of breast cancer cells. [sent-392, score-0.249]
100 Comprehensive genomic characterization defines human glioblastoma genes and core pathways. [sent-409, score-0.369]
wordName wordTfidf (topN-words)
[('xs', 0.5), ('glm', 0.339), ('graphical', 0.274), ('xv', 0.248), ('genomic', 0.226), ('cancer', 0.144), ('glioblastoma', 0.143), ('glms', 0.143), ('st', 0.139), ('ising', 0.131), ('exponential', 0.122), ('poisson', 0.113), ('tumor', 0.106), ('breast', 0.105), ('neighborhood', 0.101), ('sequencing', 0.094), ('family', 0.094), ('aberration', 0.093), ('recovery', 0.088), ('univariate', 0.082), ('graph', 0.077), ('xp', 0.077), ('gmrf', 0.076), ('multivariate', 0.074), ('mirna', 0.07), ('multinomial', 0.068), ('xt', 0.068), ('families', 0.06), ('genome', 0.06), ('binomial', 0.059), ('node', 0.059), ('cliques', 0.058), ('suppose', 0.057), ('assumptions', 0.056), ('distributions', 0.055), ('copy', 0.052), ('conditional', 0.052), ('models', 0.052), ('atlas', 0.051), ('acgh', 0.046), ('baylor', 0.046), ('gmrfs', 0.046), ('nodeconditional', 0.046), ('sparsistency', 0.046), ('suppressors', 0.046), ('xtj', 0.046), ('zhandong', 0.046), ('tting', 0.046), ('network', 0.045), ('exp', 0.045), ('networks', 0.044), ('microarray', 0.044), ('class', 0.043), ('proposition', 0.043), ('module', 0.042), ('subtle', 0.042), ('joint', 0.042), ('undirected', 0.042), ('inhibitory', 0.041), ('ss', 0.041), ('ravikumar', 0.041), ('eunho', 0.041), ('biomedicine', 0.041), ('edge', 0.04), ('distribution', 0.039), ('statistic', 0.038), ('nodes', 0.038), ('hubs', 0.038), ('rp', 0.037), ('comprehensive', 0.036), ('pan', 0.035), ('suf', 0.035), ('allen', 0.034), ('parametric', 0.033), ('regulatory', 0.032), ('factors', 0.032), ('encompass', 0.031), ('generalized', 0.031), ('compatibility', 0.029), ('lattice', 0.029), ('rice', 0.029), ('incoherence', 0.029), ('analyses', 0.029), ('panel', 0.029), ('seed', 0.028), ('learned', 0.028), ('successful', 0.028), ('wainwright', 0.028), ('xc', 0.028), ('markov', 0.027), ('sparsity', 0.027), ('arising', 0.027), ('mrfs', 0.027), ('negative', 0.027), ('statistical', 0.026), ('maxt', 0.026), ('log', 0.026), ('austin', 0.026), ('technologies', 0.026), ('count', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000011 147 nips-2012-Graphical Models via Generalized Linear Models
Author: Eunho Yang, Genevera Allen, Zhandong Liu, Pradeep K. Ravikumar
Abstract: Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. The popular instances of these models such as Gaussian Markov Random Fields (GMRFs), Ising models, and multinomial discrete models, however do not capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential distribution, such as Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. A major contribution of this paper is the rigorous statistical analysis showing that with high probability, the neighborhood of our graphical models can be recovered exactly. We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models. 1
2 0.31465679 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
Author: Po-ling Loh, Martin J. Wainwright
Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results. 1
3 0.27807015 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs
Author: Zahra Zamani, Scott Sanner, Pascal Poupart, Kristian Kersting
Abstract: Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partiallyobservable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of observations, there are only a finite number of continuous observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is considered. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how recently developed symbolic integration methods allow this solution to be extended to PBVI for continuous state and observation POMDPs with potentially correlated, multivariate continuous observation spaces. 1
4 0.27166921 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation
Author: Jason Pacheco, Erik B. Sudderth
Abstract: We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. 1
5 0.13264829 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs
Author: Anima Anandkumar, Ragupathyraj Valluvan
Abstract: Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay. We propose an efficient method for graph estimation, and establish its structural consistency −δη(η+1)−2 when the number of samples n scales as n = Ω(θmin log p), where θmin is the minimum edge potential, δ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and η is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements. Keywords: Graphical model selection, latent variables, quartet methods, locally tree-like graphs. 1
6 0.12638523 327 nips-2012-Structured Learning of Gaussian Graphical Models
7 0.1197231 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation
8 0.11828022 180 nips-2012-Learning Mixtures of Tree Graphical Models
9 0.11413378 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function
10 0.1092509 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation
11 0.094676815 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking
12 0.090698257 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models
13 0.089442387 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature
14 0.087642647 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models
15 0.085643426 352 nips-2012-Transelliptical Graphical Models
16 0.08047878 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo
17 0.076105036 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation
18 0.074691169 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity
19 0.07353504 346 nips-2012-Topology Constraints in Graphical Models
20 0.07336957 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions
topicId topicWeight
[(0, 0.204), (1, 0.047), (2, 0.101), (3, -0.016), (4, -0.107), (5, 0.008), (6, -0.013), (7, -0.213), (8, -0.195), (9, 0.027), (10, -0.032), (11, -0.031), (12, 0.06), (13, 0.083), (14, -0.079), (15, -0.073), (16, 0.135), (17, -0.026), (18, -0.145), (19, 0.166), (20, 0.077), (21, -0.11), (22, -0.221), (23, -0.096), (24, 0.074), (25, 0.072), (26, 0.199), (27, 0.047), (28, -0.095), (29, -0.021), (30, -0.077), (31, 0.036), (32, 0.083), (33, -0.033), (34, 0.083), (35, 0.135), (36, 0.081), (37, 0.122), (38, 0.046), (39, -0.041), (40, 0.059), (41, -0.008), (42, 0.046), (43, 0.084), (44, -0.031), (45, -0.116), (46, 0.049), (47, 0.012), (48, 0.082), (49, -0.009)]
simIndex simValue paperId paperTitle
same-paper 1 0.94454932 147 nips-2012-Graphical Models via Generalized Linear Models
Author: Eunho Yang, Genevera Allen, Zhandong Liu, Pradeep K. Ravikumar
Abstract: Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. The popular instances of these models such as Gaussian Markov Random Fields (GMRFs), Ising models, and multinomial discrete models, however do not capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential distribution, such as Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. A major contribution of this paper is the rigorous statistical analysis showing that with high probability, the neighborhood of our graphical models can be recovered exactly. We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models. 1
2 0.79783446 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs
Author: Zahra Zamani, Scott Sanner, Pascal Poupart, Kristian Kersting
Abstract: Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partiallyobservable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of observations, there are only a finite number of continuous observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is considered. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how recently developed symbolic integration methods allow this solution to be extended to PBVI for continuous state and observation POMDPs with potentially correlated, multivariate continuous observation spaces. 1
3 0.7776379 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation
Author: Jason Pacheco, Erik B. Sudderth
Abstract: We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. 1
4 0.73667932 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
Author: Po-ling Loh, Martin J. Wainwright
Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results. 1
5 0.51123297 327 nips-2012-Structured Learning of Gaussian Graphical Models
Author: Karthik Mohan, Mike Chung, Seungyeop Han, Daniela Witten, Su-in Lee, Maryam Fazel
Abstract: We consider estimation of multiple high-dimensional Gaussian graphical models corresponding to a single set of nodes under several distinct conditions. We assume that most aspects of the networks are shared, but that there are some structured differences between them. Specifically, the network differences are generated from node perturbations: a few nodes are perturbed across networks, and most or all edges stemming from such nodes differ between networks. This corresponds to a simple model for the mechanism underlying many cancers, in which the gene regulatory network is disrupted due to the aberrant activity of a few specific genes. We propose to solve this problem using the perturbed-node joint graphical lasso, a convex optimization problem that is based upon the use of a row-column overlap norm penalty. We then solve the convex problem using an alternating directions method of multipliers algorithm. Our proposal is illustrated on synthetic data and on an application to brain cancer gene expression data. 1
6 0.49778423 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation
7 0.48336983 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function
8 0.47616187 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models
9 0.41537377 180 nips-2012-Learning Mixtures of Tree Graphical Models
10 0.41252235 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs
11 0.4060545 346 nips-2012-Topology Constraints in Graphical Models
12 0.40327269 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization
13 0.37551591 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)
14 0.36131668 352 nips-2012-Transelliptical Graphical Models
15 0.35750112 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo
16 0.35078579 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models
17 0.35023531 192 nips-2012-Learning the Dependency Structure of Latent Factors
18 0.33323008 132 nips-2012-Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling
19 0.33140337 211 nips-2012-Meta-Gaussian Information Bottleneck
20 0.32798487 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction
topicId topicWeight
[(0, 0.052), (21, 0.042), (36, 0.014), (38, 0.117), (39, 0.023), (42, 0.028), (54, 0.022), (55, 0.012), (64, 0.223), (74, 0.063), (76, 0.172), (80, 0.106), (92, 0.04)]
simIndex simValue paperId paperTitle
1 0.84868157 284 nips-2012-Q-MKL: Matrix-induced Regularization in Multi-Kernel Learning with Applications to Neuroimaging
Author: Chris Hinrichs, Vikas Singh, Jiming Peng, Sterling Johnson
Abstract: Multiple Kernel Learning (MKL) generalizes SVMs to the setting where one simultaneously trains a linear classifier and chooses an optimal combination of given base kernels. Model complexity is typically controlled using various norm regularizations on the base kernel mixing coefficients. Existing methods neither regularize nor exploit potentially useful information pertaining to how kernels in the input set ‘interact’; that is, higher order kernel-pair relationships that can be easily obtained via unsupervised (similarity, geodesics), supervised (correlation in errors), or domain knowledge driven mechanisms (which features were used to construct the kernel?). We show that by substituting the norm penalty with an arbitrary quadratic function Q 0, one can impose a desired covariance structure on mixing weights, and use this as an inductive bias when learning the concept. This formulation significantly generalizes the widely used 1- and 2-norm MKL objectives. We explore the model’s utility via experiments on a challenging Neuroimaging problem, where the goal is to predict a subject’s conversion to Alzheimer’s Disease (AD) by exploiting aggregate information from many distinct imaging modalities. Here, our new model outperforms the state of the art (p-values 10−3 ). We briefly discuss ramifications in terms of learning bounds (Rademacher complexity). 1
2 0.82931089 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
Author: Arthur Guez, David Silver, Peter Dayan
Abstract: Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, finding the resulting Bayes-optimal policies is notoriously taxing, since the search space becomes enormous. In this paper we introduce a tractable, sample-based method for approximate Bayesoptimal planning which exploits Monte-Carlo tree search. Our approach outperformed prior Bayesian model-based RL algorithms by a significant margin on several well-known benchmark problems – because it avoids expensive applications of Bayes rule within the search tree by lazily sampling models from the current beliefs. We illustrate the advantages of our approach by showing it working in an infinite state space domain which is qualitatively out of reach of almost all previous work in Bayesian exploration. 1
same-paper 3 0.82011098 147 nips-2012-Graphical Models via Generalized Linear Models
Author: Eunho Yang, Genevera Allen, Zhandong Liu, Pradeep K. Ravikumar
Abstract: Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. The popular instances of these models such as Gaussian Markov Random Fields (GMRFs), Ising models, and multinomial discrete models, however do not capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential distribution, such as Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. A major contribution of this paper is the rigorous statistical analysis showing that with high probability, the neighborhood of our graphical models can be recovered exactly. We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models. 1
4 0.78462732 221 nips-2012-Multi-Stage Multi-Task Feature Learning
Author: Pinghua Gong, Jieping Ye, Chang-shui Zhang
Abstract: Multi-task sparse feature learning aims to improve the generalization performance by exploiting the shared features among tasks. It has been successfully applied to many applications including computer vision and biomedical informatics. Most of the existing multi-task sparse feature learning algorithms are formulated as a convex sparse regularization problem, which is usually suboptimal, due to its looseness for approximating an ℓ0 -type regularizer. In this paper, we propose a non-convex formulation for multi-task sparse feature learning based on a novel regularizer. To solve the non-convex optimization problem, we propose a MultiStage Multi-Task Feature Learning (MSMTFL) algorithm. Moreover, we present a detailed theoretical analysis showing that MSMTFL achieves a better parameter estimation error bound than the convex formulation. Empirical studies on both synthetic and real-world data sets demonstrate the effectiveness of MSMTFL in comparison with the state of the art multi-task sparse feature learning algorithms. 1
5 0.78241897 27 nips-2012-A quasi-Newton proximal splitting method
Author: Stephen Becker, Jalal Fadili
Abstract: A new result in convex analysis on the calculation of proximity operators in certain scaled norms is derived. We describe efficient implementations of the proximity calculation for a useful class of functions; the implementations exploit the piece-wise linear nature of the dual problem. The second part of the paper applies the previous result to acceleration of convex minimization problems, and leads to an elegant quasi-Newton method. The optimization method compares favorably against state-of-the-art alternatives. The algorithm has extensive applications including signal processing, sparse recovery and machine learning and classification. 1
6 0.74007064 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models
7 0.73864758 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
8 0.73860389 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes
9 0.73630732 277 nips-2012-Probabilistic Low-Rank Subspace Clustering
10 0.73586512 232 nips-2012-Multiplicative Forests for Continuous-Time Processes
11 0.73473984 188 nips-2012-Learning from Distributions via Support Measure Machines
12 0.73458886 197 nips-2012-Learning with Recursive Perceptual Representations
13 0.73424238 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs
14 0.7338863 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data
15 0.73386359 294 nips-2012-Repulsive Mixtures
16 0.73367667 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model
17 0.73336947 168 nips-2012-Kernel Latent SVM for Visual Recognition
18 0.73314971 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
19 0.73308367 279 nips-2012-Projection Retrieval for Classification
20 0.73289579 111 nips-2012-Efficient Sampling for Bipartite Matching Problems