nips nips2013 nips2013-217 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Zhandong Liu
Abstract: Undirected graphical models, such as Gaussian graphical models, Ising, and multinomial/categorical graphical models, are widely used in a variety of applications for modeling distributions over a large number of variables. These standard instances, however, are ill-suited to modeling count data, which are increasingly ubiquitous in big-data settings such as genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits. Existing classes of Poisson graphical models, which arise as the joint distributions that correspond to Poisson distributed node-conditional distributions, have a major drawback: they can only model negative conditional dependencies for reasons of normalizability given its infinite domain. In this paper, our objective is to modify the Poisson graphical model distribution so that it can capture a rich dependence structure between count-valued variables. We begin by discussing two strategies for truncating the Poisson distribution and show that only one of these leads to a valid joint distribution. While this model can accommodate a wider range of conditional dependencies, some limitations still remain. To address this, we investigate two additional novel variants of the Poisson distribution and their corresponding joint graphical model distributions. Our three novel approaches provide classes of Poisson-like graphical models that can capture both positive and negative conditional dependencies between count-valued variables. One can learn the graph structure of our models via penalized neighborhood selection, and we demonstrate the performance of our methods by learning simulated networks as well as a network from microRNA-sequencing data. 1
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract Undirected graphical models, such as Gaussian graphical models, Ising, and multinomial/categorical graphical models, are widely used in a variety of applications for modeling distributions over a large number of variables. [sent-8, score-0.698]
2 These standard instances, however, are ill-suited to modeling count data, which are increasingly ubiquitous in big-data settings such as genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits. [sent-9, score-0.27]
3 Existing classes of Poisson graphical models, which arise as the joint distributions that correspond to Poisson distributed node-conditional distributions, have a major drawback: they can only model negative conditional dependencies for reasons of normalizability given its infinite domain. [sent-10, score-0.532]
4 In this paper, our objective is to modify the Poisson graphical model distribution so that it can capture a rich dependence structure between count-valued variables. [sent-11, score-0.248]
5 To address this, we investigate two additional novel variants of the Poisson distribution and their corresponding joint graphical model distributions. [sent-14, score-0.313]
6 Our three novel approaches provide classes of Poisson-like graphical models that can capture both positive and negative conditional dependencies between count-valued variables. [sent-15, score-0.427]
7 One can learn the graph structure of our models via penalized neighborhood selection, and we demonstrate the performance of our methods by learning simulated networks as well as a network from microRNA-sequencing data. [sent-16, score-0.117]
8 1 Introduction Undirected graphical models, or Markov random fields (MRFs), are a popular class of statistical models for representing distributions over a large number of variables. [sent-17, score-0.272]
9 Popular instances of this class of models include Gaussian graphical models [1, 2, 3, 4], used for modeling continuous real-valued data, the Ising model [3, 5], used for modeling binary data, as well as multinomial graphical models [6] where each variable takes values in a small finite set. [sent-19, score-0.547]
10 None of these models however are best suited to model count data, where the variables take values in the set of all positive integers. [sent-21, score-0.164]
11 Examples of such count data are increasingly ubiquitous in big-data 1 settings, including high-throughput genomic sequencing data, spatial incidence data, climate studies, user-ratings data, term-document counts, site visits, and crime and disease incidence reports. [sent-22, score-0.282]
12 In the univariate case, a popular choice for modeling count data is the Poisson distribution. [sent-23, score-0.201]
13 Could we then model complex multivariate count data using some multivariate extension of the Poisson distribution? [sent-24, score-0.242]
14 Yet other approaches are based on indirect copula transforms [12], as well as multivariate Poisson distributions that do not have a closed, tractable form, and relying on limiting results [13]. [sent-26, score-0.184]
15 Another important approach defines a multivariate Poisson distribution by modeling node variables as sums of independent Poisson variables [14, 15]. [sent-27, score-0.115]
16 Other avenues for modeling multivariate count-data include hierarchical models commonly used in spatial statistics [16]. [sent-30, score-0.141]
17 In a qualitatively different line of work, Besag [17] discusses a tractable and natural multivariate extension of the univariate Poisson distribution; while this work focused on the pairwise model case, Yang et al. [sent-31, score-0.214]
18 [18, 19] extended this to the general graphical model setting. [sent-32, score-0.215]
19 Their construction of a Poisson graphical model (PGM) is simple. [sent-33, score-0.244]
20 Suppose all node-conditional distributions, the conditional distribution of a node conditioned on the rest of the nodes, are univariate Poisson. [sent-34, score-0.14]
21 Then, there is a unique joint distribution consistent with these node-conditional distributions, and moreover this joint distribution is a graphical model distribution that factors according to a graph specified by the node-conditional distributions. [sent-35, score-0.425]
22 While this graphical model seems like a good candidate to model multivariate count data, there is one major defect. [sent-36, score-0.398]
23 For the density to be normalizable, the edge weights specifying the Poisson graphical model distribution have to be non-positive. [sent-37, score-0.248]
24 This restriction implies that a Poisson graphical model distribution only models negative dependencies, or so called “competitive” relationships among variables. [sent-38, score-0.311]
25 Thus, such a Poisson graphical model would have limited practical applicability in modeling more general multivariate count data [20, 21], with both positive and negative dependencies among the variables. [sent-39, score-0.537]
26 To address this major drawback of non-positive conditional dependencies of the Poisson MRF, Kaiser and Cressie [20], Griffith [21] have suggested the use of the Winsorized Poisson distribution. [sent-40, score-0.138]
27 This is the univariate distribution obtained by truncating the integer-valued Poisson random variable at a finite constant R. [sent-41, score-0.147]
28 Specifically, they propose the use of this Winsorized Poisson as nodeconditional distributions, and assert that there exists a consistent joint distribution by following the construction of [17]. [sent-42, score-0.133]
29 Thus, there currently does not exist a graphical model distribution for high-dimensional multivariate count data that does not suffer from severe deficiencies. [sent-44, score-0.404]
30 In this paper, our objective is to specify a joint graphical model distribution over the set of non-negative integers that can capture rich dependence structures between variables. [sent-45, score-0.293]
31 This model however, still has certain limitations on the types of variables and dependencies that may be modeled, and we thus consider more fundamental modifications to the univariate Poisson density’s base measure and sufficient statistics. [sent-47, score-0.234]
32 (3) We will show that in order to have both positive and negative conditional dependencies, the requirements of normalizability are that the base measure of the Poisson density needs to scale quadratically for linear sufficient statistics. [sent-48, score-0.28]
33 Our three novel approaches for the first time specify classes of joint graphical models for count data that permit rich dependence structures between variables. [sent-50, score-0.424]
34 While the focus of this paper is model specification, we also illustrate how our models can be used to learn the network structure from iid samples of high-dimensional multivariate count data via neighborhood selection. [sent-51, score-0.232]
35 We conclude our work by demonstrating our models on simulated networks and by learning a breast cancer microRNA expression network form count-valued next generation sequencing data. [sent-52, score-0.263]
36 2 2 Poisson Graphical Models & Truncation Poisson graphical models were introduced by [17] for the pairwise case, where they termed these “Poisson auto-models”; [18, 19] provide a generalization to these models. [sent-53, score-0.283]
37 The pairwise Poisson graphical model (PGM) distribution over X is then defined as (θs Xs − log(Xs ! [sent-61, score-0.291]
38 ) − exp(ηs )}, which is a univariate Poisson distribution with parameter λ = exp(ηs ) = exp(θs + t∈N (s) θst Xt ), and where N (s) is the neighborhood of node s according to graph G. [sent-64, score-0.164]
39 As we have noted, there is a major drawback with this Poisson graphical model distribution. [sent-65, score-0.268]
40 Note that the domain of parameters θ of the distribution in (1) are specified by the normalizability condition A(θ) < +∞, where A(θ) := log X p exp s∈V (θs Xs −log(Xs ! [sent-66, score-0.23]
41 The above proposition asserts that the Poisson graphical model in (1) only allows negative edgeweights, and consequently can only capture negative conditional relationships between variables. [sent-71, score-0.354]
42 Thus, even though the Poisson graphical model is a natural extension of the univariate Poisson distribution, it entails a highly restrictive parameter space with severely limited applicability. [sent-72, score-0.325]
43 The objective of this paper, then, is to arrive at a graphical model for count data that would allow relaxing these restrictive assumptions, and model both positively and negatively correlated variables. [sent-73, score-0.329]
44 In this section, we will investigate the two natural ways in which to do so and discuss their possible graphical model distributions. [sent-77, score-0.215]
45 1 A Natural Truncation Approach Kaiser and Cressie [20] first introduced an approach to truncate the Poisson distribution in the context of graphical models. [sent-80, score-0.268]
46 By the Taylor series expansion of the exponential function, this distribution can be expressed in a form reminiscent of the exponential family, P (Xs |XV \s ) = exp ηs Xs − log(Xs ! [sent-89, score-0.153]
47 We now have the machinery to describe the development in [20] of a Winsorized Poisson graphical model. [sent-93, score-0.215]
48 Specifically, Kaiser and Cressie [20] assert in a Proposition of their paper that there is a valid joint distribution consistent with these Winsorized Poisson node-conditional distributions above. [sent-94, score-0.136]
49 Then there is no joint distribution over X such that the corresponding node-conditional distributions P (Xs |XV \s ), of a node conditioned on the rest of the nodes, have the form specified as P (Xs |XV \s ) ∝ exp E(XV \s )Xs − log(Xs ! [sent-104, score-0.174]
50 Theorem 1 thus shows that we cannot just substitute the Winsorized Poisson distribution in the construction of [17, 18, 19] to obtain a Winsorized variant of Poisson graphical models. [sent-106, score-0.277]
51 2 A New Approach to Truncation It is instructive to study the probability mass function of the univariate Winsorized Poisson distribution in (2). [sent-109, score-0.145]
52 A natural strategy would then be to use this distribution as the node-conditional distributions in the construction of [17, 18]: exp θs + P (Xs |XV \s ) = k∈X exp t∈N (s) θst Xt θs + Xs − log(Xs ! [sent-121, score-0.222]
53 Then, there exists a unique joint distribution that is consistent with these node-conditional distributions, and moreover this distribution belongs to the graphical model represented by G, with the form: P (X) := exp s∈V (θs Xs − log(Xs ! [sent-133, score-0.39]
54 (s,t)∈E We call this distribution the Truncated Poisson graphical model (TPGM) distribution. [sent-135, score-0.248]
55 Unlike the original Poisson graphical model, the TPGM can model both positive and negative dependencies among its variables. [sent-138, score-0.36]
56 There are, however, some drawbacks to this graphical model distribution. [sent-139, score-0.215]
57 Therefore, as the truncation value R increases, the possible values that the parameters θ can take become increasingly negative or close to zero to prevent all random variables from always taking large count values at the same time. [sent-143, score-0.185]
58 3 A New Class of Poisson Variants and Their Graphical Model Distributions As discussed in the previous section, taking a Poisson random variable and truncating it may be a natural approach but does not lead to a valid multivariate graphical model extension, or does so with some caveats. [sent-146, score-0.305]
59 Accordingly in this section, we investigate the possibility of modifying the Poisson distribution more fundamentally, by modifying its sufficient statistic and base measure. [sent-147, score-0.117]
60 4 Let us first briefly review the derivation of a Poisson graphical model as the graphical model extension of a univariate exponential family distribution from [17, 18, 19]. [sent-148, score-0.628]
61 Consider a general univariate exponential family distribution, for a random variable Z: P (Z) = exp(θB(Z) − C(Z) − D(θ)), where B(Z) is the exponential family sufficient statistic, θ ∈ R is the parameter, C(Z) is the base measure, and D(θ) is the log-partition function. [sent-149, score-0.26]
62 Further, suppose the corresponding joint distribution factors according to the graph G, with the factors over cliques of size at most k. [sent-151, score-0.12]
63 With clique factors of size k at most two, this joint distribution takes the following form: P (X) = exp s∈V θs B(Xs ) + (s,t)∈E θst B(Xs )B(Xt ) − s∈V C(Xs ) − A(θ) . [sent-153, score-0.142]
64 Also note that the original Poisson graphical model (1) discussed in Section 2 can be derived from this construction with sufficient statistics B(X) = X, and base measure C(X) = log(X! [sent-155, score-0.33]
65 1 A Quadratic Poisson Graphical Model As noted in Proposition 1, the normalizability of this Poisson graphical model distribution, however, requires that the pairwise parameters be negative. [sent-158, score-0.348]
66 Accordingly, we consider the following general distribution over count-valued variables: P (Z) = exp(θZ − C(Z) − D(θ)), (5) which has the same sufficient statistics as the Poisson, but a more general base measure C(Z), for some function C(·). [sent-161, score-0.119]
67 The following theorem shows that for normalizability of the resulting graphical model distribution with possibly positive edge-parameters, the base measure cannot be sub-quadratic: Theorem 3. [sent-162, score-0.468]
68 , Xp ) is a count-valued random vector, with joint distribution given by the graphical model extension of the univariate distribution in (5) which follows the construction of [17, 18, 19]). [sent-166, score-0.465]
69 The previous theorem thus suggests using the “Gaussian-esque” quadratic base measure C(Z) = Z 2 , so that we would obtain the following distribution over count-valued vectors, P (X) = 2 exp s∈V θs Xs + (s,t)∈E θst Xs Xt − c s∈V Xs − A(θ) . [sent-168, score-0.208]
70 The following proposition shows that the QPGM is normalizable while permitting both positive and negative edge-parameters. [sent-172, score-0.187]
71 Then the distribution is normalizable provided the following condition holds: There exists a positive constant cθ , such that for all X ∈ Wp , X T ΘX ≤ −cθ X 2 . [sent-176, score-0.141]
72 2 The condition in the proposition would be satisfied provided that the pairwise parameters are pointwise negative: Θ < 0, similar to the original Poisson graphical model. [sent-177, score-0.299]
73 Alternatively, it is also sufficient for the pairwise parameter matrix to be negative-definite: Θ 0, which does allow for positive and negative dependencies, as in the Gaussian distribution. [sent-178, score-0.125]
74 A possible drawback with this distribution is that due to the quadratic base measure, the QPGM has a Gaussian-esque thin tail. [sent-179, score-0.149]
75 Thus, nodewise regressions according to the QPGM via the above variational upper bound on the partition function would behave similarly to that of a Gaussian graphical model. [sent-183, score-0.215]
76 Such a quadratic base measure however results in a Gaussian-esque thin tail, while we would like to specify a distribution with possibly heavier tails than those of QPGM. [sent-186, score-0.164]
77 Accordingly, we consider the following univariate distribution over count-valued variables: P (Z) = exp(θB(Z; R0 , R) − log Z! [sent-188, score-0.142]
78 , Xp ) is a count-valued random vector, with joint distribution given by the graphical model extension of the univariate distribution in (7) (following the construction [17, 18, 19]): θs B(Xs ; R0 , R) + P (X) = exp s∈V θst B(Xs ; R0 , R)B(Xt ; R0 , R) − log(Xs ! [sent-195, score-0.529]
79 On comparing with the QPGM, the SPGM has two distinct advantages: (1) it has a heavier tails with milder base measures as seen in its motivation, and (2) allows a broader set of feasible pairwise parameters (actually for all real values) as shown in Theorem 4. [sent-198, score-0.128]
80 Our TPGM and SPGM M -estimators are compared to the graphical lasso [4], the non-paranormal copula-based method [7] and the non-paranormal SKEPTIC estimator [10]. [sent-276, score-0.215]
81 We evaluate the comparative performance of our TPGM and SPGM methods for recovering the true network from multivariate count data. [sent-281, score-0.182]
82 For the former, edges were generated with both positive and negative weights, while for the latter, only edges with positive weights can be generated. [sent-283, score-0.126]
83 Two network structures are considered that are commonly used throughout genomics: the hub and scale-free graph structures. [sent-285, score-0.13]
84 We compare the performance of our TPGM and SPGM methods with R set to the maximum count value to Gaussian graphical models [4], the non-paranormal [7], and the non-paranormal SKEPTIC [10]. [sent-286, score-0.335]
85 This likely results from a facet of its node-conditional distribution which places larger mass on strongly dependent count values that are close to R. [sent-291, score-0.155]
86 Additionally, all methods compared outperform the original Poisson graphical model estimator, given in [18] (results not shown), as this method can only recover edges with negative weights. [sent-293, score-0.253]
87 We demonstrate the advantages of our graphical models for count-valued data by learning a microRNA (miRNA) expression network from next generation sequencing data. [sent-295, score-0.333]
88 This data consists of counts of sequencing reads mapped back to a reference genome and are replacing microarrays, for which GGMs are a popular tool, as the preferred measures of gene expression [22]. [sent-296, score-0.125]
89 The bottom row presents adjacency matrices of inferred networks with that of SPGM occupying the lower triangular portion and that of (left) PGM, (middle) TPGM with R = 11, and graphical lasso (right) occupying the upper triangular portion. [sent-300, score-0.287]
90 Networks were learned from this data using the original Poisson graphical model, Gaussian graphical models, our novel TPGM approach with R = 11, the maximum count, and our novel SPGM approach with R = 11 and R0 = 5. [sent-302, score-0.47]
91 The original Poisson graphical model, on the other hand, misses much of the structure learned by the other methods and instead only finds 14 miRNAs that have major conditionally negative relationships. [sent-307, score-0.28]
92 Compared with Gaussian graphical models, our novel methods for count-valued data find many more edges and biologically important hub miRNAs. [sent-309, score-0.318]
93 Two of these, mir-375 and mir-10b, found by both TPGM and SPGM but not by GGM, are known to be key players in breast cancer [26, 27]. [sent-310, score-0.125]
94 Model selection and estimation in the gaussian graphical model. [sent-332, score-0.237]
95 Copula gaussian graphical models and their application to modeling functional disability data. [sent-371, score-0.283]
96 An elegant method for generating multivariate poisson random variable. [sent-399, score-0.507]
97 An em algorithm for multivariate poisson distribution and related models. [sent-413, score-0.54]
98 A log-linear graphical model for inferring genetic networks from high-throughput sequencing data. [sent-476, score-0.302]
99 Stability approach to regularization selection (stars) for high dimensional graphical models. [sent-482, score-0.215]
100 Epigenetically deregulated microrna-375 is involved in a positive feedback loop with estrogen receptor α in breast cancer cells. [sent-512, score-0.169]
wordName wordTfidf (topN-words)
[('tpgm', 0.453), ('poisson', 0.446), ('xs', 0.365), ('spgm', 0.351), ('graphical', 0.215), ('npn', 0.175), ('winsorized', 0.131), ('xv', 0.121), ('qpgm', 0.117), ('skeptic', 0.117), ('pgm', 0.103), ('count', 0.095), ('copula', 0.091), ('normalizability', 0.09), ('univariate', 0.085), ('hub', 0.083), ('st', 0.074), ('glasso', 0.067), ('sequencing', 0.067), ('base', 0.065), ('normalizable', 0.064), ('exp', 0.064), ('cancer', 0.063), ('dependencies', 0.063), ('breast', 0.062), ('multivariate', 0.061), ('kaiser', 0.058), ('karlis', 0.058), ('xt', 0.054), ('truncated', 0.052), ('truncation', 0.052), ('ss', 0.051), ('joint', 0.045), ('positive', 0.044), ('cressie', 0.044), ('microrna', 0.044), ('mirna', 0.044), ('mirnas', 0.044), ('pairwise', 0.043), ('allen', 0.042), ('proposition', 0.041), ('genome', 0.038), ('negative', 0.038), ('xp', 0.037), ('spatial', 0.034), ('false', 0.034), ('ravikumar', 0.034), ('incidence', 0.033), ('distribution', 0.033), ('distributions', 0.032), ('qqq', 0.032), ('qq', 0.03), ('truncating', 0.029), ('construction', 0.029), ('malekpour', 0.029), ('negativity', 0.029), ('remnant', 0.029), ('exponential', 0.028), ('major', 0.027), ('mass', 0.027), ('family', 0.027), ('drawback', 0.026), ('nonparanormal', 0.026), ('assert', 0.026), ('occupying', 0.026), ('network', 0.026), ('models', 0.025), ('extension', 0.025), ('neighborhood', 0.025), ('quadratic', 0.025), ('permit', 0.024), ('log', 0.024), ('poissons', 0.024), ('ising', 0.024), ('arxiv', 0.024), ('eunho', 0.022), ('gaussian', 0.022), ('conditional', 0.022), ('graph', 0.021), ('measure', 0.021), ('suppose', 0.021), ('modeling', 0.021), ('yuan', 0.021), ('undirected', 0.02), ('climate', 0.02), ('genomics', 0.02), ('heavier', 0.02), ('gene', 0.02), ('liu', 0.02), ('novel', 0.02), ('networks', 0.02), ('lafferty', 0.02), ('rate', 0.02), ('rice', 0.02), ('truncate', 0.02), ('yang', 0.019), ('arrive', 0.019), ('statistic', 0.019), ('domain', 0.019), ('mrf', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 217 nips-2013-On Poisson Graphical Models
Author: Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Zhandong Liu
Abstract: Undirected graphical models, such as Gaussian graphical models, Ising, and multinomial/categorical graphical models, are widely used in a variety of applications for modeling distributions over a large number of variables. These standard instances, however, are ill-suited to modeling count data, which are increasingly ubiquitous in big-data settings such as genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits. Existing classes of Poisson graphical models, which arise as the joint distributions that correspond to Poisson distributed node-conditional distributions, have a major drawback: they can only model negative conditional dependencies for reasons of normalizability given its infinite domain. In this paper, our objective is to modify the Poisson graphical model distribution so that it can capture a rich dependence structure between count-valued variables. We begin by discussing two strategies for truncating the Poisson distribution and show that only one of these leads to a valid joint distribution. While this model can accommodate a wider range of conditional dependencies, some limitations still remain. To address this, we investigate two additional novel variants of the Poisson distribution and their corresponding joint graphical model distributions. Our three novel approaches provide classes of Poisson-like graphical models that can capture both positive and negative conditional dependencies between count-valued variables. One can learn the graph structure of our models via penalized neighborhood selection, and we demonstrate the performance of our methods by learning simulated networks as well as a network from microRNA-sequencing data. 1
2 0.16363631 269 nips-2013-Regression-tree Tuning in a Streaming Setting
Author: Samory Kpotufe, Francesco Orabona
Abstract: We consider the problem of maintaining the data-structures of a partition-based regression procedure in a setting where the training data arrives sequentially over time. We prove that it is possible to maintain such a structure in time O (log n) at ˜ any time step n while achieving a nearly-optimal regression rate of O n−2/(2+d) in terms of the unknown metric dimension d. Finally we prove a new regression lower-bound which is independent of a given data size, and hence is more appropriate for the streaming setting. 1
3 0.13265464 67 nips-2013-Conditional Random Fields via Univariate Exponential Families
Author: Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Zhandong Liu
Abstract: Conditional random fields, which model the distribution of a multivariate response conditioned on a set of covariates using undirected graphs, are widely used in a variety of multivariate prediction applications. Popular instances of this class of models, such as categorical-discrete CRFs, Ising CRFs, and conditional Gaussian based CRFs, are not well suited to the varied types of response variables in many applications, including count-valued responses. We thus introduce a novel subclass of CRFs, derived by imposing node-wise conditional distributions of response variables conditioned on the rest of the responses and the covariates as arising from univariate exponential families. This allows us to derive novel multivariate CRFs given any univariate exponential distribution, including the Poisson, negative binomial, and exponential distributions. Also in particular, it addresses the common CRF problem of specifying “feature” functions determining the interactions between response variables and covariates. We develop a class of tractable penalized M -estimators to learn these CRF distributions from data, as well as a unified sparsistency analysis for this general class of CRFs showing exact structure recovery can be achieved with high probability. 1
4 0.12710811 88 nips-2013-Designed Measurements for Vector Count Data
Author: Liming Wang, David Carlson, Miguel Rodrigues, David Wilcox, Robert Calderbank, Lawrence Carin
Abstract: We consider design of linear projection measurements for a vector Poisson signal model. The projections are performed on the vector Poisson rate, X ∈ Rn , and the + observed data are a vector of counts, Y ∈ Zm . The projection matrix is designed + by maximizing mutual information between Y and X, I(Y ; X). When there is a latent class label C ∈ {1, . . . , L} associated with X, we consider the mutual information with respect to Y and C, I(Y ; C). New analytic expressions for the gradient of I(Y ; X) and I(Y ; C) are presented, with gradient performed with respect to the measurement matrix. Connections are made to the more widely studied Gaussian measurement model. Example results are presented for compressive topic modeling of a document corpora (word counting), and hyperspectral compressive sensing for chemical classification (photon counting). 1
5 0.12328844 18 nips-2013-A simple example of Dirichlet process mixture inconsistency for the number of components
Author: Jeffrey W. Miller, Matthew T. Harrison
Abstract: For data assumed to come from a finite mixture with an unknown number of components, it has become common to use Dirichlet process mixtures (DPMs) not only for density estimation, but also for inferences about the number of components. The typical approach is to use the posterior distribution on the number of clusters — that is, the posterior on the number of components represented in the observed data. However, it turns out that this posterior is not consistent — it does not concentrate at the true number of components. In this note, we give an elementary proof of this inconsistency in what is perhaps the simplest possible setting: a DPM with normal components of unit variance, applied to data from a “mixture” with one standard normal component. Further, we show that this example exhibits severe inconsistency: instead of going to 1, the posterior probability that there is one cluster converges (in probability) to 0. 1
6 0.098360583 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms
7 0.080245428 126 nips-2013-Gaussian Process Conditional Copulas with Applications to Financial Time Series
8 0.080206759 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap
9 0.075083181 123 nips-2013-Flexible sampling of discrete data correlations without the marginal distributions
10 0.073571347 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
11 0.07342124 277 nips-2013-Restricting exchangeable nonparametric distributions
12 0.070913285 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
13 0.068512484 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
14 0.065426111 50 nips-2013-Bayesian Mixture Modelling and Inference based Thompson Sampling in Monte-Carlo Tree Search
15 0.064019129 266 nips-2013-Recurrent linear models of simultaneously-recorded neural populations
16 0.06384258 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models
17 0.053716417 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
18 0.05165397 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data
19 0.051587608 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations
20 0.050974019 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
topicId topicWeight
[(0, 0.131), (1, 0.037), (2, 0.01), (3, 0.005), (4, -0.04), (5, 0.083), (6, 0.059), (7, 0.001), (8, -0.015), (9, -0.032), (10, -0.02), (11, -0.062), (12, -0.013), (13, -0.032), (14, -0.033), (15, 0.003), (16, 0.051), (17, 0.028), (18, -0.073), (19, 0.097), (20, -0.019), (21, 0.177), (22, 0.019), (23, -0.019), (24, 0.05), (25, -0.026), (26, 0.006), (27, 0.045), (28, -0.009), (29, 0.059), (30, 0.046), (31, -0.196), (32, -0.065), (33, 0.071), (34, -0.024), (35, 0.086), (36, 0.05), (37, -0.051), (38, -0.236), (39, 0.087), (40, -0.021), (41, -0.136), (42, -0.005), (43, -0.088), (44, 0.089), (45, -0.012), (46, 0.147), (47, -0.148), (48, 0.017), (49, -0.132)]
simIndex simValue paperId paperTitle
same-paper 1 0.94170141 217 nips-2013-On Poisson Graphical Models
Author: Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Zhandong Liu
Abstract: Undirected graphical models, such as Gaussian graphical models, Ising, and multinomial/categorical graphical models, are widely used in a variety of applications for modeling distributions over a large number of variables. These standard instances, however, are ill-suited to modeling count data, which are increasingly ubiquitous in big-data settings such as genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits. Existing classes of Poisson graphical models, which arise as the joint distributions that correspond to Poisson distributed node-conditional distributions, have a major drawback: they can only model negative conditional dependencies for reasons of normalizability given its infinite domain. In this paper, our objective is to modify the Poisson graphical model distribution so that it can capture a rich dependence structure between count-valued variables. We begin by discussing two strategies for truncating the Poisson distribution and show that only one of these leads to a valid joint distribution. While this model can accommodate a wider range of conditional dependencies, some limitations still remain. To address this, we investigate two additional novel variants of the Poisson distribution and their corresponding joint graphical model distributions. Our three novel approaches provide classes of Poisson-like graphical models that can capture both positive and negative conditional dependencies between count-valued variables. One can learn the graph structure of our models via penalized neighborhood selection, and we demonstrate the performance of our methods by learning simulated networks as well as a network from microRNA-sequencing data. 1
2 0.68200183 67 nips-2013-Conditional Random Fields via Univariate Exponential Families
Author: Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Zhandong Liu
Abstract: Conditional random fields, which model the distribution of a multivariate response conditioned on a set of covariates using undirected graphs, are widely used in a variety of multivariate prediction applications. Popular instances of this class of models, such as categorical-discrete CRFs, Ising CRFs, and conditional Gaussian based CRFs, are not well suited to the varied types of response variables in many applications, including count-valued responses. We thus introduce a novel subclass of CRFs, derived by imposing node-wise conditional distributions of response variables conditioned on the rest of the responses and the covariates as arising from univariate exponential families. This allows us to derive novel multivariate CRFs given any univariate exponential distribution, including the Poisson, negative binomial, and exponential distributions. Also in particular, it addresses the common CRF problem of specifying “feature” functions determining the interactions between response variables and covariates. We develop a class of tractable penalized M -estimators to learn these CRF distributions from data, as well as a unified sparsistency analysis for this general class of CRFs showing exact structure recovery can be achieved with high probability. 1
3 0.5251115 18 nips-2013-A simple example of Dirichlet process mixture inconsistency for the number of components
Author: Jeffrey W. Miller, Matthew T. Harrison
Abstract: For data assumed to come from a finite mixture with an unknown number of components, it has become common to use Dirichlet process mixtures (DPMs) not only for density estimation, but also for inferences about the number of components. The typical approach is to use the posterior distribution on the number of clusters — that is, the posterior on the number of components represented in the observed data. However, it turns out that this posterior is not consistent — it does not concentrate at the true number of components. In this note, we give an elementary proof of this inconsistency in what is perhaps the simplest possible setting: a DPM with normal components of unit variance, applied to data from a “mixture” with one standard normal component. Further, we show that this example exhibits severe inconsistency: instead of going to 1, the posterior probability that there is one cluster converges (in probability) to 0. 1
4 0.46794748 269 nips-2013-Regression-tree Tuning in a Streaming Setting
Author: Samory Kpotufe, Francesco Orabona
Abstract: We consider the problem of maintaining the data-structures of a partition-based regression procedure in a setting where the training data arrives sequentially over time. We prove that it is possible to maintain such a structure in time O (log n) at ˜ any time step n while achieving a nearly-optimal regression rate of O n−2/(2+d) in terms of the unknown metric dimension d. Finally we prove a new regression lower-bound which is independent of a given data size, and hence is more appropriate for the streaming setting. 1
5 0.45659509 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap
Author: Ulrike Von Luxburg, Morteza Alamgir
Abstract: Consider an unweighted k-nearest neighbor graph on n points that have been sampled i.i.d. from some unknown density p on Rd . We prove how one can estimate the density p just from the unweighted adjacency matrix of the graph, without knowing the points themselves or any distance or similarity scores. The key insights are that local differences in link numbers can be used to estimate a local function of the gradient of p, and that integrating this function along shortest paths leads to an estimate of the underlying density. 1
6 0.45205507 277 nips-2013-Restricting exchangeable nonparametric distributions
7 0.43661422 88 nips-2013-Designed Measurements for Vector Count Data
8 0.43346876 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms
9 0.41908306 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
10 0.40605646 126 nips-2013-Gaussian Process Conditional Copulas with Applications to Financial Time Series
11 0.37371871 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization
12 0.35320255 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
13 0.35197937 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
14 0.34035566 184 nips-2013-Marginals-to-Models Reducibility
15 0.33968297 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding
16 0.33367875 52 nips-2013-Bayesian inference as iterated random functions with applications to sequential inference in graphical models
17 0.33259404 123 nips-2013-Flexible sampling of discrete data correlations without the marginal distributions
18 0.32813847 8 nips-2013-A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles
19 0.31979659 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models
20 0.31736004 266 nips-2013-Recurrent linear models of simultaneously-recorded neural populations
topicId topicWeight
[(2, 0.023), (16, 0.019), (33, 0.609), (34, 0.054), (41, 0.015), (49, 0.024), (56, 0.056), (70, 0.017), (85, 0.02), (89, 0.043), (93, 0.02)]
simIndex simValue paperId paperTitle
1 0.99078131 88 nips-2013-Designed Measurements for Vector Count Data
Author: Liming Wang, David Carlson, Miguel Rodrigues, David Wilcox, Robert Calderbank, Lawrence Carin
Abstract: We consider design of linear projection measurements for a vector Poisson signal model. The projections are performed on the vector Poisson rate, X ∈ Rn , and the + observed data are a vector of counts, Y ∈ Zm . The projection matrix is designed + by maximizing mutual information between Y and X, I(Y ; X). When there is a latent class label C ∈ {1, . . . , L} associated with X, we consider the mutual information with respect to Y and C, I(Y ; C). New analytic expressions for the gradient of I(Y ; X) and I(Y ; C) are presented, with gradient performed with respect to the measurement matrix. Connections are made to the more widely studied Gaussian measurement model. Example results are presented for compressive topic modeling of a document corpora (word counting), and hyperspectral compressive sensing for chemical classification (photon counting). 1
same-paper 2 0.98890859 217 nips-2013-On Poisson Graphical Models
Author: Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Zhandong Liu
Abstract: Undirected graphical models, such as Gaussian graphical models, Ising, and multinomial/categorical graphical models, are widely used in a variety of applications for modeling distributions over a large number of variables. These standard instances, however, are ill-suited to modeling count data, which are increasingly ubiquitous in big-data settings such as genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits. Existing classes of Poisson graphical models, which arise as the joint distributions that correspond to Poisson distributed node-conditional distributions, have a major drawback: they can only model negative conditional dependencies for reasons of normalizability given its infinite domain. In this paper, our objective is to modify the Poisson graphical model distribution so that it can capture a rich dependence structure between count-valued variables. We begin by discussing two strategies for truncating the Poisson distribution and show that only one of these leads to a valid joint distribution. While this model can accommodate a wider range of conditional dependencies, some limitations still remain. To address this, we investigate two additional novel variants of the Poisson distribution and their corresponding joint graphical model distributions. Our three novel approaches provide classes of Poisson-like graphical models that can capture both positive and negative conditional dependencies between count-valued variables. One can learn the graph structure of our models via penalized neighborhood selection, and we demonstrate the performance of our methods by learning simulated networks as well as a network from microRNA-sequencing data. 1
3 0.98666054 81 nips-2013-DeViSE: A Deep Visual-Semantic Embedding Model
Author: Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, Tomas Mikolov
Abstract: Modern visual recognition systems are often limited in their ability to scale to large numbers of object categories. This limitation is in part due to the increasing difficulty of acquiring sufficient training data in the form of labeled images as the number of object categories grows. One remedy is to leverage data from other sources – such as text data – both to train visual models and to constrain their predictions. In this paper we present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data as well as semantic information gleaned from unannotated text. We demonstrate that this model matches state-of-the-art performance on the 1000-class ImageNet object recognition challenge while making more semantically reasonable errors, and also show that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training. Semantic knowledge improves such zero-shot predictions achieving hit rates of up to 18% across thousands of novel labels never seen by the visual model. 1
4 0.98406595 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
Author: Chris Hinrichs, Vamsi Ithapu, Qinyuan Sun, Sterling C. Johnson, Vikas Singh
Abstract: Multiple hypothesis testing is a significant problem in nearly all neuroimaging studies. In order to correct for this phenomena, we require a reliable estimate of the Family-Wise Error Rate (FWER). The well known Bonferroni correction method, while simple to implement, is quite conservative, and can substantially under-power a study because it ignores dependencies between test statistics. Permutation testing, on the other hand, is an exact, non-parametric method of estimating the FWER for a given α-threshold, but for acceptably low thresholds the computational burden can be prohibitive. In this paper, we show that permutation testing in fact amounts to populating the columns of a very large matrix P. By analyzing the spectrum of this matrix, under certain conditions, we see that P has a low-rank plus a low-variance residual decomposition which makes it suitable for highly sub–sampled — on the order of 0.5% — matrix completion methods. Based on this observation, we propose a novel permutation testing methodology which offers a large speedup, without sacrificing the fidelity of the estimated FWER. Our evaluations on four different neuroimaging datasets show that a computational speedup factor of roughly 50× can be achieved while recovering the FWER distribution up to very high accuracy. Further, we show that the estimated α-threshold is also recovered faithfully, and is stable. 1
5 0.97473335 160 nips-2013-Learning Stochastic Feedforward Neural Networks
Author: Yichuan Tang, Ruslan Salakhutdinov
Abstract: Multilayer perceptrons (MLPs) or neural networks are popular models used for nonlinear regression and classification tasks. As regressors, MLPs model the conditional distribution of the predictor variables Y given the input variables X. However, this predictive distribution is assumed to be unimodal (e.g. Gaussian). For tasks involving structured prediction, the conditional distribution should be multi-modal, resulting in one-to-many mappings. By using stochastic hidden variables rather than deterministic ones, Sigmoid Belief Nets (SBNs) can induce a rich multimodal distribution in the output space. However, previously proposed learning algorithms for SBNs are not efficient and unsuitable for modeling real-valued data. In this paper, we propose a stochastic feedforward network with hidden layers composed of both deterministic and stochastic variables. A new Generalized EM training procedure using importance sampling allows us to efficiently learn complicated conditional distributions. Our model achieves superior performance on synthetic and facial expressions datasets compared to conditional Restricted Boltzmann Machines and Mixture Density Networks. In addition, the latent features of our model improves classification and can learn to generate colorful textures of objects. 1
6 0.96940243 46 nips-2013-Bayesian Estimation of Latently-grouped Parameters in Undirected Graphical Models
7 0.96272135 212 nips-2013-Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty
8 0.95729786 253 nips-2013-Prior-free and prior-dependent regret bounds for Thompson Sampling
9 0.95171881 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
10 0.92317063 356 nips-2013-Zero-Shot Learning Through Cross-Modal Transfer
11 0.90400964 93 nips-2013-Discriminative Transfer Learning with Tree-based Priors
12 0.88483191 67 nips-2013-Conditional Random Fields via Univariate Exponential Families
13 0.88197523 200 nips-2013-Multi-Prediction Deep Boltzmann Machines
14 0.88185668 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
15 0.87970096 334 nips-2013-Training and Analysing Deep Recurrent Neural Networks
16 0.87788051 260 nips-2013-RNADE: The real-valued neural autoregressive density-estimator
17 0.87726659 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies
18 0.87045038 315 nips-2013-Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs
19 0.86821318 335 nips-2013-Transfer Learning in a Transductive Setting
20 0.86818016 331 nips-2013-Top-Down Regularization of Deep Belief Networks