nips nips2013 nips2013-67 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Zhandong Liu
Abstract: Conditional random fields, which model the distribution of a multivariate response conditioned on a set of covariates using undirected graphs, are widely used in a variety of multivariate prediction applications. Popular instances of this class of models, such as categorical-discrete CRFs, Ising CRFs, and conditional Gaussian based CRFs, are not well suited to the varied types of response variables in many applications, including count-valued responses. We thus introduce a novel subclass of CRFs, derived by imposing node-wise conditional distributions of response variables conditioned on the rest of the responses and the covariates as arising from univariate exponential families. This allows us to derive novel multivariate CRFs given any univariate exponential distribution, including the Poisson, negative binomial, and exponential distributions. Also in particular, it addresses the common CRF problem of specifying “feature” functions determining the interactions between response variables and covariates. We develop a class of tractable penalized M -estimators to learn these CRF distributions from data, as well as a unified sparsistency analysis for this general class of CRFs showing exact structure recovery can be achieved with high probability. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Conditional random fields, which model the distribution of a multivariate response conditioned on a set of covariates using undirected graphs, are widely used in a variety of multivariate prediction applications. [sent-8, score-0.563]
2 Popular instances of this class of models, such as categorical-discrete CRFs, Ising CRFs, and conditional Gaussian based CRFs, are not well suited to the varied types of response variables in many applications, including count-valued responses. [sent-9, score-0.27]
3 We thus introduce a novel subclass of CRFs, derived by imposing node-wise conditional distributions of response variables conditioned on the rest of the responses and the covariates as arising from univariate exponential families. [sent-10, score-0.977]
4 This allows us to derive novel multivariate CRFs given any univariate exponential distribution, including the Poisson, negative binomial, and exponential distributions. [sent-11, score-0.275]
5 Also in particular, it addresses the common CRF problem of specifying “feature” functions determining the interactions between response variables and covariates. [sent-12, score-0.261]
6 We develop a class of tractable penalized M -estimators to learn these CRF distributions from data, as well as a unified sparsistency analysis for this general class of CRFs showing exact structure recovery can be achieved with high probability. [sent-13, score-0.085]
7 1 Introduction Conditional random fields (CRFs) are a popular class of models that combine the advantages of discriminative modeling and undirected graphical models. [sent-14, score-0.1]
8 The key idea in this class of models is to represent the joint distribution of a set of response variables conditioned on a set of covariates using a product of clique-wise compatibility functions. [sent-16, score-0.571]
9 Given an underlying graph over the response variables, each of these compatibility functions depends on all the covariates, but only on a subset of response variables within any clique of the underlying graph. [sent-17, score-0.409]
10 They are thus a discriminative counterpart of undirected graphical models, where we have covariates that provide information about the multivariate response, and the underlying graph structure encodes conditional independence assumptions among the responses conditioned on the covariates. [sent-18, score-0.776]
11 There is a key model specification question that arises, however, in any application of CRFs: how do we specify the clique-wise sufficient statistics, or compatibility functions (sometimes also called feature functions), that characterize the conditional graphical model between responses? [sent-19, score-0.271]
12 In particular, how do we tune these to the particular types of variables being modeled? [sent-20, score-0.051]
13 These sub-classes of CRFs, however, are specific to Gaussian and binary variable types, and may not be appropriate for multivariate count data or skewed continuous data, for example, which are increasingly seen in big-data settings such as high-throughput genomic sequencing. [sent-26, score-0.064]
14 This subclass of “exponential family” CRFs can be viewed as a conditional extension of the MRF framework of [9, 10]. [sent-29, score-0.126]
15 As such, this broadens the class of off-the-shelf CRF models to encompass data that follows distributions other than the standard discrete, binary, or Gaussian instances. [sent-30, score-0.092]
16 Given this new family of CRFs, we additionally show that if covariates also follow node-conditional univariate exponential family distributions, then the functions over features in turn are precisely specified by the exponential family sufficient statistics. [sent-31, score-0.771]
17 Thus, our twin results definitively answer for the first time the key model specification question of specifying compatibility or feature functions for a broad family of CRF distributions. [sent-32, score-0.204]
18 We then provide a unified M-estimation procedure, via penalized neighborhood estimation, to learn our family of CRFs from i.i.d. data. [sent-33, score-0.107]
19 Our result can be viewed as an extension of neighborhood selection results for MRFs [11, 12, 13]. [sent-38, score-0.065]
20 Overall, this paper provides a family of CRFs that generalizes many of the sub-classes in the existing literature and broadens the utility and applicability of CRFs to model many other types of multivariate responses. [sent-39, score-0.154]
21 2 Conditional Graphical Models via Exponential Families Suppose we have a p-variate random response vector Y = (Y1 , . [sent-40, score-0.121]
22 , Yp ), with each response variable Ys taking values in a set Ys . [sent-43, score-0.121]
23 Suppose we also have a set of covariates X = (X1, . . . , Xq). [sent-44, score-0.299]
24 Suppose G = (V, E) is an undirected graph over p nodes corresponding to the p response variables. [sent-48, score-0.193]
25 Given the underlying graph G, and the set of cliques (fully-connected sub-graphs) C of the graph G, the corresponding conditional random field (CRF) is a set of distributions over the response conditioned on the covariates that satisfy Markov independence assumptions with respect to the graph G. [sent-49, score-0.743]
26 Specifically, letting \{\phi_c(Y_c, X)\}_{c \in C} denote a set of clique-wise sufficient statistics, any strictly positive distribution of Y conditioned on X within the conditional random field family takes the form: P(Y|X) \propto \exp\{\sum_{c \in C} \phi_c(Y_c, X)\}. [sent-50, score-0.226]
27 With a pair-wise conditional random field distribution, the set of cliques consists of the set of nodes V and the set of edges E, so that P(Y|X) \propto \exp\{\sum_{s \in V} \phi_s(Y_s, X) + \sum_{(s,t) \in E} \phi_{st}(Y_s, Y_t, X)\}. [sent-51, score-0.287]
28 We have a considerable understanding of how to specify univariate distributions over various types of variables as well as on how to model their conditional response through regression. [sent-53, score-0.435]
29 Consider the univariate exponential family class of distributions: P(Z) = \exp(\theta B(Z) + C(Z) - D(\theta)), with sufficient statistic B(Z), base measure C(Z), and log-normalization constant D(\theta). [sent-54, score-0.231]
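For concreteness, the (B, C, D) triples for a few common members can be written down directly; the following is a minimal sketch added here for illustration, not code from the paper, and its dictionary layout and names are assumptions:

```python
import numpy as np
from scipy.special import gammaln  # log-Gamma, for the Poisson base measure

# Each family is a triple (B, C, D): sufficient statistic, base measure, and
# log-normalization constant, so that P(z) = exp(theta * B(z) + C(z) - D(theta)).
FAMILIES = {
    "gaussian": (  # unit-variance Gaussian
        lambda z: z,
        lambda z: -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi),
        lambda theta: 0.5 * theta ** 2,
    ),
    "bernoulli": (
        lambda z: z,
        lambda z: 0.0,
        lambda theta: np.log1p(np.exp(theta)),
    ),
    "poisson": (
        lambda z: z,
        lambda z: -gammaln(z + 1.0),  # -log(z!)
        lambda theta: np.exp(theta),
    ),
}

def log_density(family, z, theta):
    """Univariate exponential family log-density log P(z; theta)."""
    B, C, D = FAMILIES[family]
    return theta * B(z) + C(z) - D(theta)
```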
30 Such exponential family distributions include a wide variety of commonly used distributions, such as the Gaussian, Bernoulli, multinomial, Poisson, exponential, gamma, chi-squared, and beta, any of which can be instantiated with particular choices of the functions B(·) and C(·). [sent-55, score-0.266]
31 Such univariate exponential family distributions are thus used to model a wide variety of data types including skewed continuous data and count data. [sent-56, score-0.33]
32 Additionally, through generalized linear models, they are used to model the response of various data types conditional on a set of covariates. [sent-57, score-0.247]
33 Here, we seek to use our understanding of univariate exponential families and generalized linear models to specify a conditional graphical model distribution. [sent-58, score-0.379]
34 Consider the conditional extension of the construction in [14, 9, 10]. [sent-59, score-0.098]
35 Suppose that the node-conditional distribution of each response variable Ys, conditioned on the rest of the response variables YV\s and the covariates X, is given by a univariate exponential family: P(Y_s | Y_{V \setminus s}, X) = \exp\{E_s(Y_{V \setminus s}, X)\, B_s(Y_s) + C_s(Y_s) - \bar{D}_s(Y_{V \setminus s}, X)\}. (1) [sent-60, score-0.497]
36 Here, the functions Bs(·) and Cs(·) are specified by the choice of the exponential family, and the parameter Es(YV\s, X) is an arbitrary function of the variables Yt in N(s) and the covariates X; N(s) is the set of neighbors of node s according to an undirected graph G = (V, E). [sent-61, score-0.532]
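To illustrate (1) for a single Poisson node, here is a hedged sketch assuming the pairwise linear parameterization E_s(Y_{V\s}, X) = \theta_s + \sum_t \theta_{st} Y_t + \sum_u \theta_{su} X_u that appears later in (8); the function and argument names are illustrative, not the authors':

```python
import numpy as np
from scipy.special import gammaln

def node_conditional_logpdf(y_s, y_rest, x, theta_s, theta_st, theta_su):
    """log P(Y_s = y_s | Y_rest, X) for a Poisson node-conditional, with
    canonical parameter E_s = theta_s + <theta_st, Y_rest> + <theta_su, X>.

    theta_st: weights on the other responses (zero outside the neighborhood N(s))
    theta_su: weights on the covariates
    """
    E_s = theta_s + y_rest @ theta_st + x @ theta_su
    # Poisson: B(y) = y, C(y) = -log(y!), normalizer D(E_s) = exp(E_s)
    return E_s * y_s - gammaln(y_s + 1.0) - np.exp(E_s)
```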
37 Would these node-conditional distributions be consistent with a joint distribution? [sent-62, score-0.076]
38 Would this joint distribution factor according to a conditional random field given by the graph G? [sent-63, score-0.176]
39 And would there be restrictions on the form of the functions Es (YV \s , X)? [sent-64, score-0.074]
40 We note that it generalizes the MRF framework of [9, 10] in two ways: it allows for the presence of conditional covariates, and moreover allows for heterogeneous types and domains of distributions with the different choices of Bs (·) and Cs (·) at each individual node. [sent-66, score-0.17]
41 These assertions on the conditional and joint distributions, respectively, are consistent if and only if the conditional distribution in (1) has the tensor-factorized form: P(Y_s | Y_{V \setminus s}, X; \theta) = \exp\big\{B_s(Y_s)\big(\theta_s(X) + \sum_{t \in N(s)} \theta_{st}(X)\, B_t(Y_t) + \cdots\big) + C_s(Y_s) - \bar{D}_s(Y_{V \setminus s}, X)\big\}, where the elided terms are higher-order products of the neighbors' sufficient statistics. [sent-76, score-0.358]
42 Here \{\theta_s(X), \theta_{st}(X), \ldots, \theta_{t_1 \cdots t_k}(X)\} is a set of functions that depend only on the covariates X. [sent-91, score-0.344]
43 Moreover, the corresponding joint conditional random field distribution has the form: P(Y|X; \theta) = \exp\big\{\sum_{s \in V} \theta_s(X) B_s(Y_s) + \sum_{(s,t) \in E} \theta_{st}(X) B_s(Y_s) B_t(Y_t) + \cdots + \sum_{s \in V} C_s(Y_s) - A(\theta, X)\big\}. [sent-92, score-0.163]
44 Theorem 1 specifies the form of the function Es (YV \s , X) defining the canonical parameter in the univariate exponential family distribution (1). [sent-102, score-0.231]
45 This function is a tensor factorization of products of sufficient statistics of YV\s and of "observation functions" \theta(X) that depend on the covariates X alone. [sent-103, score-0.299]
46 We also note that we can allow different exponential families for each of the node-conditional distributions of the response variables, meaning that the domains, Ys , or the sufficient statistics functions, Bs (·), can vary across the response variables Ys . [sent-105, score-0.417]
47 A common setting of these sufficient statistics functions, however, for many popular distributions (Gaussian, Bernoulli, etc.) is the linear function Bs(Ys) = Ys, which yields the pairwise CRF distribution P(Y|X) = \exp\big\{\sum_{s \in V} \theta_s(X)\, Y_s + \sum_{(s,t) \in E} \theta_{st}(X)\, Y_s Y_t + \sum_{s \in V} C_s(Y_s) - A(\theta, X)\big\}. (5) [sent-106, score-0.089]
48 Theorem 1 then addresses the model specification question of how to select the compatibility functions in CRFs for varied types of responses. [sent-110, score-0.126]
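For intuition, here is a sketch of the unnormalized log-density in (5); the normalizer A(\theta, X) is intractable in general and omitted, and the array conventions below are assumptions of the sketch:

```python
import numpy as np

def pairwise_crf_unnorm_logpdf(y, theta_node, theta_edge, C_funcs):
    """Unnormalized log-density of the pairwise CRF in (5):
    sum_s theta_s(X) y_s + sum_{(s,t)} theta_st(X) y_s y_t + sum_s C_s(y_s).

    theta_node: length-p array, theta_s(X) already evaluated at the covariates
    theta_edge: (p, p) symmetric array, zero diagonal, zeros off the edge set
    C_funcs:   list of p base-measure functions C_s
    """
    node_term = theta_node @ y
    edge_term = 0.5 * y @ theta_edge @ y  # symmetric, so each edge counts once
    base_term = sum(C(y_s) for C, y_s in zip(C_funcs, y))
    return node_term + edge_term + base_term
```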
49 (This only provides a restriction when the domain of the response variables is not finite). [sent-112, score-0.144]
50 In the next section, we address the second model specification question of how to set the covariate functions. [sent-113, score-0.109]
51 2.1 Setting Covariate Functions: A candidate approach to specifying the observation functions, \theta(X), in the CRF distribution above would be to make distributional assumptions on X. [sent-115, score-0.095]
52 Since Theorem 1 specifies the conditional distribution P (Y |X), specifying the marginal distribution P (X) would allow us to specify the joint distribution P (Y, X) without further restrictions on P (Y |X) using the simple product rule: P (X, Y ) = P (Y |X) P (X). [sent-116, score-0.225]
53 As an example, suppose that the covariates X follow an MRF distribution with graph G' = (V', E') and parameters \vartheta: P(X) = \exp\big\{\sum_{u \in V'} \vartheta_u \phi_u(X_u) + \sum_{(u,v) \in E'} \vartheta_{uv} \phi_{uv}(X_u, X_v) - A(\vartheta)\big\}. [sent-117, score-0.417]
54 Then, for any CRF distribution P(Y|X) in (4), we have P(X, Y) = \exp\big\{\sum_{u \in V'} \vartheta_u \phi_u(X_u) + \sum_{(u,v) \in E'} \vartheta_{uv} \phi_{uv}(X_u, X_v) + \sum_{s \in V} \theta_s(X) Y_s + \sum_{(s,t) \in E} \theta_{st}(X) Y_s Y_t + \sum_{s \in V} C_s(Y_s) - A(\vartheta) - A(\theta(X))\big\}. [sent-118, score-0.072]
55 Thus, a distributional assumption on P (X) does not restrict the set of covariate functions in any way. [sent-120, score-0.206]
56 On the other hand, specifying the conditional distribution, P (X|Y ), naturally entails restrictions on the form of P (Y |X). [sent-121, score-0.201]
57 Under these additional distributional assumptions in (6), what form would the CRF distribution in Theorem 1 take? [sent-123, score-0.052]
58 Consider the following assertions: (a) the conditional CRF distribution of the responses Y = (Y1, . . . , Yp) given the covariates X takes the form in (4), and (b) the node-conditional distributions of the covariates follow the univariate exponential families assumed in (6). [sent-131, score-0.251]
59 These assertions are consistent if and only if the CRF distribution takes the form: P(Y|X) = \exp\big\{\sum_{r,l} \sum_{(t_1,\ldots,t_r) \subseteq V,\ (s_1,\ldots,s_l) \subseteq V'} \alpha_{t_1 \cdots t_r; s_1 \cdots s_l} \prod_{j=1}^{l} B_{s_j}(X_{s_j}) \prod_{j=1}^{r} B_{t_j}(Y_{t_j}) + \sum_{t \in V} C_t(Y_t) - A(\alpha, X)\big\}, (7) [sent-138, score-0.154]
60 so that the observation functions \theta_{t_1,\ldots,t_r}(X) in the CRF distribution (4) are tensor products of univariate functions of the covariates: \theta_{t_1,\ldots,t_r}(X) = \sum_{l} \sum_{(s_1,\ldots,s_l) \subseteq V'} \alpha_{t_1 \cdots t_r; s_1 \cdots s_l} \prod_{j=1}^{l} B_{s_j}(X_{s_j}). [sent-141, score-0.098]
61 Let us examine the consequences of this theorem for the pair-wise CRF distributions (4). [sent-145, score-0.065]
62 (Remark 1) While we have derived the covariate functions in Theorem 2 by assuming a distributional form on X, using the resulting covariate functions does not necessarily impose distributional assumptions on X. [sent-148, score-0.412]
63 We can thus leverage the parametric CRF distributional form in Theorem 2 without necessarily imposing stringent distributional assumptions on X. [sent-150, score-0.104]
64 (Remark 2) Consider the form of the covariate functions given by (8) compared to (9). [sent-151, score-0.154]
65 What does sparsity in the parameters entail in terms of conditional independence assumptions? [sent-152, score-0.098]
66 \theta_{st} = 0 in (8) entails that Ys is conditionally independent of Yt given the other responses and all the covariates. [sent-153, score-0.184]
67 Thus, the parametrization in (8) corresponds to pair-wise conditional independence assumptions between the responses (structure learning) and between the responses and covariates (feature selection). [sent-154, score-0.703]
68 Letting \theta_{stu} = 0 entails the lack of a third-order interaction between the pair of responses Ys and Yt and the covariate Xu, conditioned on all other responses and covariates. [sent-156, score-0.489]
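As a sketch of this parameterization (assuming the linear covariate functions of (8)/(9), with array names chosen here for illustration), the observation functions evaluate as:

```python
import numpy as np

def covariate_parameters(x, theta_s, theta_su, theta_st, theta_stu):
    """Evaluate the observation functions of (8) at the covariates x.

    theta_s:   (p,)       constant node effects
    theta_su:  (p, q)     response-covariate effects (feature selection)
    theta_st:  (p, p)     constant edge effects (structure learning)
    theta_stu: (p, p, q)  third-order response-pair-by-covariate effects
    """
    theta_node = theta_s + theta_su @ x  # theta_s(X) = theta_s + sum_u theta_su x_u
    theta_edge = theta_st + np.tensordot(theta_stu, x, axes=([2], [0]))
    return theta_node, theta_edge
```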
69 (Remark 3) Our general subclasses of CRFs specified by Theorems 1 and 2 encompass many existing CRF families as special cases, in addition to providing many novel forms of CRFs. [sent-157, score-0.06]
70 • Several novel forms of CRFs can be derived by specifying node-conditional distributions as Poisson or exponential, for example. [sent-161, score-0.087]
71 Such restrictions are typically needed for cases where the variables have infinite domains. [sent-164, score-0.052]
72 3 Graphical Model Structure Learning: We now address the task of learning a CRF distribution from our general family given i.i.d. observations. [sent-165, score-0.063]
73 [19] provide guarantees on structure recovery for low tree-width discrete CRFs using graph cuts, and a maximum weight spanning tree based method respectively. [sent-174, score-0.087]
74 [6] provide structure recovery guarantees for their two-stage procedures for recovering (a reparameterization of) a conditional Gaussian based CRF and the semi-parametric partition based Gaussian CRF, respectively. [sent-177, score-0.161]
75 Here, we provide a single theorem that gives structure recovery guarantees for any CRF from our class of exponential family CRFs, encompassing not only Ising and Gaussian based CRFs but all other instances within our class, such as Poisson CRFs, exponential CRFs, and so on. [sent-178, score-0.265]
76 The task of CRF parameter learning corresponds to estimating the parameters \theta^*, structure learning corresponds to recovering the edge-set E, and feature selection corresponds to recovering the neighborhoods N'(s) in (10). [sent-183, score-0.069]
77 Note that \lambda_{x,n} and \lambda_{y,n} need not be the same, as \lambda_{y,n} determines the degree of sparsity between Ys and YV\s, and \lambda_{x,n} similarly determines the degree of sparsity between Ys and the covariates X. [sent-190, score-0.299]
78 Given this M-estimator, we can recover the response-variable-neighborhood of response Ys as \hat{N}(s) = \{t \in V \setminus s : \hat{\theta}^y_{st} \neq 0\}, and the feature-neighborhood of the response Ys as \hat{N}'(s) = \{u \in V' : \hat{\theta}^x_{su} \neq 0\}. [sent-191, score-0.286]
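A minimal proximal-gradient sketch of the penalized node-wise regression in (11) for a Poisson node-conditional follows; this is an illustrative implementation under assumed conventions, not the authors' code, and uses separate penalties \lambda_{y,n} and \lambda_{x,n} as above:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_node_poisson(Y_rest, X, y_s, lam_y, lam_x, step=1e-3, iters=5000):
    """l1-penalized node-wise Poisson regression for one response s (cf. (11)).

    Minimizes (1/n) sum_i [exp(eta_i) - y_s[i] * eta_i]
              + lam_y * ||th_y||_1 + lam_x * ||th_x||_1,
    where eta_i = b + Y_rest[i] @ th_y + X[i] @ th_x, by proximal gradient steps.
    """
    n = len(y_s)
    th_y = np.zeros(Y_rest.shape[1])  # weights on the other responses
    th_x = np.zeros(X.shape[1])       # weights on the covariates
    b = 0.0
    for _ in range(iters):
        eta = b + Y_rest @ th_y + X @ th_x
        r = (np.exp(eta) - y_s) / n   # gradient of the negative log-likelihood in eta
        th_y = soft_threshold(th_y - step * (Y_rest.T @ r), step * lam_y)
        th_x = soft_threshold(th_x - step * (X.T @ r), step * lam_x)
        b -= step * r.sum()
    return b, th_y, th_x
```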
79 Suppose that the regularization parameters in (11) are chosen such that \lambda_{x,n} \geq M_1 \sqrt{\log q / n}, \lambda_{y,n} \geq M_1 \sqrt{\log p / n}, and \max\{\lambda_{x,n}, \lambda_{y,n}\} \leq M_2, where M_1 and M_2 are constants depending on the node-conditional exponential family distribution. [sent-194, score-0.191]
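These scalings translate directly into a tuning heuristic; M1 below is a placeholder constant that would in practice be chosen by cross-validation or stability selection:

```python
import numpy as np

def theory_scaled_lambdas(n, p, q, M1=1.0):
    """Regularization parameters scaled as in Theorem 3 (M1 is a placeholder)."""
    lam_x = M1 * np.sqrt(np.log(q) / n)  # penalty on covariate weights
    lam_y = M1 * np.sqrt(np.log(p) / n)  # penalty on response weights
    return lam_x, lam_y
```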
80 Further suppose that \min_{t \in N(s)} |\theta^*_{st}| \geq \frac{10}{\rho_{\min}} \max\{\sqrt{d_x}\, \lambda_{x,n}, \sqrt{d_y}\, \lambda_{y,n}\}, where \rho_{\min} is the minimum eigenvalue of the Hessian of the loss function at (\theta^{x*}, \theta^{y*}), and d_x, d_y are the numbers of nonzero elements in \theta^{x*} and \theta^{y*}, respectively. [sent-195, score-0.118]
81 (a) (Parameter Error) For each node s \in V, the solution \hat{\theta} of the M-estimation problem in (11) is unique, with parameter error bound \|\hat{\theta}^x - \theta^{x*}\|_2 + \|\hat{\theta}^y - \theta^{y*}\|_2 \leq \frac{5}{\rho_{\min}} \max\{\sqrt{d_x}\, \lambda_{x,n}, \sqrt{d_y}\, \lambda_{y,n}\}. [sent-197, score-0.082]
83 Figure 1: ROC curves (true positive rate versus false positive rate) comparing G-CRF against the cGGM and pGGM baselines for (a) Gaussian graphical models, (b) Ising models, and (c) Poisson graphical models; curves are averaged over 50 simulations from the corresponding CRF with p = 50 responses, q = 50 covariates, and (left) n = 100 and (right) n = 250 samples. [sent-215, score-0.075]
84 (c) (Feature Selection) The M-estimate recovers the true response neighborhoods exactly, so that \hat{N}(s) = N(s), for all s \in V. [sent-222, score-0.169]
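The node-wise estimates can then be stitched into a single graph estimate; the AND/OR combination rule below is a common convention in neighborhood selection, not a prescription from the paper:

```python
import numpy as np

def neighborhoods_to_graph(Theta_y, tol=1e-8, rule="and"):
    """Combine node-wise estimates into an edge set.

    Theta_y: (p, p) array whose row s holds the fitted response weights for node s.
    rule: "and" keeps edge (s, t) only if both theta_st and theta_ts are nonzero;
          "or" keeps it if either is.
    """
    nz = np.abs(Theta_y) > tol
    keep = (nz & nz.T) if rule == "and" else (nz | nz.T)
    np.fill_diagonal(keep, False)
    p = keep.shape[0]
    return {(s, t) for s in range(p) for t in range(s + 1, p) if keep[s, t]}
```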
85 Extending our statistical analysis in Theorem 3 from pair-wise CRFs to general CRF distributions (3), as well as to general covariate functions such as in (9), is omitted for space reasons and left for future work. [sent-225, score-0.153]
86 We assume the true conditional distribution, P(Y|X), follows (7) with the parameters \theta_s(X) = \theta_s + \sum_{u \in V'} \theta_{su} X_u and \theta_{st}(X) = \theta_{st} + \sum_{u \in V'} \theta_{stu} X_u, for some constant parameters \theta_s, \theta_{su}, \theta_{st}, and \theta_{stu}. [sent-228, score-0.098]
87 Data with p = 50 responses and q = 50 covariates were generated for n = 100 and n = 250 samples. [sent-240, score-0.452]
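One way to simulate from such a model is a Gibbs sweep over the node-conditionals; the sketch below does this for an Ising CRF under assumed parameter conventions (the paper's exact simulation setup may differ):

```python
import numpy as np

def gibbs_sample_ising_crf(theta_node, theta_edge, n_sweeps=200, rng=None):
    """Draw one Y in {0,1}^p from an Ising CRF by Gibbs sampling.

    theta_node: (p,)   covariate-evaluated node parameters theta_s(X)
    theta_edge: (p, p) symmetric edge parameters theta_st(X), zero diagonal
    """
    rng = rng or np.random.default_rng()
    p = len(theta_node)
    y = rng.integers(0, 2, size=p).astype(float)
    for _ in range(n_sweeps):
        for s in range(p):
            eta = theta_node[s] + theta_edge[s] @ y  # zero diagonal: y[s] drops out
            y[s] = float(rng.random() < 1.0 / (1.0 + np.exp(-eta)))
    return y
```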
88 The latter shows the third-order interactions between gene-pairs and each of the five common aberration covariates (EGFR, PTEN, CDKN2A, PDGFRA, and CDK4). [sent-244, score-0.328]
89 The models were learned from gene expression array data of Glioblastoma samples, and the plots display the response neighborhoods of gene TWIST1. [sent-245, score-0.251]
90 We demonstrate the performance of our CRF models by learning genetic networks of Glioblastoma conditioned on common copy number aberrations. [sent-250, score-0.073]
91 Level III gene expression data measured by Agilent arrays for n = 465 Glioblastoma tumor samples, as well as copy number variation measured by CGH-arrays, were downloaded from the Cancer Genome Atlas data portal [20]. [sent-251, score-0.119]
92 The five most common copy number aberrations across all subjects were taken as covariates. [sent-253, score-0.096]
93 The mean-specified CRF is more sparse as conditioning on copy number aberrations may explain many of the conditional dependencies with TWIST1 that are captured by GGMs, demonstrating the utility of conditional modeling via CRFs. [sent-259, score-0.292]
94 Thus, our covariance-specified CRF network may indicate that these two aberrations are the most salient in interacting with pairs of genes that include the gene TWIST1. [sent-263, score-0.131]
95 Overall, our analysis has demonstrated the applied advantages of our CRF models; namely, one can study the network structure between responses conditional on covariates and/or between pairs of responses that interact with particular covariates. [sent-264, score-0.703]
96 A sparse conditional Gaussian graphical model for analysis of genetical genomics data. [sent-321, score-0.03]
97 High-dimensional Ising model selection using \ell_1-regularized logistic regression. [sent-369, score-0.125]
98 Comprehensive genomic characterization defines human glioblastoma genes and core pathways. [sent-416, score-0.113]
99 Stability approach to regularization selection (stars) for high dimensional graphical models. [sent-422, score-0.073]
100 Twist1 promotes invasion through mesenchymal change in human glioblastoma. [sent-458, score-0.06]
wordName wordTfidf (topN-words)
[('crf', 0.488), ('crfs', 0.4), ('ys', 0.387), ('covariates', 0.299), ('yv', 0.171), ('st', 0.156), ('responses', 0.153), ('bs', 0.144), ('response', 0.121), ('covariate', 0.109), ('ising', 0.104), ('conditional', 0.098), ('univariate', 0.098), ('mrf', 0.096), ('glioblastoma', 0.089), ('xu', 0.087), ('xv', 0.079), ('poisson', 0.075), ('su', 0.072), ('cs', 0.072), ('yt', 0.071), ('bu', 0.071), ('exponential', 0.07), ('stu', 0.066), ('aberrations', 0.066), ('family', 0.063), ('cggm', 0.06), ('pggm', 0.06), ('yp', 0.057), ('compatibility', 0.053), ('assertions', 0.053), ('graphical', 0.052), ('distributional', 0.052), ('tumor', 0.048), ('neighborhoods', 0.048), ('graph', 0.046), ('functions', 0.045), ('btj', 0.045), ('pdgfra', 0.045), ('pten', 0.045), ('neighborhood', 0.044), ('distributions', 0.044), ('specifying', 0.043), ('conditioned', 0.043), ('gaussian', 0.042), ('bt', 0.041), ('gene', 0.041), ('ravikumar', 0.041), ('recovery', 0.041), ('roc', 0.04), ('ytj', 0.039), ('uv', 0.039), ('families', 0.038), ('multivariate', 0.037), ('false', 0.035), ('dy', 0.034), ('exp', 0.033), ('allen', 0.032), ('joint', 0.032), ('entails', 0.031), ('xq', 0.031), ('copy', 0.03), ('bsj', 0.03), ('egfr', 0.03), ('genetical', 0.03), ('gitelman', 0.03), ('invasion', 0.03), ('mesenchymal', 0.03), ('shahaf', 0.03), ('xsj', 0.03), ('restrictions', 0.029), ('genome', 0.029), ('interactions', 0.029), ('subclass', 0.028), ('types', 0.028), ('lattice', 0.027), ('skewed', 0.027), ('broadens', 0.026), ('undirected', 0.026), ('dx', 0.025), ('ggms', 0.024), ('genes', 0.024), ('cancer', 0.024), ('arxiv', 0.024), ('curves', 0.023), ('node', 0.023), ('eunho', 0.023), ('variables', 0.023), ('specify', 0.023), ('tr', 0.023), ('discriminative', 0.022), ('letting', 0.022), ('encompass', 0.022), ('murphy', 0.022), ('reparameterization', 0.022), ('selection', 0.021), ('bernoulli', 0.021), ('yc', 0.021), ('theorem', 0.021), ('liu', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 67 nips-2013-Conditional Random Fields via Univariate Exponential Families
Author: Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Zhandong Liu
Abstract: Conditional random fields, which model the distribution of a multivariate response conditioned on a set of covariates using undirected graphs, are widely used in a variety of multivariate prediction applications. Popular instances of this class of models, such as categorical-discrete CRFs, Ising CRFs, and conditional Gaussian based CRFs, are not well suited to the varied types of response variables in many applications, including count-valued responses. We thus introduce a novel subclass of CRFs, derived by imposing node-wise conditional distributions of response variables conditioned on the rest of the responses and the covariates as arising from univariate exponential families. This allows us to derive novel multivariate CRFs given any univariate exponential distribution, including the Poisson, negative binomial, and exponential distributions. Also in particular, it addresses the common CRF problem of specifying “feature” functions determining the interactions between response variables and covariates. We develop a class of tractable penalized M -estimators to learn these CRF distributions from data, as well as a unified sparsistency analysis for this general class of CRFs showing exact structure recovery can be achieved with high probability. 1
2 0.17418441 330 nips-2013-Thompson Sampling for 1-Dimensional Exponential Family Bandits
Author: Nathaniel Korda, Emilie Kaufmann, Remi Munos
Abstract: Thompson Sampling has been demonstrated in many complex bandit models, however the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case. Here we extend them by proving asymptotic optimality of the algorithm using the Jeffreys prior for 1-dimensional exponential family bandits. Our proof builds on previous work, but also makes extensive use of closed forms for Kullback-Leibler divergence and Fisher information (through the Jeffreys prior) available in an exponential family. This allows us to give a finite time exponential concentration inequality for posterior distributions on exponential families that may be of interest in its own right. Moreover our analysis covers some distributions for which no optimistic algorithm has yet been proposed, including heavy-tailed exponential families. 1
3 0.13265464 217 nips-2013-On Poisson Graphical Models
Author: Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Zhandong Liu
Abstract: Undirected graphical models, such as Gaussian graphical models, Ising, and multinomial/categorical graphical models, are widely used in a variety of applications for modeling distributions over a large number of variables. These standard instances, however, are ill-suited to modeling count data, which are increasingly ubiquitous in big-data settings such as genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits. Existing classes of Poisson graphical models, which arise as the joint distributions that correspond to Poisson distributed node-conditional distributions, have a major drawback: they can only model negative conditional dependencies for reasons of normalizability given its infinite domain. In this paper, our objective is to modify the Poisson graphical model distribution so that it can capture a rich dependence structure between count-valued variables. We begin by discussing two strategies for truncating the Poisson distribution and show that only one of these leads to a valid joint distribution. While this model can accommodate a wider range of conditional dependencies, some limitations still remain. To address this, we investigate two additional novel variants of the Poisson distribution and their corresponding joint graphical model distributions. Our three novel approaches provide classes of Poisson-like graphical models that can capture both positive and negative conditional dependencies between count-valued variables. One can learn the graph structure of our models via penalized neighborhood selection, and we demonstrate the performance of our methods by learning simulated networks as well as a network from microRNA-sequencing data. 1
4 0.12640193 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion
Author: Nguyen Viet Cuong, Wee Sun Lee, Nan Ye, Kian Ming A. Chai, Hai Leong Chieu
Abstract: We introduce a new objective function for pool-based Bayesian active learning with probabilistic hypotheses. This objective function, called the policy Gibbs error, is the expected error rate of a random classifier drawn from the prior distribution on the examples adaptively selected by the active learning policy. Exact maximization of the policy Gibbs error is hard, so we propose a greedy strategy that maximizes the Gibbs error at each iteration, where the Gibbs error on an instance is the expected error of a random classifier selected from the posterior label distribution on that instance. We apply this maximum Gibbs error criterion to three active learning scenarios: non-adaptive, adaptive, and batch active learning. In each scenario, we prove that the criterion achieves near-maximal policy Gibbs error when constrained to a fixed budget. For practical implementations, we provide approximations to the maximum Gibbs error criterion for Bayesian conditional random fields and transductive Naive Bayes. Our experimental results on a named entity recognition task and a text classification task show that the maximum Gibbs error criterion is an effective active learning criterion for noisy models. 1
5 0.078585252 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
Author: Bogdan Savchynskyy, Jörg Hendrik Kappes, Paul Swoboda, Christoph Schnörr
Abstract: We consider energy minimization for undirected graphical models, also known as the MAP-inference problem for Markov random fields. Although combinatorial methods, which return a provably optimal integral solution of the problem, made a significant progress in the past decade, they are still typically unable to cope with large-scale datasets. On the other hand, large scale datasets are often defined on sparse graphs and convex relaxation methods, such as linear programming relaxations then provide good approximations to integral solutions. We propose a novel method of combining combinatorial and convex programming techniques to obtain a global solution of the initial combinatorial problem. Based on the information obtained from the solution of the convex relaxation, our method confines application of the combinatorial solver to a small fraction of the initial graphical model, which allows to optimally solve much larger problems. We demonstrate the efficacy of our approach on a computer vision energy minimization benchmark. 1
6 0.07643196 280 nips-2013-Robust Data-Driven Dynamic Programming
7 0.070300385 318 nips-2013-Structured Learning via Logistic Regression
8 0.067570962 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
9 0.061742958 266 nips-2013-Recurrent linear models of simultaneously-recorded neural populations
10 0.059919927 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
11 0.056915373 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
12 0.054252047 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing
13 0.053856861 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
14 0.05247429 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
15 0.048576284 91 nips-2013-Dirty Statistical Models
17 0.048261266 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks
18 0.046515822 149 nips-2013-Latent Structured Active Learning
19 0.046457361 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels
20 0.045308381 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
topicId topicWeight
[(0, 0.131), (1, 0.023), (2, -0.003), (3, 0.009), (4, -0.015), (5, 0.046), (6, 0.043), (7, -0.046), (8, -0.024), (9, -0.007), (10, 0.003), (11, -0.016), (12, -0.034), (13, -0.037), (14, -0.043), (15, -0.053), (16, -0.041), (17, -0.012), (18, -0.041), (19, 0.049), (20, -0.023), (21, 0.102), (22, 0.07), (23, -0.118), (24, 0.073), (25, -0.004), (26, 0.011), (27, 0.062), (28, -0.003), (29, 0.075), (30, 0.087), (31, -0.086), (32, -0.046), (33, 0.031), (34, -0.027), (35, 0.078), (36, -0.054), (37, -0.174), (38, -0.069), (39, -0.001), (40, 0.053), (41, -0.077), (42, -0.019), (43, -0.052), (44, 0.039), (45, 0.006), (46, 0.17), (47, -0.044), (48, 0.093), (49, 0.007)]
simIndex simValue paperId paperTitle
same-paper 1 0.93784893 67 nips-2013-Conditional Random Fields via Univariate Exponential Families
Author: Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Zhandong Liu
Abstract: Conditional random fields, which model the distribution of a multivariate response conditioned on a set of covariates using undirected graphs, are widely used in a variety of multivariate prediction applications. Popular instances of this class of models, such as categorical-discrete CRFs, Ising CRFs, and conditional Gaussian based CRFs, are not well suited to the varied types of response variables in many applications, including count-valued responses. We thus introduce a novel subclass of CRFs, derived by imposing node-wise conditional distributions of response variables conditioned on the rest of the responses and the covariates as arising from univariate exponential families. This allows us to derive novel multivariate CRFs given any univariate exponential distribution, including the Poisson, negative binomial, and exponential distributions. Also in particular, it addresses the common CRF problem of specifying “feature” functions determining the interactions between response variables and covariates. We develop a class of tractable penalized M -estimators to learn these CRF distributions from data, as well as a unified sparsistency analysis for this general class of CRFs showing exact structure recovery can be achieved with high probability. 1
2 0.72943199 217 nips-2013-On Poisson Graphical Models
Author: Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Zhandong Liu
Abstract: Undirected graphical models, such as Gaussian graphical models, Ising, and multinomial/categorical graphical models, are widely used in a variety of applications for modeling distributions over a large number of variables. These standard instances, however, are ill-suited to modeling count data, which are increasingly ubiquitous in big-data settings such as genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits. Existing classes of Poisson graphical models, which arise as the joint distributions that correspond to Poisson distributed node-conditional distributions, have a major drawback: they can only model negative conditional dependencies for reasons of normalizability given its infinite domain. In this paper, our objective is to modify the Poisson graphical model distribution so that it can capture a rich dependence structure between count-valued variables. We begin by discussing two strategies for truncating the Poisson distribution and show that only one of these leads to a valid joint distribution. While this model can accommodate a wider range of conditional dependencies, some limitations still remain. To address this, we investigate two additional novel variants of the Poisson distribution and their corresponding joint graphical model distributions. Our three novel approaches provide classes of Poisson-like graphical models that can capture both positive and negative conditional dependencies between count-valued variables. One can learn the graph structure of our models via penalized neighborhood selection, and we demonstrate the performance of our methods by learning simulated networks as well as a network from microRNA-sequencing data. 1
3 0.54269081 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
Author: Bogdan Savchynskyy, Jörg Hendrik Kappes, Paul Swoboda, Christoph Schnörr
Abstract: We consider energy minimization for undirected graphical models, also known as the MAP-inference problem for Markov random fields. Although combinatorial methods, which return a provably optimal integral solution of the problem, made a significant progress in the past decade, they are still typically unable to cope with large-scale datasets. On the other hand, large scale datasets are often defined on sparse graphs and convex relaxation methods, such as linear programming relaxations then provide good approximations to integral solutions. We propose a novel method of combining combinatorial and convex programming techniques to obtain a global solution of the initial combinatorial problem. Based on the information obtained from the solution of the convex relaxation, our method confines application of the combinatorial solver to a small fraction of the initial graphical model, which allows to optimally solve much larger problems. We demonstrate the efficacy of our approach on a computer vision energy minimization benchmark. 1
4 0.45814639 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
Author: Daniel S. Levine, Jonathan P. How
Abstract: We consider the sensor selection problem on multivariate Gaussian distributions where only a subset of latent variables is of inferential interest. For pairs of vertices connected by a unique path in the graph, we show that there exist decompositions of nonlocal mutual information into local information measures that can be computed efficiently from the output of message passing algorithms. We integrate these decompositions into a computationally efficient greedy selector where the computational expense of quantification can be distributed across nodes in the network. Experimental results demonstrate the comparative efficiency of our algorithms for sensor selection in high-dimensional distributions. We additionally derive an online-computable performance bound based on augmentations of the relevant latent variable set that, when such a valid augmentation exists, is applicable for any distribution with nuisances. 1
5 0.45017776 330 nips-2013-Thompson Sampling for 1-Dimensional Exponential Family Bandits
Author: Nathaniel Korda, Emilie Kaufmann, Remi Munos
Abstract: Thompson Sampling has been demonstrated in many complex bandit models, however the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case. Here we extend them by proving asymptotic optimality of the algorithm using the Jeffreys prior for 1-dimensional exponential family bandits. Our proof builds on previous work, but also makes extensive use of closed forms for Kullback-Leibler divergence and Fisher information (through the Jeffreys prior) available in an exponential family. This allows us to give a finite time exponential concentration inequality for posterior distributions on exponential families that may be of interest in its own right. Moreover our analysis covers some distributions for which no optimistic algorithm has yet been proposed, including heavy-tailed exponential families. 1
6 0.4223395 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding
7 0.41889986 280 nips-2013-Robust Data-Driven Dynamic Programming
8 0.41784146 152 nips-2013-Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions
9 0.40797222 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models
10 0.38793418 184 nips-2013-Marginals-to-Models Reducibility
11 0.38355649 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
12 0.37888035 8 nips-2013-A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles
13 0.37860662 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
14 0.37785754 277 nips-2013-Restricting exchangeable nonparametric distributions
15 0.36991891 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields
16 0.36805245 88 nips-2013-Designed Measurements for Vector Count Data
17 0.36689278 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations
18 0.36541891 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
19 0.36264026 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking
20 0.35588786 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization
topicId topicWeight
[(16, 0.03), (32, 0.279), (33, 0.211), (34, 0.078), (41, 0.039), (49, 0.027), (56, 0.071), (70, 0.018), (85, 0.039), (89, 0.079), (93, 0.027), (95, 0.011)]
simIndex simValue paperId paperTitle
1 0.86101967 47 nips-2013-Bayesian Hierarchical Community Discovery
Author: Charles Blundell, Yee Whye Teh
Abstract: We propose an efficient Bayesian nonparametric model for discovering hierarchical community structure in social networks. Our model is a tree-structured mixture of potentially exponentially many stochastic blockmodels. We describe a family of greedy agglomerative model selection algorithms that take just one pass through the data to learn a fully probabilistic, hierarchical community model. In the worst case, Our algorithms scale quadratically in the number of vertices of the network, but independent of the number of nested communities. In practice, the run time of our algorithms are two orders of magnitude faster than the Infinite Relational Model, achieving comparable or better accuracy. 1
same-paper 2 0.81985599 67 nips-2013-Conditional Random Fields via Univariate Exponential Families
Author: Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Zhandong Liu
Abstract: Conditional random fields, which model the distribution of a multivariate response conditioned on a set of covariates using undirected graphs, are widely used in a variety of multivariate prediction applications. Popular instances of this class of models, such as categorical-discrete CRFs, Ising CRFs, and conditional Gaussian based CRFs, are not well suited to the varied types of response variables in many applications, including count-valued responses. We thus introduce a novel subclass of CRFs, derived by imposing node-wise conditional distributions of response variables conditioned on the rest of the responses and the covariates as arising from univariate exponential families. This allows us to derive novel multivariate CRFs given any univariate exponential distribution, including the Poisson, negative binomial, and exponential distributions. Also in particular, it addresses the common CRF problem of specifying “feature” functions determining the interactions between response variables and covariates. We develop a class of tractable penalized M -estimators to learn these CRF distributions from data, as well as a unified sparsistency analysis for this general class of CRFs showing exact structure recovery can be achieved with high probability. 1
3 0.6593588 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification
Author: Karen Simonyan, Andrea Vedaldi, Andrew Zisserman
Abstract: As massively parallel computations have become broadly available with modern GPUs, deep architectures trained on very large datasets have risen in popularity. Discriminatively trained convolutional neural networks, in particular, were recently shown to yield state-of-the-art performance in challenging image classification benchmarks such as ImageNet. However, elements of these architectures are similar to standard hand-crafted representations used in computer vision. In this paper, we explore the extent of this analogy, proposing a version of the stateof-the-art Fisher vector image encoding that can be stacked in multiple layers. This architecture significantly improves on standard Fisher vectors, and obtains competitive results with deep convolutional networks at a smaller computational learning cost. Our hybrid architecture allows us to assess how the performance of a conventional hand-crafted image classification pipeline changes with increased depth. We also show that convolutional networks and Fisher vector encodings are complementary in the sense that their combination further improves the accuracy. 1
4 0.64728957 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies
Author: Yangqing Jia, Joshua T. Abbott, Joseph Austerweil, Thomas Griffiths, Trevor Darrell
Abstract: Learning a visual concept from a small number of positive examples is a significant challenge for machine learning algorithms. Current methods typically fail to find the appropriate level of generalization in a concept hierarchy for a given set of visual examples. Recent work in cognitive science on Bayesian models of generalization addresses this challenge, but prior results assumed that objects were perfectly recognized. We present an algorithm for learning visual concepts directly from images, using probabilistic predictions generated by visual classifiers as the input to a Bayesian generalization model. As no existing challenge data tests this paradigm, we collect and make available a new, large-scale dataset for visual concept learning using the ImageNet hierarchy as the source of possible concepts, with human annotators to provide ground truth labels as to whether a new image is an instance of each concept using a paradigm similar to that used in experiments studying word learning in children. We compare the performance of our system to several baseline algorithms, and show a significant advantage results from combining visual classifiers with the ability to identify an appropriate level of abstraction using Bayesian generalization. 1
5 0.64692283 335 nips-2013-Transfer Learning in a Transductive Setting
Author: Marcus Rohrbach, Sandra Ebert, Bernt Schiele
Abstract: Category models for objects or activities typically rely on supervised learning requiring sufficiently large training sets. Transferring knowledge from known categories to novel classes with no or only a few labels is far less researched even though it is a common scenario. In this work, we extend transfer learning with semi-supervised learning to exploit unlabeled instances of (novel) categories with no or only a few labeled instances. Our proposed approach Propagated Semantic Transfer combines three techniques. First, we transfer information from known to novel categories by incorporating external knowledge, such as linguistic or expertspecified information, e.g., by a mid-level layer of semantic attributes. Second, we exploit the manifold structure of novel classes. More specifically we adapt a graph-based learning algorithm – so far only used for semi-supervised learning – to zero-shot and few-shot learning. Third, we improve the local neighborhood in such graph structures by replacing the raw feature-based representation with a mid-level object- or attribute-based representation. We evaluate our approach on three challenging datasets in two different applications, namely on Animals with Attributes and ImageNet for image classification and on MPII Composites for activity recognition. Our approach consistently outperforms state-of-the-art transfer and semi-supervised approaches on all datasets. 1
6 0.6461584 331 nips-2013-Top-Down Regularization of Deep Belief Networks
7 0.64615816 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels
8 0.6460309 341 nips-2013-Universal models for binary spike patterns using centered Dirichlet processes
9 0.64585364 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
10 0.64542121 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
11 0.64497149 356 nips-2013-Zero-Shot Learning Through Cross-Modal Transfer
12 0.64471561 93 nips-2013-Discriminative Transfer Learning with Tree-based Priors
13 0.6429382 20 nips-2013-Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
14 0.64283824 279 nips-2013-Robust Bloom Filters for Large MultiLabel Classification Tasks
15 0.64272428 284 nips-2013-Robust Spatial Filtering with Beta Divergence
16 0.64272052 212 nips-2013-Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty
17 0.64258099 200 nips-2013-Multi-Prediction Deep Boltzmann Machines
18 0.6425339 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
19 0.64169168 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
20 0.64150643 46 nips-2013-Bayesian Estimation of Latently-grouped Parameters in Undirected Graphical Models