nips nips2012 nips2012-172 knowledge-graph by maker-knowledge-mining

172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs


Source: pdf

Author: Anima Anandkumar, Ragupathyraj Valluvan

Abstract: Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay. We propose an efficient method for graph estimation, and establish its structural consistency when the number of samples n scales as n = Ω(θmin^(−δη(η+1)−2) log p), where θmin is the minimum edge potential, δ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and η is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements. Keywords: Graphical model selection, latent variables, quartet methods, locally tree-like graphs.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. [sent-4, score-0.509]

2 We consider a challenging instance of this problem when some of the nodes are latent or hidden. [sent-5, score-0.593]

3 We propose an efficient method for graph estimation, and establish its structural consistency when the number of samples n scales as n = Ω(θmin^(−δη(η+1)−2) log p), where θmin is the minimum edge potential, δ is the depth (i. [sent-8, score-0.662]

4 , distance from a hidden node to the nearest observed nodes), and η is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. [sent-10, score-0.855]

5 The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. [sent-11, score-0.413]

6 We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements. [sent-12, score-0.448]

7 Keywords: Graphical model selection, latent variables, quartet methods, locally tree-like graphs. [sent-13, score-0.485]

8 1 Introduction It is widely recognized that the process of fitting observed data to a statistical model needs to incorporate latent or hidden factors, which are not directly observed. [sent-14, score-0.545]

9 Learning latent variable models involves mainly two tasks: discovering structural relationships among the observed and hidden variables, and estimating the strength of such relationships. [sent-15, score-0.605]

10 One of the simplest models is the latent class model (LCM), which incorporates a single hidden variable and the observed variables are conditionally independent given the hidden variable. [sent-16, score-0.755]

11 Latent tree models extend this model class to incorporate many hidden variables in a hierarchical fashion. [sent-17, score-0.49]

12 They are computationally tractable: upon learning the latent tree model, inference can be carried out efficiently through belief propagation. [sent-19, score-0.508]

13 In this case, a latent tree model does not accurately represent the hierarchy of topics and words, since there are many common words across different topics. [sent-25, score-0.581]

14 Here, we relax the latent tree assumption to incorporate cycles in the latent graphical model while retaining many advantages of latent tree models, including tractable learning and inference. [sent-26, score-1.594]

15 Relaxing the tree constraint leads to many challenges: in general, learning these models is NP-hard, even when there are no latent variables, and developing tractable methods for such models is itself an area of active research, e. [sent-27, score-0.638]

16 We consider structure estimation in latent graphical models Markov on locally 1 tree-like graphs. [sent-30, score-0.587]

17 These extensions of latent tree models are relevant in many settings: for instance, when there is a small overlap among different hierarchies of variables, the resulting graph has mostly long cycles. [sent-31, score-0.732]

18 Are learning guarantees for loopy models comparable to those for latent trees? [sent-34, score-0.519]

19 How does learning depend on various graph attributes such as node degrees, girth of the graph, and so on? [sent-35, score-0.652]

20 Our Approach: We consider learning Ising models with latent variables Markov on locally treelike graphs. [sent-36, score-0.491]

21 Hence, we can employ the available latent tree methods to learn “local” subgraphs consistently, as long as they do not contain any cycles. [sent-39, score-0.566]

22 It is not clear whether an efficient approach is possible for matching latent nodes during this process. [sent-43, score-0.593]

23 We employ a different philosophy for building locally tree-like graphs with latent variables. [sent-44, score-0.503]

24 We decouple the process of introducing cycles and latent variables in the output model. [sent-45, score-0.461]

25 We initialize a loopy graph consisting of only the observed variables, and then iteratively add latent variables to local neighborhoods of the graph. [sent-46, score-0.902]

26 We establish that our method is structurally consistent when the number of samples n scales as n = Ω(θmin^(−δη(η+1)−2) log p), where p is the number of observed variables, θmin is the minimum edge potential, δ is the depth (i. [sent-48, score-0.569]

27 , graph distance from a hidden node to the nearest observed nodes), and η is a parameter which depends on the minimum and maximum node and edge potentials of the Ising model (η = 1 for homogeneous models). [sent-50, score-1.038]
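Written out as a display, the consistency condition stated above, together with its strengthened form for the case with no hidden variables discussed below, reads:

    n = \Omega\!\left(\theta_{\min}^{-\delta\eta(\eta+1)-2}\,\log p\right),
    \qquad\text{and, when there are no hidden variables,}\qquad
    n = \Omega\!\left(\theta_{\min}^{-2}\,\log p\right).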

28 The sample requirement for our method is comparable to the requirement for many popular latent tree methods, e. [sent-51, score-0.651]

29 Moreover, note that when there are no hidden variables (δ = 1), the sample complexity of our method is strengthened to n = Ω(θmin^(−2) log p), which matches the sample complexity of existing algorithms for learning fully-observed Ising models [5–7]. [sent-54, score-0.506]

30 Thus, we present an efficient method which bridges structure estimation in latent trees with estimation in fully observed loopy graphical models. [sent-55, score-0.822]

31 Finally, we present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound. [sent-56, score-0.415]

32 Preliminary experiments on the newsgroup dataset suggests that the method can discover intuitive relationships efficiently, and also compares well with the popular latent Dirichlet allocation (LDA) [9] in terms of topic coherence and perplexity. [sent-58, score-0.467]

33 Our proposed method for learning loopy models is inspired by the efficient latent tree learning algorithm of [4]. [sent-63, score-0.726]

34 This work makes the connection explicit: it relates the extent of correlation decay with the learning efficiency for latent models on large girth graphs. [sent-67, score-0.922]

35 This paper is the first work to provide provable guarantees for learning discrete graphical models on loopy graphs with latent variables (which can also be easily extended to Gaussian models, see Remark following Theorem 1). [sent-69, score-0.803]

36 The work in [12] considers learning latent Gaussian graphical models using a convex relaxation method, by exploiting a sparse-low rank decomposition of the Gaussian precision matrix. [sent-70, score-0.43]

37 We say that a set of random variables XW := {Xi , i ∈ W } with probability mass function (pmf) P is Markov on the graph G if P (xi |xN (i) ) = P (xi |xW \i ) (1) holds for all nodes i ∈ W , where N (i) are the neighbors of node i in graph G. [sent-77, score-0.865]

38 We consider latent graphical models in which a subset of nodes is latent or hidden. [sent-82, score-1.023]

39 Let H ⊂ W denote the hidden nodes and V ⊂ W denote the observed nodes. [sent-83, score-0.522]

40 Our goal is to discover the presence of hidden variables XH and learn the unknown graph structure G(W), given n i.i.d. samples. [sent-84, score-0.447]

41 Let GGirth (m; g) denote the ensemble of graphs with girth at most g. [sent-92, score-0.418]

42 Regime of Correlation Decay: This work establishes tractable learning when the graphical model converges locally to a tree limit. [sent-97, score-0.459]

43 For the class of Ising models in (2), the regime of correlation decay can be explicitly characterized, in terms of the maximum edge potential θmax of the model and the maximum node degree ∆max . [sent-100, score-0.76]

44 3 Method, Guarantees and Necessary Conditions Background on Learning Latent Trees: Most latent tree learning methods are distance based, meaning they are based on the presence of an additive tree metric between any two nodes in the tree model. [sent-103, score-1.364]

45 For an Ising model (and more generally, any discrete model), the "information" distance between any two nodes i and j in a tree T is defined as d(i, j; T) := −log |det(Pi,j)|, (3) where Pi,j denotes the joint probability distribution between nodes i and j. [sent-104, score-0.909]
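The reason this distance is useful is the additivity along tree paths that the quartet tests and neighbor-joining style routines mentioned above rely on; a standard property of such tree metrics, stated here for completeness:

    d(i,j;T) := -\log\bigl|\det(P_{i,j})\bigr|,
    \qquad
    d(i,k;T) = d(i,j;T) + d(j,k;T)
    \quad\text{for any node } j \text{ on the path between } i \text{ and } k \text{ in } T.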

46 Learning latent trees can thus be reformulated as learning tree structure T given end-to-end (estimated) distances d := {d(i, j) : i, j ∈ V } between the observed nodes V . [sent-106, score-1.036]

47 In [4], the so-called CLGrouping method is proposed, which organically grows the tree structure by adding latent nodes to local neighborhoods. [sent-111, score-0.915]

48 In the initial step, the method constructs the minimum spanning tree MST(V ; d) over the observed nodes V using distances d. [sent-112, score-0.739]

49 The method then iteratively visits local neighborhoods of MST(V ; d) and adds latent nodes by conducting local distance tests. [sent-113, score-0.909]

50 Since a tree structure is maintained in every iteration of the algorithm, we can parsimoniously add hidden variables by selecting neighborhoods which locally maximize scores such as the Bayesian information criterion (BIC) [8]. [sent-114, score-0.641]

51 This method also allows for fast implementation by parallelization of latent tree reconstruction in different neighborhoods, see [17] for details. [sent-115, score-0.606]

52 Proposed Algorithm: We now propose a method for learning loopy latent graphical models. [sent-116, score-0.566]

53 As in the case of latent tree methods, our method is also based on estimated information distances d^n(i, j; G) := −log |det(P^n_i,j)|, ∀ i, j ∈ V, (4) where P^n_i,j denotes the empirical probability distribution at nodes i and j computed using n i.i.d. samples. [sent-117, score-0.989]
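As an illustration, a minimal sketch of the distance estimate in (4) for a pair of binary variables; the function name and the assumption that both variables take both of their values in the sample are ours, not from the paper:

import numpy as np

def empirical_information_distance(x_i, x_j, states=(-1, +1)):
    # Empirical 2x2 joint distribution P^n_{i,j} of two binary variables,
    # estimated from n paired observations x_i, x_j (1-D arrays of equal length).
    x_i, x_j = np.asarray(x_i), np.asarray(x_j)
    P = np.zeros((len(states), len(states)))
    for a, s in enumerate(states):
        for b, t in enumerate(states):
            P[a, b] = np.mean((x_i == s) & (x_j == t))
    # d^n(i, j) := -log |det(P^n_{i,j})|
    return -np.log(np.abs(np.linalg.det(P)))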

54 The presence of correlation decay in the Ising model implies that dn (i, j; G) is approximately a tree metric when nodes i and j are “close” on graph G (compared to the girth g of the graph). [sent-121, score-1.398]

55 Thus, intuitively, local neighborhoods of G can be constructed through latent tree methods. [sent-122, score-0.654]

56 However, the challenge is in merging these local estimates together to get a global estimate G: the presence of latent nodes in the local estimates makes merging challenging. [sent-123, score-0.801]

57 Moreover, such a merging-based method cannot easily incorporate global penalties for the number of latent variables added in the output model, which is relevant to obtain parsimonious representations on real datasets. [sent-124, score-0.42]

58 We overcome the above challenges as follows: our proposed method decouples the process of adding cycles and latent nodes to the output model. [sent-125, score-0.738]

59 It initializes a loopy graph G0 and then iteratively adds latent variables to local neighborhoods. [sent-126, score-0.753]

60 Given a parameter r > 0, for every node i ∈ V , consider the set of nodes Br (i; dn ) := {j : dn (i, j) < r}. [sent-127, score-0.581]

61 The initial graph estimate G0 is obtained by taking the union of local minimum spanning trees: G0 ← ∪i∈V MST(Br(i; dn); dn). (5) [sent-128, score-0.437]

62 The method then adds latent variables by considering only local neighborhoods in G0 and running a latent tree reconstruction routine. [sent-129, score-1.105]
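To make the initial step concrete, here is a minimal sketch of the union-of-local-MSTs construction in (5), assuming the pairwise estimated distances have already been computed; the use of networkx and the function and variable names are illustrative choices, not part of the paper:

import itertools
import networkx as nx

def initial_graph_estimate(observed_nodes, dist, r):
    # dist: symmetric dict mapping ordered pairs (u, v) to the estimated
    # information distance d^n(u, v); r: neighborhood radius parameter.
    G0 = nx.Graph()
    G0.add_nodes_from(observed_nodes)
    for i in observed_nodes:
        # r-ball B_r(i; d^n) around node i under the estimated distances
        ball = [i] + [j for j in observed_nodes if j != i and dist[(i, j)] < r]
        # complete weighted graph on the ball, then merge its MST into G0
        H = nx.Graph()
        H.add_nodes_from(ball)
        for u, v in itertools.combinations(ball, 2):
            H.add_edge(u, v, weight=dist[(u, v)])
        G0.add_edges_from(nx.minimum_spanning_edges(H, data=False))
    return G0

Latent variables would then be introduced by running a latent tree routine (such as the CLGrouping subroutine described above) inside each local neighborhood of G0.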

63 , distance from a hidden node to its closest observed nodes) and girth g of the graph. [sent-135, score-0.747]

64 When r is large enough, we obtain a latent tree, while for small r, the graph estimate can contain many short cycles (and potentially many components). [sent-137, score-0.584]

65 (A1) Minimum Degree of Latent Nodes: We require that all latent nodes have degree at least three, which is a natural assumption for identifiability of hidden variables. [sent-142, score-0.794]

66 Otherwise, the latent nodes can be marginalized to obtain an equivalent representation of the observed statistics. [sent-143, score-0.673]

67 We require α := ∆max tanh θmax < 1, α^(g/2) / θmin^(η(η+1)+2) = o(1), (7) where ∆max is the maximum node degree, g is the girth and θmin, θmax are the minimum and maximum (absolute) edge potentials in the model. [sent-147, score-0.799]
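For instance, the first part of this condition is a one-line check (a sketch; the function name is ours):

import numpy as np

def correlation_decay_coefficient(max_degree, theta_max):
    # alpha := Delta_max * tanh(theta_max); correlation decay requires alpha < 1.
    alpha = max_degree * np.tanh(theta_max)
    return alpha, alpha < 1.0

For example, a 3-regular graph with θmax = 0.3 gives α = 3 tanh(0.3) ≈ 0.87 < 1.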

68 Given an Ising model P with edge potentials θ = {θi,j} and node potentials φ = {φi}, consider its attractive counterpart P̄ with edge potentials θ̄ := {|θi,j|} and node potentials φ̄ := {|φi|}. [sent-149, score-0.862]

69 The following quantities are defined with respect to the distribution P̄: let P(X1,2; {θ, φ1, φ2}) denote an Ising model on two nodes {1, 2} with edge potential θ and node potentials {φ1, φ2}. [sent-151, score-0.722]

70 dmin := −log|det P(X1,2; {θmax, φmax, φmax})|, dmax := −log|det P(X1,2; {θmin, 0, 0})|, η := dmax/dmin. [sent-153, score-0.5]
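These quantities are easy to compute explicitly for a two-node Ising model; the sketch below assumes the standard ±1 parameterization P(x1, x2) ∝ exp(θ x1 x2 + φ1 x1 + φ2 x2), and the numerical values at the end are hypothetical:

import numpy as np

def two_node_ising_pmf(theta, phi1, phi2):
    # Joint pmf of (X1, X2) in {-1,+1}^2 with edge potential theta and node
    # potentials phi1, phi2 (assumed parameterization, for illustration only).
    states = (-1, +1)
    P = np.array([[np.exp(theta * x1 * x2 + phi1 * x1 + phi2 * x2)
                   for x2 in states] for x1 in states])
    return P / P.sum()

def info_distance(P):
    return -np.log(np.abs(np.linalg.det(P)))

theta_min, theta_max, phi_max = 0.1, 0.5, 0.2          # hypothetical potentials
d_min = info_distance(two_node_ising_pmf(theta_max, phi_max, phi_max))
d_max = info_distance(two_node_ising_pmf(theta_min, 0.0, 0.0))
eta = d_max / d_min                                     # the ratio eta above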

71 Depth: The depth δ characterizes how close the latent nodes are to observed nodes on graph G: for each hidden node h ∈ H, find a set of four observed nodes which form the shortest quartet with h as one of the middle nodes, and consider the largest graph distance in that quartet. [sent-155, score-2.299]

72 We require the following tradeoff between the girth g and the depth δ: g/4 − δη(η+1) = ω(1). (8) Further, the parameter r in our algorithm is chosen as r > δ(η+1) dmax + ε, for some ε > 0, with (g/4) dmin − r = ω(1). (9) [sent-157, score-0.838]

73 (A1) is a natural assumption on the minimum degree of the hidden nodes for identifiability. [sent-158, score-0.569]

74 It is natural that the sample requirement of any graph estimation algorithm depends on the “weakest” edge characterized by the minimum edge potential θmin . [sent-160, score-0.635]

75 , a small α and a large enough girth g) compared to the weakest edge in the model. [sent-165, score-0.477]

76 Similar conditions have been imposed before for graphical model selection in the regime of correlation decay when there are no hidden variables [5]. [sent-166, score-0.704]

77 Intuitively, dmin and dmax are bounds on information distances given by the local tree approximation of the loopy model. [sent-168, score-0.768]

78 (A5) provides the tradeoff between the girth g and the depth δ. [sent-170, score-0.511]

79 Intuitively, the depth needs to be smaller than the girth to avoid encountering cycles during the process of graph reconstruction. [sent-171, score-0.744]

80 It is chosen such that it is roughly larger than the depth δ in order for all the hidden nodes to be discovered. [sent-173, score-0.563]

81 The parameters for latent tree learning routines (such as confidence intervals for quartet tests) are chosen appropriately depending on dmin and dmax , see [17] for details. [sent-175, score-0.93]

82 2 Guarantees We now provide the main result of this paper that the proposed method correctly estimates the graph structure of a loopy latent graphical model in high dimensions. [sent-177, score-0.774]

83 Recall that δ is the depth (distance from a hidden node to its closest observed nodes), θmin is the minimum (absolute) edge potential and η = dmax/dmin is the ratio of distance bounds. [sent-178, score-0.942]

84 Thus, for learning Ising models on locally tree-like graphs, the sample complexity in (10) depends both on the minimum edge potential θmin and on the depth δ. [sent-180, score-0.57]

85 Our method is efficient in high dimensions since the sample requirement is only logarithmic in the number of nodes p. [sent-181, score-0.407]

86 Comparison with Fully Observed Models: In the special case when all the nodes are observed¹ (δ = 1), we strengthen the results for our method and establish that the sample complexity is n = Ω(θmin^(−2) log p). [sent-184, score-0.532]

87 Comparison with Learning Latent Trees: Our method is an extension of latent tree methods for learning locally tree-like graphs. [sent-186, score-0.642]

88 The sample complexity of our method matches the sample requirements for learning general latent tree models [2–4]. [sent-187, score-0.764]

89 Thus, we establish that learning locally tree-like graphs is akin to learning latent trees in the regime of correlation decay. [sent-188, score-0.809]

90 The above results can also be easily extended to Gaussian models using the notion of walk-summability in place of correlation decay (see [18]) and the negative logarithm of the correlation coefficient as the additive tree metric (see [4]). [sent-190, score-0.647]

91 Dependence on Fraction of Observed Nodes: In the special case when a fraction ρ of the nodes are uniformly selected as observed nodes, we can provide probabilistic bounds on the depth δ in the resulting latent model, see [17] for details. [sent-191, score-0.835]

92 Thus, we can characterize an explicit dependence on the fraction of observed nodes ρ. [sent-193, score-0.447]

93 3 Necessary Conditions for Graph Estimation We have so far provided sufficient conditions for recovering locally tree-like graphs in latent Ising models. [sent-195, score-0.527]

94 Then the edit distance between graphs G, G′ is defined as dist(G, G′; V) := min_π ||A_G − π(A_G′)||_1, where π is any permutation on the unlabeled nodes while keeping the labeled nodes fixed. [sent-203, score-0.866]
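For intuition, the definition can be evaluated directly on small graphs by brute force over permutations of the latent nodes (a sketch, exponential in the number of latent nodes and assuming both adjacency matrices have the same size, with observed nodes indexed first):

import itertools
import numpy as np

def graph_edit_distance(A_G, A_H, num_observed):
    # dist(G, H; V) = min over permutations pi of the latent (unlabeled) nodes
    # of ||A_G - pi(A_H)||_1, with the observed (labeled) nodes held fixed.
    m = A_G.shape[0]
    latent = list(range(num_observed, m))
    best = float("inf")
    for perm in itertools.permutations(latent):
        order = list(range(num_observed)) + list(perm)
        permuted = A_H[np.ix_(order, order)]
        best = min(best, np.abs(A_G - permuted).sum())
    return best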

95 In our context, the labeled nodes correspond to the observed nodes V while the unlabeled nodes correspond to latent nodes H. [sent-205, score-1.622]

96 samples, where ρ ∈ [0, 1] is the fraction of observed nodes and m is (Footnote 1: In the trivial case, when all the nodes are observed and the graph is locally tree-like, our method reduces to thresholding of information distances at each node, and building local MSTs.) [sent-210, score-1.204]

97 (Footnote 2: We consider inexact graph matching where the unlabeled nodes can be unmatched.) [sent-212, score-0.53]

98 the total number of nodes of an Ising model Markov on graph Gm ∈ GGirth(m; g, ∆min, ∆max) on m nodes with girth g, minimum degree ∆min and maximum degree ∆max; for all ε > 0, we have 2nρm m(2 +1)m 3 m , m0. [sent-214, score-1.303]

99 4 Experiments We employ latent graphical models for topic modeling. [sent-225, score-0.504]

100 Methods: We consider a regularized variant of the proposed method for latent graphical model selection. [sent-232, score-0.425]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('girth', 0.331), ('nodes', 0.301), ('latent', 0.292), ('ising', 0.254), ('tree', 0.216), ('graph', 0.183), ('dmax', 0.173), ('bic', 0.163), ('decay', 0.154), ('dmin', 0.154), ('lda', 0.15), ('pmi', 0.142), ('hidden', 0.141), ('loopy', 0.141), ('node', 0.138), ('depth', 0.121), ('gm', 0.119), ('potentials', 0.118), ('edge', 0.116), ('cycles', 0.109), ('edit', 0.104), ('correlation', 0.104), ('perplexity', 0.101), ('neighborhoods', 0.101), ('locally', 0.098), ('regime', 0.098), ('graphical', 0.097), ('na', 0.095), ('quartet', 0.095), ('graphs', 0.087), ('xw', 0.086), ('trees', 0.083), ('observed', 0.08), ('ag', 0.078), ('xtest', 0.072), ('dn', 0.071), ('minimum', 0.067), ('mst', 0.063), ('newsgroup', 0.063), ('degree', 0.06), ('variables', 0.06), ('tradeoff', 0.059), ('distance', 0.057), ('min', 0.057), ('max', 0.055), ('df', 0.053), ('conditions', 0.05), ('det', 0.049), ('potential', 0.049), ('topic', 0.048), ('tractable', 0.048), ('ggirth', 0.047), ('establish', 0.047), ('unlabeled', 0.046), ('local', 0.045), ('complexity', 0.045), ('guarantees', 0.045), ('models', 0.041), ('fraction', 0.041), ('topics', 0.04), ('provable', 0.04), ('merging', 0.04), ('distances', 0.039), ('br', 0.038), ('presence', 0.038), ('matches', 0.038), ('necessary', 0.038), ('requirement', 0.037), ('dist', 0.036), ('structurally', 0.036), ('strengthen', 0.036), ('method', 0.036), ('estimation', 0.034), ('log', 0.034), ('words', 0.033), ('sample', 0.033), ('adds', 0.032), ('samples', 0.032), ('incorporate', 0.032), ('subgraphs', 0.032), ('reconstruction', 0.031), ('lbp', 0.031), ('parallelization', 0.031), ('requirements', 0.03), ('weakest', 0.03), ('tanh', 0.029), ('poly', 0.029), ('coherence', 0.028), ('additive', 0.028), ('irvine', 0.027), ('structural', 0.026), ('employ', 0.026), ('shortest', 0.026), ('learnt', 0.026), ('factorizes', 0.026), ('characterize', 0.025), ('cycle', 0.025), ('tests', 0.025), ('mainly', 0.025), ('structure', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

Author: Anima Anandkumar, Ragupathyraj Valluvan

Abstract: Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay. We propose an efficient method for graph estimation, and establish its structural consistency when the number of samples n scales as n = Ω(θmin^(−δη(η+1)−2) log p), where θmin is the minimum edge potential, δ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and η is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements. Keywords: Graphical model selection, latent variables, quartet methods, locally tree-like graphs.

2 0.25757015 180 nips-2012-Learning Mixtures of Tree Graphical Models

Author: Anima Anandkumar, Furong Huang, Daniel J. Hsu, Sham M. Kakade

Abstract: We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables. We propose a novel method for estimating the mixture components with provable guarantees. Our output is a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. The sample and computational requirements for our method scale as poly(p, r), for an r-component mixture of pvariate graphical models, for a wide class of models which includes tree mixtures and mixtures over bounded degree graphs. Keywords: Graphical models, mixture models, spectral methods, tree approximation.

3 0.19787106 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

Author: Po-ling Loh, Martin J. Wainwright

Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results. 1

4 0.13658768 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

Author: Anima Anandkumar, Yi-kai Liu, Daniel J. Hsu, Dean P. Foster, Sham M. Kakade

Abstract: Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of topic models, including Latent Dirichlet Allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method, called Excess Correlation Analysis, is based on a spectral decomposition of low-order moments via two singular value decompositions (SVDs). Moreover, the algorithm is scalable, since the SVDs are carried out only on k × k matrices, where k is the number of latent factors (topics) and is typically much smaller than the dimension of the observation (word) space. 1

5 0.13383266 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

Author: Levi Boyles, Max Welling

Abstract: We introduce a new prior for use in Nonparametric Bayesian Hierarchical Clustering. The prior is constructed by marginalizing out the time information of Kingman’s coalescent, providing a prior over tree structures which we call the Time-Marginalized Coalescent (TMC). This allows for models which factorize the tree structure and times, providing two benefits: more flexible priors may be constructed and more efficient Gibbs type inference can be used. We demonstrate this on an example model for density estimation and show the TMC achieves competitive experimental results. 1

6 0.13264829 147 nips-2012-Graphical Models via Generalized Linear Models

7 0.12249888 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models

8 0.12177127 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

9 0.12149269 192 nips-2012-Learning the Dependency Structure of Latent Factors

10 0.11921919 69 nips-2012-Clustering Sparse Graphs

11 0.11835581 191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables

12 0.11100725 260 nips-2012-Online Sum-Product Computation Over Trees

13 0.10843609 215 nips-2012-Minimizing Uncertainty in Pipelines

14 0.10703409 298 nips-2012-Scalable Inference of Overlapping Communities

15 0.1065333 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

16 0.10586919 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

17 0.10451657 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

18 0.10432018 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

19 0.10015754 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

20 0.096543543 334 nips-2012-Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.245), (1, 0.097), (2, 0.002), (3, -0.104), (4, -0.183), (5, -0.081), (6, -0.039), (7, -0.163), (8, -0.259), (9, 0.113), (10, 0.051), (11, 0.064), (12, 0.098), (13, 0.006), (14, -0.071), (15, 0.061), (16, 0.118), (17, 0.11), (18, 0.077), (19, 0.101), (20, -0.032), (21, 0.054), (22, 0.061), (23, -0.065), (24, 0.048), (25, 0.028), (26, 0.017), (27, -0.051), (28, -0.002), (29, -0.047), (30, -0.0), (31, -0.005), (32, -0.09), (33, 0.001), (34, 0.034), (35, 0.016), (36, -0.039), (37, 0.014), (38, 0.035), (39, -0.082), (40, -0.034), (41, -0.024), (42, 0.012), (43, 0.022), (44, -0.037), (45, 0.08), (46, -0.099), (47, 0.104), (48, -0.001), (49, -0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97546458 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

Author: Anima Anandkumar, Ragupathyraj Valluvan

Abstract: Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay. We propose an efficient method for graph estimation, and establish its structural consistency −δη(η+1)−2 when the number of samples n scales as n = Ω(θmin log p), where θmin is the minimum edge potential, δ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and η is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements. Keywords: Graphical model selection, latent variables, quartet methods, locally tree-like graphs. 1

2 0.85589445 180 nips-2012-Learning Mixtures of Tree Graphical Models

Author: Anima Anandkumar, Furong Huang, Daniel J. Hsu, Sham M. Kakade

Abstract: We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables. We propose a novel method for estimating the mixture components with provable guarantees. Our output is a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. The sample and computational requirements for our method scale as poly(p, r), for an r-component mixture of pvariate graphical models, for a wide class of models which includes tree mixtures and mixtures over bounded degree graphs. Keywords: Graphical models, mixture models, spectral methods, tree approximation.

3 0.71371788 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

Author: Nicolò Cesa-bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of a initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to whithin a constant factor) number of mistakes on any graph G = (V, E) such that |E| = Ω(|V |3/2 ) by querying O(|V |3/2 ) edge labels. More generally, we show an algorithm that achieves optimality to within a factor of O(k) by querying at most order of |V | + (|V |/k)3/2 edge labels. The running time of this algorithm is at most of order |E| + |V | log |V |. 1

4 0.68043733 215 nips-2012-Minimizing Uncertainty in Pipelines

Author: Nilesh Dalvi, Aditya Parameswaran, Vibhor Rastogi

Abstract: In this paper, we consider the problem of debugging large pipelines by human labeling. We represent the execution of a pipeline using a directed acyclic graph of AND and OR nodes, where each node represents a data item produced by some operator in the pipeline. We assume that each operator assigns a confidence to each of its output data. We want to reduce the uncertainty in the output by issuing queries to a human, where a query consists of checking if a given data item is correct. In this paper, we consider the problem of asking the optimal set of queries to minimize the resulting output uncertainty. We perform a detailed evaluation of the complexity of the problem for various classes of graphs. We give efficient algorithms for the problem for trees, and show that, for a general dag, the problem is intractable. 1

5 0.65324211 132 nips-2012-Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling

Author: Antonino Freno, Mikaela Keller, Marc Tommasi

Abstract: Statistical models for networks have been typically committed to strong prior assumptions concerning the form of the modeled distributions. Moreover, the vast majority of currently available models are explicitly designed for capturing some specific graph properties (such as power-law degree distributions), which makes them unsuitable for application to domains where the behavior of the target quantities is not known a priori. The key contribution of this paper is twofold. First, we introduce the Fiedler delta statistic, based on the Laplacian spectrum of graphs, which allows to dispense with any parametric assumption concerning the modeled network properties. Second, we use the defined statistic to develop the Fiedler random field model, which allows for efficient estimation of edge distributions over large-scale random networks. After analyzing the dependence structure involved in Fiedler random fields, we estimate them over several real-world networks, showing that they achieve a much higher modeling accuracy than other well-known statistical approaches.

6 0.6456002 192 nips-2012-Learning the Dependency Structure of Latent Factors

7 0.64255786 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

8 0.63216168 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

9 0.62790841 260 nips-2012-Online Sum-Product Computation Over Trees

10 0.61034566 346 nips-2012-Topology Constraints in Graphical Models

11 0.59749615 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

12 0.59400314 334 nips-2012-Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs

13 0.57014936 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

14 0.56010532 327 nips-2012-Structured Learning of Gaussian Graphical Models

15 0.55374622 147 nips-2012-Graphical Models via Generalized Linear Models

16 0.5509482 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

17 0.53136885 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation

18 0.51525527 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

19 0.51306677 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification

20 0.50635922 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.093), (21, 0.011), (36, 0.029), (38, 0.112), (39, 0.029), (42, 0.037), (53, 0.011), (54, 0.011), (55, 0.11), (74, 0.111), (76, 0.139), (80, 0.142), (92, 0.045), (98, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.91942614 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

Author: James T. Kwok, Ryan P. Adams

Abstract: Probabilistic latent variable models are one of the cornerstones of machine learning. They offer a convenient and coherent way to specify prior distributions over unobserved structure in data, so that these unknown properties can be inferred via posterior inference. Such models are useful for exploratory analysis and visualization, for building density models of data, and for providing features that can be used for later discriminative tasks. A significant limitation of these models, however, is that draws from the prior are often highly redundant due to i.i.d. assumptions on internal parameters. For example, there is no preference in the prior of a mixture model to make components non-overlapping, or in topic model to ensure that co-occurring words only appear in a small number of topics. In this work, we revisit these independence assumptions for probabilistic latent variable models, replacing the underlying i.i.d. prior with a determinantal point process (DPP). The DPP allows us to specify a preference for diversity in our latent variables using a positive definite kernel function. Using a kernel between probability distributions, we are able to define a DPP on probability measures. We show how to perform MAP inference with DPP priors in latent Dirichlet allocation and in mixture models, leading to better intuition for the latent variable representation and quantitatively improved unsupervised feature extraction, without compromising the generative aspects of the model. 1

same-paper 2 0.91527742 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

Author: Anima Anandkumar, Ragupathyraj Valluvan

Abstract: Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay. We propose an efficient method for graph estimation, and establish its structural consistency −δη(η+1)−2 when the number of samples n scales as n = Ω(θmin log p), where θmin is the minimum edge potential, δ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and η is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements. Keywords: Graphical model selection, latent variables, quartet methods, locally tree-like graphs. 1

3 0.90416998 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts

Author: Francisco Ruiz, Isabel Valera, Carlos Blanco, Fernando Pérez-Cruz

Abstract: The National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) database contains a large amount of information, regarding the way of life, medical conditions, etc., of a representative sample of the U.S. population. In this paper, we are interested in seeking the hidden causes behind the suicide attempts, for which we propose to model the subjects using a nonparametric latent model based on the Indian Buffet Process (IBP). Due to the nature of the data, we need to adapt the observation model for discrete random variables. We propose a generative model in which the observations are drawn from a multinomial-logit distribution given the IBP matrix. The implementation of an efficient Gibbs sampler is accomplished using the Laplace approximation, which allows integrating out the weighting factors of the multinomial-logit likelihood model. Finally, the experiments over the NESARC database show that our model properly captures some of the hidden causes that model suicide attempts. 1

4 0.90243286 215 nips-2012-Minimizing Uncertainty in Pipelines

Author: Nilesh Dalvi, Aditya Parameswaran, Vibhor Rastogi

Abstract: In this paper, we consider the problem of debugging large pipelines by human labeling. We represent the execution of a pipeline using a directed acyclic graph of AND and OR nodes, where each node represents a data item produced by some operator in the pipeline. We assume that each operator assigns a confidence to each of its output data. We want to reduce the uncertainty in the output by issuing queries to a human, where a query consists of checking if a given data item is correct. In this paper, we consider the problem of asking the optimal set of queries to minimize the resulting output uncertainty. We perform a detailed evaluation of the complexity of the problem for various classes of graphs. We give efficient algorithms for the problem for trees, and show that, for a general dag, the problem is intractable. 1

5 0.89955986 168 nips-2012-Kernel Latent SVM for Visual Recognition

Author: Weilong Yang, Yang Wang, Arash Vahdat, Greg Mori

Abstract: Latent SVMs (LSVMs) are a class of powerful tools that have been successfully applied to many applications in computer vision. However, a limitation of LSVMs is that they rely on linear models. For many computer vision tasks, linear models are suboptimal and nonlinear models learned with kernels typically perform much better. Therefore it is desirable to develop the kernel version of LSVM. In this paper, we propose kernel latent SVM (KLSVM) – a new learning framework that combines latent SVMs and kernel methods. We develop an iterative training algorithm to learn the model parameters. We demonstrate the effectiveness of KLSVM using three different applications in visual recognition. Our KLSVM formulation is very general and can be applied to solve a wide range of applications in computer vision and machine learning. 1

6 0.89839166 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

7 0.89739174 193 nips-2012-Learning to Align from Scratch

8 0.89685988 197 nips-2012-Learning with Recursive Perceptual Representations

9 0.89539933 306 nips-2012-Semantic Kernel Forests from Multiple Taxonomies

10 0.8926512 210 nips-2012-Memorability of Image Regions

11 0.88756579 211 nips-2012-Meta-Gaussian Information Bottleneck

12 0.88713205 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

13 0.88485235 188 nips-2012-Learning from Distributions via Support Measure Machines

14 0.88080227 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

15 0.87878096 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

16 0.87478083 298 nips-2012-Scalable Inference of Overlapping Communities

17 0.86936897 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

18 0.86797267 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

19 0.86687332 284 nips-2012-Q-MKL: Matrix-induced Regularization in Multi-Kernel Learning with Applications to Neuroimaging

20 0.86673504 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks