nips nips2012 nips2012-180 knowledge-graph by maker-knowledge-mining

180 nips-2012-Learning Mixtures of Tree Graphical Models


Source: pdf

Author: Anima Anandkumar, Furong Huang, Daniel J. Hsu, Sham M. Kakade

Abstract: We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables. We propose a novel method for estimating the mixture components with provable guarantees. Our output is a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. The sample and computational requirements for our method scale as poly(p, r), for an r-component mixture of p-variate graphical models, for a wide class of models which includes tree mixtures and mixtures over bounded degree graphs. Keywords: Graphical models, mixture models, spectral methods, tree approximation.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables. [sent-7, score-1.243]

2 We propose a novel method for estimating the mixture components with provable guarantees. [sent-8, score-0.463]

3 The sample and computational requirements for our method scale as poly(p, r), for an r-component mixture of p-variate graphical models, for a wide class of models which includes tree mixtures and mixtures over bounded degree graphs. [sent-10, score-1.386]

4 Keywords: Graphical models, mixture models, spectral methods, tree approximation. [sent-11, score-0.642]

5 In other words, we posit the observed data as being generated from a mixture of graphical models, where each mixture component has a potentially different Markov graph structure and parameters. [sent-17, score-1.097]

6 The choice variable corresponding to the selection of the mixture component is hidden. [sent-18, score-0.478]

7 Such a class of graphical model mixtures can incorporate context-specific dependencies, and employs multiple graph structures to model the observed data. [sent-19, score-0.669]

8 Learning graphical model mixtures is, however, far more challenging than learning graphical models. [sent-21, score-0.671]

9 State-of-the-art theoretical guarantees are mostly limited to mixtures of product distributions, also known as latent class models or naïve Bayes models. [sent-22, score-0.4]

10 These models are restrictive since they do not allow for dependencies to exist among the observed variables in each mixture component. [sent-23, score-0.376]

11 Our work significantly generalizes this class and allows for general Markov dependencies among the observed variables in each mixture component. [sent-24, score-0.346]

12 The output of our method is a tree mixture model, which is a good approximation for the underlying graphical model mixture. [sent-25, score-0.745]

13 The motivation behind fitting the observed data to a tree mixture is clear: inference can be performed efficiently via belief propagation in each of the mixture components. [sent-26, score-0.898]
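
The claim that belief propagation makes each tree component tractable can be made concrete with a short sketch. The following is a minimal sum-product routine for computing a node marginal on one tree-structured component; all names (adj, node_pot, edge_pot) are illustrative placeholders, not notation from the paper.

    import numpy as np

    def tree_marginal(root, adj, node_pot, edge_pot):
        """Marginal at `root` of a tree-structured MRF via sum-product.
        adj:      dict node -> list of neighbors (must form a tree)
        node_pot: dict node -> length-d nonnegative array
        edge_pot: dict (u, v) -> d x d array, edge_pot[(u, v)][a, b] = psi(Yu=a, Yv=b)
        """
        def message(child, parent):
            # m_{child->parent}: combine messages from the subtree below `child`,
            # then sum out the child's state through the edge potential.
            belief = node_pot[child].copy()
            for nb in adj[child]:
                if nb != parent:
                    belief = belief * message(nb, child)
            if (child, parent) in edge_pot:
                return belief @ edge_pot[(child, parent)]
            return edge_pot[(parent, child)] @ belief

        belief = node_pot[root].copy()
        for nb in adj[root]:
            belief = belief * message(nb, root)
        return belief / belief.sum()

Running this once per mixture component h (with that component's tree and potentials) costs time linear in p, which is exactly the efficiency argument made above.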

14 Thus, a tree mixture model offers a good tradeoff between using single-tree models, which are too simplistic, and general graphical model mixtures, where inference is not tractable. [sent-28, score-0.745]

15 1 Summary of Results: We propose a novel method with provable guarantees for unsupervised estimation of discrete graphical model mixtures. [sent-30, score-0.346]

16 Our method has three main stages: graph structure estimation, parameter estimation, and tree approximation. [sent-31, score-0.397]

17 The first stage involves estimation of the union graph structure G∪ := ∪h Gh , which is the union of the Markov graphs {Gh } of the respective mixture components. [sent-32, score-0.983]

18 Our method is based on a series of rank tests, and can be viewed as a generalization of conditional-independence tests for graphical model selection. [sent-33, score-0.442]

19 We establish that our method is efficient (in terms of computational and sample complexities) when the underlying union graph has sparse vertex separators. [sent-36, score-0.388]

20 This includes tree mixtures and mixtures over bounded degree graphs. [sent-37, score-0.856]

21 The second stage of our algorithm involves parameter estimation of the mixture components. [sent-38, score-0.39]

22 We provide conditions for tractable estimation of pairwise marginals of the mixture components. [sent-40, score-0.505]

23 Roughly, we exploit the conditional-independence relationships to convert the given model to a series of mixtures of product distributions. [sent-41, score-0.339]

24 Parameter estimation for product distribution mixtures has been well studied (e.g., [7–9]). [sent-42, score-0.412]

25 We leverage these techniques to obtain estimates of the pairwise marginals for each mixture component. [sent-45, score-0.454]

26 The final stage for obtaining tree approximations involves running the standard Chow-Liu algorithm [10] on each component using the estimated pairwise marginals of the mixture components. [sent-46, score-0.865]

27 We prove that our method correctly recovers the union graph structure and the tree structures corresponding to maximum-likelihood tree approximations of the mixture components. [sent-47, score-1.185]

28 Note that if the underlying model is a tree mixture, we correctly recover the tree structures of the mixture components. [sent-48, score-0.823]

29 The sample and computational complexities of our method scale as poly(p, r), for an r-component mixture of p-variate graphical models, when the union graph has sparse vertex separators between any node pair. [sent-49, score-1.224]

30 This includes tree mixtures and mixtures over bounded degree graphs. [sent-50, score-0.856]

31 Our algorithm is also efficient for practical implementation, and some preliminary experiments suggest an advantage over EM with respect to running times and accuracy of structure estimation of the mixture components. [sent-52, score-0.358]

32 Thus, our approach for learning graphical model mixtures has both theoretical and practical implications. [sent-53, score-0.478]

33 2 Related Work. Graphical Model Selection: Graphical model selection is a well-studied problem, starting from the seminal work of Chow and Liu [10] for finding the maximum-likelihood tree approximation of a graphical model. [sent-55, score-0.478]

34 However, these works are not directly applicable for learning mixtures of graphical models. [sent-58, score-0.478]

35 Moreover, our proposed method also provides a new approach for graphical model selection, in the special case when there is only one mixture component. [sent-59, score-0.5]

36 These works provide guarantees on recovery under various separation constraints between the mixture components and/or have computational and sample complexities growing exponentially in the number of mixture components r. [sent-63, score-0.892]

37 Spectral methods are applicable for parameter estimation in mixtures of discrete product distributions [7] and more generally for latent trees [8] and general linear multiview mixtures [9]. [sent-65, score-0.736]

38 2 Graphical Models and their Mixtures: A graphical model is a family of multivariate distributions Markov on a given undirected graph [14]. [sent-67, score-0.345]

39 In a discrete graphical model, each node v ∈ V in the graph is associated with a random variable Yv taking values in a finite set Y. [sent-68, score-0.554]

40 Let H denote the discrete hidden choice variable corresponding to the selection of the mixture component, taking values in [r] := {1, . . . , r}. [sent-78, score-0.464]

41 Denote by πH := [P (H = h)]h the probability vector of the mixing weights and by Gh the Markov graph of the distribution P (y|H = h) of each mixture component. [sent-82, score-0.459]

42 Given n i.i.d. samples yn = [y1 , . . . , yn ] from P (y), our goal is to find a tree approximation for each mixture component {P (y|H = h)}h . [sent-89, score-0.713]

43 We do not assume any knowledge of the mixing weights πH , the Markov graphs {Gh }h , or the parameters of the mixture components {P (y|H = h)}h . [sent-90, score-0.453]

44 Moreover, since the variable H is latent, we do not a priori know the mixture component from which a sample is drawn. [sent-91, score-0.438]

45 First, we estimate the union graph G∪ := ∪h∈[r] Gh , which is the union of the Markov graphs of the components. [sent-93, score-0.568]

46 We then use this graph estimate G∪ to obtain the pairwise marginals of the respective mixture components {P (y|H = h)}h . [sent-94, score-0.711]

47 Finally, the Chow-Liu algorithm provides tree approximations {Th }h of the mixture components. [sent-95, score-0.587]

48 3 Estimation of the Union of Component Graphs: We propose a novel method for learning graphical model mixtures by first estimating the union graph G∪ := ∪h∈[r] Gh , the union of the graphs of the components. [sent-96, score-1.083]

49 Intuitions: We first establish the simple result that the union graph G∪ satisfies the Markov property in each mixture component. [sent-100, score-0.634]

50 Recall that S(u, v; G∪ ) denotes a vertex separator between nodes u and v in G∪ . [sent-101, score-0.392]

51 Yu ⊥ Yv | YS , H. (1) Proof: The separator set in G∪ , denoted by S := S(u, v; G∪ ), is also a vertex separator for u and v in each of the component graphs Gh . [sent-103, score-0.716]

52 ∎ The above result can be exploited to obtain a union graph estimate as follows: two nodes u, v are not neighbors in G∪ if a separator set S can be found which results in conditional independence, as in (1). [sent-106, score-0.744]

53 Footnote 1: A set S(A, B; G) ⊂ V is a separator of sets A and B if the removal of the nodes in S(A, B; G) separates A and B into distinct components. [sent-109, score-0.402]

54 When u and v are non-neighbors in G∪ , a separator set S can be found such that the rank of Mu,v,{S;k} is at most r. [sent-112, score-0.398]

55 Tractable Graph Families: Another obstacle in using Lemma 1 to estimate graph G∪ is computational: the search for separators S for any node pair u, v ∈ V is exponential in |V | := p if no further constraints are imposed. [sent-117, score-0.512]

56 We consider graph families where, for any node pair (u, v) not joined by an edge in G∪ , a vertex separator of size at most η can be found. [sent-118, score-0.483]

57 If G∪ is trivial (i.e., has no edges), then η = 0 and we have a mixture of product distributions. [sent-124, score-0.361]

58 If all components are Markov on the same tree (i.e., we have a mixture model Markov on the same tree), then η = 1, since there is a unique path between any two nodes on a tree. [sent-128, score-0.425]

59 For an arbitrary r-component tree mixture, with G∪ = ∪h Th where each component is a tree distribution, we have η ≤ r (since for any node pair, there is a unique path in each of the r trees {Th }, and separating the node pair in each Th also separates them on G∪ ). [sent-130, score-0.994]

60 For an arbitrary mixture of bounded degree graphs, we have η ≤ Σh∈[r] ∆h , where ∆h is the maximum degree in Gh . [sent-132, score-0.389]
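
To make the bound η concrete, here is a hedged brute-force check (not part of the paper's algorithm) that computes the smallest such η for a candidate union graph, using the networkx library.

    import itertools
    import networkx as nx

    def separator_bound(G):
        """Smallest eta such that every non-adjacent, connected pair in G
        has a vertex separator of size at most eta (brute force)."""
        eta = 0
        for u, v in itertools.combinations(G.nodes, 2):
            if not G.has_edge(u, v) and nx.has_path(G, u, v):
                eta = max(eta, len(nx.minimum_node_cut(G, u, v)))
        return eta

For a single tree this returns 1, and for an edgeless graph it returns 0, matching the special cases listed above.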

61 Algorithm 1: Ĝn∪ = RankTest(yn ; ξn,p , η, r) for estimating G∪ := ∪h∈[r] Gh of an r-component mixture using yn samples, where η is the bound on the size of vertex separators between any node pair in G∪ and ξn,p is a threshold on the singular values. [sent-138, score-0.831]

62 Rank Test: Based on the above observations, we propose a rank test to estimate the union graph G∪ := ∪h∈[r] Gh in Algorithm 1. [sent-148, score-0.48]

63 The method is based on a search for potential separators S between any two given nodes u, v ∈ V , based on the effective rank of the empirical matrix Mu,v,{S;k} : if the effective rank is r or less, then u and v are declared non-neighbors (and S is set as their separator). [sent-149, score-0.567]

64 Thus, the method involves searching for separators for each node pair u, v ∈ V , by considering all sets S ⊂ V \ {u, v} satisfying |S| ≤ η. [sent-151, score-0.346]

65 This is because the number of rank tests performed over all node pairs and conditioning sets is O(p^(η+2)); each rank test has O(d^3) complexity since it involves a singular value decomposition (SVD) of a d × d matrix. [sent-153, score-0.733]
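
A simplified sketch of the rank test may help. It assumes a hypothetical helper empirical_matrix(u, v, S, k) that returns the d × d matrix of empirical probabilities [P̂(Yu = i, Yv = j, YS = k)]i,j computed from the samples yn, and it requires d > r so that the (r+1)-th singular value exists.

    import itertools
    import numpy as np

    def rank_test(nodes, d, eta, r, xi, empirical_matrix):
        """Estimate the edge set of the union graph via rank tests."""
        def separated(u, v, S):
            # S is accepted as a separator if the effective rank of the
            # empirical matrix is at most r for every configuration k of Y_S.
            for k in itertools.product(range(d), repeat=len(S)):
                sv = np.linalg.svd(empirical_matrix(u, v, S, k), compute_uv=False)
                if sv[r] >= xi:  # (r+1)-th singular value above threshold xi
                    return False
            return True

        edges = set()
        for u, v in itertools.combinations(nodes, 2):
            others = [w for w in nodes if w not in (u, v)]
            candidates = (S for size in range(eta + 1)
                          for S in itertools.combinations(others, size))
            if not any(separated(u, v, S) for S in candidates):
                edges.add((u, v))  # no separator found: declare u, v neighbors
        return edges

The nested loops mirror the O(p^(η+2)) test count and the O(d^3) per-test SVD cost noted above.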

66 This includes tree mixtures and mixtures over bounded degree graphs. [sent-158, score-0.856]

67 This ensures that the probability matrices for neighbors (u, v) ∈ G∪ have (effective) rank of at least r + 1, and thus, the rank test can correctly distinguish neighbors from non-neighbors. [sent-164, score-0.446]

68 We now provide the result on the success of recovering the union graph G∪ := ∪h∈[r] Gh . [sent-170, score-0.368]

69 Theorem 1 (Success of Rank Tests): RankTest(yn ; ξ, η, r) recovers the correct graph G∪ , which is the union of the component Markov graphs, under (A1)–(A3) with probability at least 1 − δ. [sent-171, score-0.426]

70 A special case of the above result is graphical model selection, where there is a single graphical model (r = 1) and we are interested in estimating its graph structure. [sent-172, score-0.575]

71 Remarks: Thus, the rank test is also applicable for graphical model selection. [sent-177, score-0.346]

72 The rank test above is thus an alternative test for conditional independence in graphical models, resulting in graph structure estimation. [sent-180, score-0.554]

73 In addition, it extends naturally to estimation of the union graph structure of mixture components. [sent-181, score-0.685]

74 4 Parameter Estimation of Mixture Components: Having obtained an estimate of the union graph G∪ , we now describe a procedure for estimating the parameters of the mixture components {P (y|H = h)}. [sent-183, score-0.751]

75 Our method is based on spectral decomposition, proposed previously for mixtures of product distributions [7–9]. [sent-184, score-0.429]

76 Thus, if we have a general product distribution mixture over nodes in V , we can learn the parameters by performing the above spectral decomposition over different triplets {u, v, w}. [sent-205, score-0.68]
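
The triplet decomposition can be sketched in a few lines in the spirit of [7–9]. The sketch below assumes d = r (so the pairwise matrix is invertible) and the non-degeneracy conditions discussed later; M2 and M3 stand for empirical second- and third-order moment estimates, and recovery is only up to a permutation of the hidden label.

    import numpy as np

    def triplet_decompose(M2, M3, seed=0):
        """M2[i, j]    ~ P(Yu = i, Yv = j)
        M3[i, j, k] ~ P(Yu = i, Yv = j, Yw = k)
        Returns A_hat whose columns estimate P(Yu | H = h)."""
        rng = np.random.default_rng(seed)
        theta = rng.standard_normal(M3.shape[2])
        # Observable operator: sum_k theta_k M3[:, :, k] @ inv(M2)
        #   = A diag(sum_k theta_k P(Yw = k | H = .)) inv(A),
        # where A[i, h] = P(Yu = i | H = h); a random theta gives distinct
        # eigenvalues almost surely, so the eigenvectors recover A's columns.
        B = np.einsum('ijk,k->ij', M3, theta) @ np.linalg.inv(M2)
        _, eigvecs = np.linalg.eig(B)
        A_hat = np.real(eigvecs)
        return A_hat / A_hat.sum(axis=0, keepdims=True)  # columns sum to 1

Fixing the same "left" node u∗ and the same diagonalizing matrix across triplets, as described below, keeps the hidden-label permutation consistent.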

77 However, an obstacle remains: spectral decomposition over different triplets {u, v, w} results in different permutations of the labels of the hidden variable H. [sent-206, score-0.364]

78 Thus, if we consider a fixed node u∗ ∈ V as the “left” node and use a fixed matrix to diagonalize (5) for all triplets, we obtain a consistent ordering of the hidden labels over all triplet decompositions. [sent-208, score-0.409]

79 We first make a simple observation on how to obtain mixtures of product distributions by considering separators on the union graph G∪ . [sent-210, score-0.806]

80 Thus, by fixing the configuration of nodes in Suvw , we obtain a product distribution mixture over {u, v, w}. [sent-215, score-0.447]

81 However, an additional complication arises when we consider graphical model mixtures, where conditioning over separators is required. [sent-221, score-0.364]

82 We require that the permutation of the hidden labels be unchanged upon conditioning over different values of variables in the separator set Su∗ vw . [sent-222, score-0.366]

83 This holds when the separator set Su∗ vw has no effect on node u∗ , i.e., when node u∗ is isolated in the union graph. [sent-223, score-0.427]

84 However, our setting is a significant generalization of mixtures of product distributions, where (8) is required to hold for all nodes. [sent-230, score-0.339]

85 Finally, since our goal is to estimate pairwise marginals of the mixture components, in place of node w in the triplet {u, v, w} in Lemma 2, we need to consider a node pair a, b ∈ V . [sent-231, score-0.839]

86 Thus, we obtain estimates of the pairwise marginals of the mixture components. [sent-233, score-0.454]

87 Note that (A4) is a natural condition required for the success of spectral decomposition and has been previously imposed for learning product distribution mixtures [7–9]. [sent-242, score-0.527]

88 Condition (A5) is required for learning product distribution mixtures [9], and we inherit it here. [sent-246, score-0.339]

89 We now provide guarantees for estimation of pairwise marginals of the mixture components. [sent-247, score-0.536]

90 Remark: Recall that p denotes the number of variables, r is the number of mixture components, d is the dimension of each node variable, and η is the bound on the size of separator sets between any node pair in the union graph. [sent-250, score-1.078]

91 Tree Approximation of Mixture Components: The final step involves using the estimated pairwise marginals of each component {Pspect (Ya , Yb |H = h)} to obtain a tree approximation of each component via the Chow-Liu algorithm [10]. [sent-254, score-0.703]
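
A compact sketch of this final step, assuming the estimated pairwise marginals pair_marginals[(a, b)] (d × d arrays summing to one) for a fixed component h are available from the spectral stage: compute empirical mutual information for every pair and take a maximum-weight spanning tree.

    import networkx as nx
    import numpy as np

    def mutual_information(Pab):
        """Mutual information of a d x d joint probability table."""
        Pa = Pab.sum(axis=1, keepdims=True)
        Pb = Pab.sum(axis=0, keepdims=True)
        mask = Pab > 0
        return float(np.sum(Pab[mask] * np.log(Pab[mask] / (Pa @ Pb)[mask])))

    def chow_liu_tree(pair_marginals):
        """Maximum-likelihood tree for one component, given its pairwise marginals."""
        G = nx.Graph()
        for (a, b), Pab in pair_marginals.items():
            G.add_edge(a, b, weight=mutual_information(Pab))
        return nx.maximum_spanning_tree(G)

This is the classical Chow-Liu reduction: the maximum-likelihood tree is the maximum-weight spanning tree under pairwise mutual information.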

92 We now impose a standard condition of non-degeneracy on each mixture component to guarantee the existence of a unique tree structure corresponding to the maximum-likelihood tree approximation to the mixture component. [sent-255, score-1.203]

93 (A7) Number of Samples: Given εtree defined in [15], we require n > nspect(δ, εtree ; p, d, r), (12) where nspect is given by (9). [sent-259, score-0.804]

94 Intuitively, εtree provides the bound on the distortion of the estimated pairwise marginals of the mixture components that is required for correct estimation of the tree approximations, and it depends on ϑ in (11). [sent-260, score-0.75]

95 [Figure 2: Classification error and normalized edit distances of the proposed method, EM, and EM initialized with the proposed method's output on the tree mixture; panels (b) and (c) show the strong and weak component edit distances.] [sent-274, score-0.904]

96 We generate samples from a mixture over two different trees (r = 2) with mixing weights π = [0. [sent-278, score-0.369]

97 Each mixture component is generated from the standard Potts model on p = 60 nodes, where the node variables are ternary (d = 3), and the number of samples n ∈ [2. [sent-281, score-0.584]

98 The joint distribution of the nodes in each mixture component is given by P (y|H = h) ∝ exp( Σ(i,j)∈Gh Ji,j;h (I(Yi = Yj ) − 1) + Σi∈V Ki;h Yi ), where I is the indicator function and Jh := {Ji,j;h } are the edge potentials in the model. [sent-283, score-0.523]
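
In code form, the unnormalized log-density of one Potts component reads as follows; this is a sketch of the displayed equation, with J and K standing for the edge and node potentials of component h.

    def potts_log_potential(y, edges, J, K):
        """y: dict node -> state in {0, ..., d-1}; edges: list of (i, j) in G_h.
        Returns the exponent of the (unnormalized) component density."""
        edge_term = sum(J[(i, j)] * (float(y[i] == y[j]) - 1.0) for (i, j) in edges)
        node_term = sum(K[i] * y[i] for i in y)
        return edge_term + node_term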

99 The node potentials are all set to zero (Ki;h = 0) except at the isolated node u∗ in the union graph. [sent-289, score-0.521]

100 However, our algorithm has significantly superior performance with respect to the edit distance, which is the error in estimating the tree structure of the two components, as seen in Fig. 2. [sent-295, score-0.423]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mixture', 0.307), ('mixtures', 0.285), ('gh', 0.28), ('yv', 0.274), ('tree', 0.245), ('separator', 0.245), ('graphical', 0.193), ('em', 0.177), ('union', 0.175), ('rank', 0.153), ('graph', 0.152), ('yb', 0.149), ('node', 0.145), ('edit', 0.141), ('separators', 0.14), ('ya', 0.133), ('ys', 0.129), ('suvw', 0.114), ('yw', 0.101), ('component', 0.099), ('spectral', 0.09), ('marginals', 0.087), ('nodes', 0.086), ('triplets', 0.086), ('yu', 0.082), ('spect', 0.081), ('components', 0.08), ('anandkumar', 0.077), ('th', 0.072), ('fig', 0.072), ('markov', 0.071), ('nspect', 0.069), ('ranktest', 0.069), ('graphs', 0.066), ('triplet', 0.066), ('yn', 0.062), ('vertex', 0.061), ('pairwise', 0.06), ('neighbors', 0.057), ('decomposition', 0.057), ('tests', 0.056), ('poly', 0.056), ('product', 0.054), ('hidden', 0.053), ('mw', 0.052), ('estimation', 0.051), ('complexities', 0.051), ('singular', 0.05), ('award', 0.047), ('obstacle', 0.046), ('removal', 0.046), ('disconnects', 0.046), ('recap', 0.046), ('ysuvw', 0.046), ('gn', 0.044), ('weak', 0.043), ('success', 0.041), ('degree', 0.041), ('selection', 0.04), ('hsu', 0.04), ('observed', 0.039), ('provable', 0.039), ('distances', 0.038), ('vw', 0.037), ('estimating', 0.037), ('separation', 0.036), ('approximations', 0.035), ('tw', 0.035), ('declared', 0.035), ('england', 0.035), ('samples', 0.033), ('mossel', 0.033), ('discrete', 0.032), ('path', 0.032), ('variable', 0.032), ('clarendon', 0.032), ('involves', 0.032), ('potentials', 0.031), ('conditioning', 0.031), ('chow', 0.031), ('guarantees', 0.031), ('models', 0.03), ('mv', 0.03), ('tan', 0.03), ('trees', 0.029), ('pair', 0.029), ('conditional', 0.029), ('mu', 0.029), ('intuitions', 0.028), ('independence', 0.027), ('xing', 0.027), ('likelihood', 0.027), ('irvine', 0.026), ('min', 0.026), ('correctly', 0.026), ('ising', 0.026), ('respective', 0.025), ('families', 0.025), ('isolated', 0.025), ('separates', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 180 nips-2012-Learning Mixtures of Tree Graphical Models

Author: Anima Anandkumar, Furong Huang, Daniel J. Hsu, Sham M. Kakade

Abstract: We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables. We propose a novel method for estimating the mixture components with provable guarantees. Our output is a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. The sample and computational requirements for our method scale as poly(p, r), for an r-component mixture of p-variate graphical models, for a wide class of models which includes tree mixtures and mixtures over bounded degree graphs. Keywords: Graphical models, mixture models, spectral methods, tree approximation.

2 0.25757015 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

Author: Anima Anandkumar, Ragupathyraj Valluvan

Abstract: Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay. We propose an efficient method for graph estimation, and establish its structural consistency when the number of samples n scales as n = Ω(θmin^(−δη(η+1)−2) log p), where θmin is the minimum edge potential, δ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and η is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements. Keywords: Graphical model selection, latent variables, quartet methods, locally tree-like graphs. 1

3 0.20680694 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

Author: Po-ling Loh, Martin J. Wainwright

Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results. 1

4 0.15553878 40 nips-2012-Analyzing 3D Objects in Cluttered Images

Author: Mohsen Hejrati, Deva Ramanan

Abstract: We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage model; the first stage reasons about 2D shape and appearance variation due to within-class variation (station wagons look different than sedans) and changes in viewpoint. Rather than using a view-based model, we describe a compositional representation that models a large number of effective views and shapes using a small number of local view-based templates. We use this model to propose candidate detections and 2D estimates of shape. These estimates are then refined by our second stage, using an explicit 3D model of shape and viewpoint. We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint. We learn all model parameters from 2D annotations. We demonstrate state-of-the-art accuracy for detection, viewpoint estimation, and 3D shape reconstruction on challenging images from the PASCAL VOC 2011 dataset. 1

5 0.14728986 260 nips-2012-Online Sum-Product Computation Over Trees

Author: Mark Herbster, Stephen Pasteris, Fabio Vitale

Abstract: We consider the problem of performing efficient sum-product computations in an online setting over a tree. A natural application of our methods is to compute the marginal distribution at a vertex in a tree-structured Markov random field. Belief propagation can be used to solve this problem, but requires time linear in the size of the tree, and is therefore too slow in an online setting where we are continuously receiving new data and computing individual marginals. With our method we aim to update the data and compute marginals in time that is no more than logarithmic in the size of the tree, and is often significantly less. We accomplish this via a hierarchical covering structure that caches previous local sum-product computations. Our contribution is three-fold: we i) give a linear time algorithm to find an optimal hierarchical cover of a tree; ii) give a sum-product-like algorithm to efficiently compute marginals with respect to this cover; and iii) apply “i” and “ii” to find an efficient algorithm with a regret bound for the online allocation problem in a multi-task setting. 1

6 0.13927853 294 nips-2012-Repulsive Mixtures

7 0.13377896 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

8 0.12419671 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

9 0.12155712 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

10 0.11828022 147 nips-2012-Graphical Models via Generalized Linear Models

11 0.11171574 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

12 0.10795928 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

13 0.10049325 334 nips-2012-Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs

14 0.092749536 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

15 0.091168351 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

16 0.091097079 206 nips-2012-Majorization for CRFs and Latent Likelihoods

17 0.090235122 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

18 0.089858264 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

19 0.085910432 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

20 0.084752768 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.226), (1, 0.087), (2, 0.037), (3, -0.096), (4, -0.152), (5, -0.088), (6, -0.014), (7, -0.13), (8, -0.237), (9, 0.12), (10, 0.035), (11, 0.01), (12, 0.068), (13, -0.035), (14, -0.06), (15, 0.026), (16, 0.113), (17, 0.059), (18, 0.072), (19, 0.018), (20, -0.073), (21, 0.091), (22, -0.01), (23, 0.032), (24, -0.02), (25, 0.11), (26, 0.017), (27, -0.107), (28, -0.036), (29, 0.01), (30, -0.008), (31, -0.018), (32, -0.054), (33, -0.046), (34, 0.047), (35, 0.009), (36, 0.008), (37, -0.066), (38, 0.079), (39, 0.007), (40, -0.048), (41, -0.105), (42, -0.041), (43, 0.026), (44, -0.068), (45, 0.066), (46, -0.011), (47, 0.111), (48, 0.025), (49, -0.103)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97481483 180 nips-2012-Learning Mixtures of Tree Graphical Models

Author: Anima Anandkumar, Furong Huang, Daniel J. Hsu, Sham M. Kakade

Abstract: We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables. We propose a novel method for estimating the mixture components with provable guarantees. Our output is a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. The sample and computational requirements for our method scale as poly(p, r), for an r-component mixture of p-variate graphical models, for a wide class of models which includes tree mixtures and mixtures over bounded degree graphs. Keywords: Graphical models, mixture models, spectral methods, tree approximation.

2 0.86084443 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

Author: Anima Anandkumar, Ragupathyraj Valluvan

Abstract: Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay. We propose an efficient method for graph estimation, and establish its structural consistency when the number of samples n scales as n = Ω(θmin^(−δη(η+1)−2) log p), where θmin is the minimum edge potential, δ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and η is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements. Keywords: Graphical model selection, latent variables, quartet methods, locally tree-like graphs. 1

3 0.72157311 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

Author: Percy Liang, Daniel J. Hsu, Sham M. Kakade

Abstract: This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the parameters efficiently? EM suffers from local optima, while recent work using spectral methods [1] cannot be directly applied since the topology of the parse tree varies across sentences. We develop a strategy, unmixing, which deals with this additional complexity for restricted classes of parsing models. 1

4 0.69011199 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

Author: Nicolò Cesa-bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of an initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to within a constant factor) number of mistakes on any graph G = (V, E) such that |E| = Ω(|V |3/2 ) by querying O(|V |3/2 ) edge labels. More generally, we show an algorithm that achieves optimality to within a factor of O(k) by querying at most order of |V | + (|V |/k)3/2 edge labels. The running time of this algorithm is at most of order |E| + |V | log |V |. 1

5 0.67279106 260 nips-2012-Online Sum-Product Computation Over Trees

Author: Mark Herbster, Stephen Pasteris, Fabio Vitale

Abstract: We consider the problem of performing efficient sum-product computations in an online setting over a tree. A natural application of our methods is to compute the marginal distribution at a vertex in a tree-structured Markov random field. Belief propagation can be used to solve this problem, but requires time linear in the size of the tree, and is therefore too slow in an online setting where we are continuously receiving new data and computing individual marginals. With our method we aim to update the data and compute marginals in time that is no more than logarithmic in the size of the tree, and is often significantly less. We accomplish this via a hierarchical covering structure that caches previous local sum-product computations. Our contribution is three-fold: we i) give a linear time algorithm to find an optimal hierarchical cover of a tree; ii) give a sum-product-like algorithm to efficiently compute marginals with respect to this cover; and iii) apply “i” and “ii” to find an efficient algorithm with a regret bound for the online allocation problem in a multi-task setting. 1

6 0.65143257 215 nips-2012-Minimizing Uncertainty in Pipelines

7 0.64016253 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

8 0.61188102 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

9 0.60653192 132 nips-2012-Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling

10 0.59418422 334 nips-2012-Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs

11 0.57702166 346 nips-2012-Topology Constraints in Graphical Models

12 0.56378073 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

13 0.55693924 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

14 0.54688019 192 nips-2012-Learning the Dependency Structure of Latent Factors

15 0.52720976 147 nips-2012-Graphical Models via Generalized Linear Models

16 0.52130425 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

17 0.51810652 206 nips-2012-Majorization for CRFs and Latent Likelihoods

18 0.50792044 294 nips-2012-Repulsive Mixtures

19 0.50642008 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification

20 0.49976957 183 nips-2012-Learning Partially Observable Models Using Temporally Abstract Decision Trees


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.063), (14, 0.015), (21, 0.014), (38, 0.122), (39, 0.018), (42, 0.033), (54, 0.016), (55, 0.013), (74, 0.121), (76, 0.147), (80, 0.112), (92, 0.029), (98, 0.221)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.91789699 154 nips-2012-How They Vote: Issue-Adjusted Models of Legislative Behavior

Author: Sean Gerrish, David M. Blei

Abstract: We develop a probabilistic model of legislative data that uses the text of the bills to uncover lawmakers’ positions on specific political issues. Our model can be used to explore how a lawmaker’s voting patterns deviate from what is expected and how that deviation depends on what is being voted on. We derive approximate posterior inference algorithms based on variational methods. Across 12 years of legislative data, we demonstrate both improvement in heldout predictive performance and the model’s utility in interpreting an inherently multi-dimensional space. 1

2 0.86973786 267 nips-2012-Perceptron Learning of SAT

Author: Alex Flint, Matthew Blaschko

Abstract: Boolean satisfiability (SAT) as a canonical NP-complete decision problem is one of the most important problems in computer science. In practice, real-world SAT sentences are drawn from a distribution that may result in efficient algorithms for their solution. Such SAT instances are likely to have shared characteristics and substructures. This work approaches the exploration of a family of SAT solvers as a learning problem. In particular, we relate polynomial time solvability of a SAT subset to a notion of margin between sentences mapped by a feature function into a Hilbert space. Provided this mapping is based on polynomial time computable statistics of a sentence, we show that the existence of a margin between these data points implies the existence of a polynomial time solver for that SAT subset based on the Davis-Putnam-Logemann-Loveland algorithm. Furthermore, we show that a simple perceptron-style learning rule will find an optimal SAT solver with a bounded number of training updates. We derive a linear time computable set of features and show analytically that margins exist for important polynomial special cases of SAT. Empirical results show an order of magnitude improvement over a state-of-the-art SAT solver on a hardware verification task. 1

3 0.83181894 55 nips-2012-Bayesian Warped Gaussian Processes

Author: Miguel Lázaro-gredilla

Abstract: Warped Gaussian processes (WGP) [1] model output observations in regression tasks as a parametric nonlinear transformation of a Gaussian process (GP). The use of this nonlinear transformation, which is included as part of the probabilistic model, was shown to enhance performance by providing a better prior model on several data sets. In order to learn its parameters, maximum likelihood was used. In this work we show that it is possible to use a non-parametric nonlinear transformation in WGP and variationally integrate it out. The resulting Bayesian WGP is then able to work in scenarios in which the maximum likelihood WGP failed: Low data regime, data with censored values, classification, etc. We demonstrate the superior performance of Bayesian warped GPs on several real data sets.

4 0.82586884 356 nips-2012-Unsupervised Structure Discovery for Semantic Analysis of Audio

Author: Sourish Chaudhuri, Bhiksha Raj

Abstract: Approaches to audio classification and retrieval tasks largely rely on detection-based discriminative models. We submit that such models make a simplistic assumption in mapping acoustics directly to semantics, whereas the actual process is likely more complex. We present a generative model that maps acoustics in a hierarchical manner to increasingly higher-level semantics. Our model has two layers with the first layer modeling generalized sound units with no clear semantic associations, while the second layer models local patterns over these sound units. We evaluate our model on a large-scale retrieval task from TRECVID 2011, and report significant improvements over standard baselines. 1

same-paper 5 0.82117301 180 nips-2012-Learning Mixtures of Tree Graphical Models

Author: Anima Anandkumar, Furong Huang, Daniel J. Hsu, Sham M. Kakade

Abstract: We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables. We propose a novel method for estimating the mixture components with provable guarantees. Our output is a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. The sample and computational requirements for our method scale as poly(p, r), for an r-component mixture of p-variate graphical models, for a wide class of models which includes tree mixtures and mixtures over bounded degree graphs. Keywords: Graphical models, mixture models, spectral methods, tree approximation.

6 0.76263976 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

7 0.76040155 174 nips-2012-Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

8 0.74380273 168 nips-2012-Kernel Latent SVM for Visual Recognition

9 0.737813 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

10 0.73694432 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

11 0.73538077 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

12 0.73483032 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

13 0.73395622 210 nips-2012-Memorability of Image Regions

14 0.73078394 8 nips-2012-A Generative Model for Parts-based Object Segmentation

15 0.72816283 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

16 0.72351539 260 nips-2012-Online Sum-Product Computation Over Trees

17 0.7232694 303 nips-2012-Searching for objects driven by context

18 0.72288781 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

19 0.72047907 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

20 0.72034395 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization