nips nips2007 nips2007-31 knowledge-graph by maker-knowledge-mining

31 nips-2007-Bayesian Agglomerative Clustering with Coalescents


Source: pdf

Author: Yee W. Teh, Daniel J. Hsu, Hal Daume

Abstract: We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman’s coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over the state-of-the-art, and demonstrate our approach in document clustering and phylolinguistics. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman’s coalescent. [sent-6, score-0.302]

2 We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. [sent-7, score-0.269]

3 We show experimentally the superiority of our algorithms over the state-of-the-art, and demonstrate our approach in document clustering and phylolinguistics. [sent-8, score-0.121]

4 It is thus not surprising that hierarchical clustering is a traditional mainstay of machine learning [1]. [sent-10, score-0.175]

5 The dominant approach to hierarchical clustering is agglomerative: start with one cluster per datum, and greedily merge pairs until a single cluster remains. [sent-11, score-0.26]

6 Currently there are two main approaches to probabilistic models for hierarchical clustering. [sent-14, score-0.083]

7 The first takes a direct Bayesian approach by defining a prior over trees followed by a distribution over data points conditioned on a tree [2, 3, 4, 5]. [sent-15, score-0.234]

8 MCMC sampling is then used to obtain trees from their posterior distribution given observations. [sent-16, score-0.143]

9 The second approach uses a flat mixture model as the underlying probabilistic model and structures the posterior hierarchically [6, 7]. [sent-20, score-0.094]

10 This approach uses an agglomerative procedure to find the tree giving the best posterior approximation, mirroring traditional agglomerative clustering techniques closely and giving efficient, easy-to-implement algorithms. [sent-21, score-0.595]

11 However because the underlying model has no hierarchical structure, there is no sharing of information across the tree. [sent-22, score-0.083]

12 We propose a novel class of Bayesian hierarchical clustering models and associated inference algorithms combining the advantages of both probabilistic approaches above. [sent-23, score-0.204]

13 Our model is based on an exchangeable distribution over trees called Kingman’s coalescent [8, 9]. [sent-25, score-0.727]

14 Kingman’s coalescent is a standard model from population genetics for the genealogy of a set of individuals. [sent-26, score-0.87]

15 It is obtained by tracing the genealogy backwards in time, noting when lineages coalesce together. [sent-27, score-0.486]

16 Our own contribution is in using it as a prior over trees in a hierarchical clustering model (Section 3) and in developing novel inference procedures for this model (Section 4). [sent-29, score-0.302]

17 (b) Sample path from a Brownian diffusion coalescent process in 1D; circles are coalescent points. [sent-62, score-1.271]

18 Kingman’s coalescent is a standard model in population genetics describing the common genealogy (ancestral tree) of a set of individuals [8, 9]. [sent-64, score-1.527]

19 In its full form it is a distribution over the genealogy of a countably infinite set of individuals. [sent-65, score-0.274]

20 Like other nonparametric models (e.g., Gaussian and Dirichlet processes), Kingman’s coalescent is most easily described and understood in terms of its finite dimensional marginal distributions over the genealogies of n individuals, called n-coalescents. [sent-68, score-0.664]

21 Consider the genealogy of n individuals alive at the present time t = 0. [sent-70, score-0.351]

22 Assume each individual has one parent (in genetics, haploid organisms), and therefore genealogies of [n] = {1, . . . , n} are trees. [sent-72, score-0.108]

23 Further, π completely and succinctly characterizes the genealogy; we shall henceforth refer to π as the genealogy of [n]. [sent-89, score-0.249]

24 Kingman’s n-coalescent is simply a distribution over genealogies of [n], or equivalently, over the space of partition-valued functions like π. [sent-90, score-0.083]

25 The process starts with the n singletons {{1}, . . . , {n}} at present time t = 0, and evolves backwards in time, merging (coalescing) lineages until only one is left. [sent-94, score-0.2]

26 To describe the Markov process in its entirety, it is sufficient to describe the jump process (i.e. [sent-95, score-0.09]

27 the embedded, discrete-time Markov chain over partitions) and the distribution over coalescent times. [sent-97, score-0.581]

28 Let ρli, ρri be the ith pair of lineages to coalesce, tn−1 < · · · < t1 < t0 = 0 be the coalescent times, and δi = ti−1 − ti > 0 be the duration between adjacent events (see Figure 1a). [sent-99, score-0.833]

29 Under the n-coalescent, every pair of lineages merges independently with exponential rate 1. [sent-100, score-0.154]

30 Thus the first pair amongst m lineages merges with rate (m choose 2) = m(m−1)/2. [sent-101, score-0.183]

31 Therefore δi ∼ Exp((n−i+1 choose 2)) independently, the pair ρli, ρri is chosen from among the lineages present right after time ti, and with probability one a random draw from the n-coalescent is a binary tree with a single root at t = −∞ and the n individuals at time t = 0. [sent-102, score-0.367]

32 π(t) = {{1}, . . . , {n}} if t = 0;  πti−1 − ρli − ρri + (ρli ∪ ρri) if t = ti;  πti if ti+1 < t < ti.  (1) [sent-106, score-0.36]
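
The generative view above (exponential durations with rate (n−i+1 choose 2), uniformly chosen merging pairs, times running backwards from t = 0) is straightforward to simulate. Below is a minimal sketch, assuming only NumPy, of drawing a genealogy from the n-coalescent prior; the frozenset representation of subtrees and the function name are illustrative choices, not the authors' code.

```python
import numpy as np

def sample_n_coalescent(n, seed=None):
    # Draw a genealogy of n individuals: times run backwards (negative), and at each step
    # the waiting time is exponential with rate C(m, 2) for the m remaining lineages.
    rng = np.random.default_rng(seed)
    lineages = [frozenset([i]) for i in range(n)]        # singleton clusters at t = 0
    t = 0.0
    events = []                                          # (time, left subtree, right subtree)
    while len(lineages) > 1:
        m = len(lineages)
        rate = m * (m - 1) / 2                           # number of pairs, C(m, 2)
        t -= rng.exponential(1.0 / rate)                 # go further back in time
        i, j = rng.choice(m, size=2, replace=False)      # merging pair, chosen uniformly
        left, right = lineages[i], lineages[j]
        lineages = [l for k, l in enumerate(lineages) if k not in (i, j)] + [left | right]
        events.append((t, left, right))
    return events                                        # the n - 1 coalescent events

events = sample_n_coalescent(5, seed=0)
```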

33 The marginal distribution over tree topologies is uniform and independent of the coalescent times. [sent-108, score-0.688]

34 Secondly, it is infinitely exchangeable: given a genealogy drawn from an n-coalescent, the genealogy of any m contemporary individuals alive at time t ≤ 0 embedded within the genealogy is a draw from the m-coalescent. [sent-109, score-0.875]

35 Thus, taking n → ∞, there is a distribution over genealogies of a countably infinite population for which the marginal distribution of the genealogy of any n individuals gives the n-coalescent. [sent-110, score-0.433]

36 3 Hierarchical clustering with coalescents: We take a Bayesian approach to hierarchical clustering, placing a coalescent prior on the latent tree and modeling the observed data with a tree-structured Markov process evolving forward in time. [sent-112, score-1.161]

37 We will alter our terminology from genealogy to tree, from n individuals at present time to n observed data points, and from individuals on the genealogy to latent variables on the tree-structured distribution. [sent-113, score-0.679]

38 Let x = {x1, . . . , xn} be the n observed data points at the leaves of a tree π drawn from the n-coalescent. [sent-117, score-0.14]

39 π has n − 1 coalescent points, the ith occurring when ρli and ρri merge at time ti to form ρi = ρli ∪ ρri. [sent-118, score-0.793]

40 Let tli and tri be the times at which ρli and ρri are themselves formed. [sent-119, score-0.266]

41 We use a continuous-time Markov process to define the distribution over the n data points x given the tree π. [sent-120, score-0.152]

42 The Markov process starts in the distant past, evolves forward in time, splits at each coalescent point, and evolves independently down both branches until we reach time 0, when n data points are observations of the process at the n leaves of the tree. [sent-121, score-0.83]

43 The joint distribution described by this process respects the conditional independences implied by the structure of the directed tree π. [sent-122, score-0.152]

44 To complete the description of the likelihood model, let q(z) be the initial distribution of the Markov process at time t = −∞, and kst (x, y) be the transition probability from state x at time s to state y at time t. [sent-125, score-0.078]

45 However the observations are not independent as they share the same sample path down the Markov process until it splits. [sent-128, score-0.079]

46 As examples we consider a Brownian diffusion (Figures 1(b,c)) and a simple independent-sites mutation process on multinomial vectors. [sent-133, score-0.313]

47 4 Agglomerative sequential Monte Carlo and greedy inference: We develop two classes of efficient and easily implementable inference algorithms for our hierarchical clustering model, based on sequential Monte Carlo (SMC) and greedy schemes respectively. [sent-134, score-0.429]

48 In both classes, the latent variables are integrated out, and the trees are constructed in a bottom-up fashion. [sent-135, score-0.122]

49 The full tree π can be expressed as a series of n − 1 coalescent events, ordered backwards in time. [sent-136, score-0.726]

50 The ith coalescent event involves the merging of the two subtrees with leaves ρli and ρri and occurs at a time δi before the previous coalescent event. [sent-137, score-1.257]

51 Let θi = {δj , ρlj , ρrj for j ≤ i} denote the first i coalescent events. [sent-138, score-0.581]

52 Mρi (y) is proportional to the likelihood of the observations at the leaves below coalescent event i, given that yρi = y. [sent-142, score-0.648]

53 The sequential Monte Carlo (SMC) algorithms approximate the posterior over the tree θn−1 using a weighted sum of samples, while the greedy algorithms construct θn−1 by maximizing local terms in (7). [sent-150, score-0.311]

54 Both classes of algorithms proceed for i = 1, . . . , n − 1, choosing a duration δi and a pair of subtrees ρli, ρri to coalesce at each iteration. [sent-154, score-0.207]

55 This choice is based upon the ith term in (7), interpreted as the product of a local prior exp(−(n−i+1 choose 2) δi) and a local likelihood Zρi(x, θi) for choosing δi, ρli and ρri given θi−1. [sent-155, score-0.097]

56 4.1 Sequential Monte Carlo algorithms: SMC algorithms approximate the posterior by iteratively constructing a weighted sum of point masses. [sent-157, score-0.108]

57 The simplest proposal distribution is to sample δi, ρli and ρri for each particle s from the local prior. [sent-165, score-0.457]

58 δi is drawn from an exponential with rate (n−i+1 choose 2), and ρli, ρri are drawn uniformly from all available pairs. [sent-166, score-0.457]
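
A hedged sketch of one iteration of this SMC-PriorPrior proposal for a single particle is given below. The `local_likelihood` callback stands in for the local likelihood term Zρi(x, θi) in (7), whose form depends on the chosen Markov process; the particle dictionary layout and the function name are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def smc_prior_prior_step(particle, i, n, local_likelihood, rng):
    # One proposal step (the ith coalescent event, i = 1, ..., n-1) for a single particle.
    # particle: dict with 'subtrees' (list of frozensets), 'log_weight', 'events'.
    m = n - i + 1                                        # lineages remaining before this event
    rate = m * (m - 1) / 2                               # C(m, 2)
    delta = rng.exponential(1.0 / rate)                  # duration drawn from the local prior
    a, b = rng.choice(len(particle['subtrees']), size=2, replace=False)
    left, right = particle['subtrees'][a], particle['subtrees'][b]
    # proposal == local prior, so the incremental importance weight is the local likelihood
    particle['log_weight'] += np.log(local_likelihood(left, right, delta))
    particle['subtrees'] = [s for k, s in enumerate(particle['subtrees'])
                            if k not in (a, b)] + [left | right]
    particle['events'].append((delta, left, right))
    return particle
```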

59 Fortunately, for the case of the Brownian diffusion process described below, these integrals are tractable and related to generalized inverse Gaussian distributions. [sent-176, score-0.109]

60 4.2 Greedy algorithms: SMC algorithms are attractive because they can produce an arbitrarily accurate approximation to the full posterior as the number of samples grows. [sent-178, score-0.108]

61 However in many applications a single good tree is often sufficient. [sent-179, score-0.107]

62 We describe a few greedy algorithms to construct a good tree. [sent-180, score-0.095]

63 Greedy-MaxProb: the obvious greedy algorithm is to pick δi , ρli and ρri maximizing the ith term in (7). [sent-181, score-0.103]

64 We do so by computing the optimal δi for each pair of ρli , ρri , and then picking the pair maximizing the ith term at its optimal δi . [sent-182, score-0.113]

65 Greedy-MinDuration: pick the pair to coalesce whose optimal duration is minimum. [sent-183, score-0.182]

66 Both algorithms require recomputing the optimal duration for each pair at each iteration, since the prior rate (n−i+1 choose 2) on the duration varies with the iteration i. [sent-184, score-0.223]

67 We can avoid this by using the alternative view of the n-coalescent as a Markov process where each pair of lineages coalesces at rate 1. [sent-186, score-0.199]

68 We coalesce the pair with the most recent time (as in Greedy-MinDuration). [sent-188, score-0.121]
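
A sketch of this Greedy-Rate1 loop under the rate-1 view is shown below: each pair's candidate coalescence time is computed once and never revised, and the pair with the most recent time is merged. The `optimal_time` callback (which would choose the pair's time by optimizing the local prior-times-likelihood term) and the tuple representation of internal nodes are assumptions for illustration, not the paper's code.

```python
def greedy_rate1(leaves, optimal_time):
    # leaves: distinct, hashable leaf objects (e.g. ints); optimal_time(a, b) returns the
    # candidate (negative) coalescence time for merging subtrees a and b under rate 1.
    subtrees = list(leaves)
    cand = {(a, b): optimal_time(a, b)                       # computed once per pair
            for i, a in enumerate(subtrees) for b in subtrees[i + 1:]}
    while len(subtrees) > 1:
        (a, b), t = max(cand.items(), key=lambda kv: kv[1])  # most recent time (closest to 0)
        new = (a, b, t)                                      # internal node: (left, right, time)
        subtrees = [s for s in subtrees if s is not a and s is not b] + [new]
        cand = {p: v for p, v in cand.items() if a not in p and b not in p}
        cand.update({(s, new): optimal_time(s, new) for s in subtrees if s is not new})
    return subtrees[0]                                       # root of the full binary tree
```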

69 The message at the newly coalesced point has parameters vρi = ((vρli + tli − ti)^−1 + (vρri + tri − ti)^−1)^−1 and ŷρi = (ŷρli/(vρli + tli − ti) + ŷρri/(vρri + tri − ti)) vρi.  (13) [sent-198, score-0.535]
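
Read this way, (13) is the usual precision-weighted combination of the two children's Gaussian messages, with each child's variance inflated by the length of its branch. A small sketch under that reading (an interpretation of the garbled equation, not the authors' code):

```python
def merge_brownian_messages(y_l, v_l, t_l, y_r, v_r, t_r, t_i):
    # (y, v): mean and variance of a child's Gaussian message, formed at time t_l / t_r;
    # t_i < t_l, t_r is the time of the new coalescent point, so t_l - t_i is a branch length.
    prec_l = 1.0 / (v_l + t_l - t_i)
    prec_r = 1.0 / (v_r + t_r - t_i)
    v_i = 1.0 / (prec_l + prec_r)                 # combined variance, as in (13)
    y_i = (y_l * prec_l + y_r * prec_r) * v_i     # precision-weighted mean, as in (13)
    return y_i, v_i
```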

70 Consider a Markov process acting on multinomial vectors with each entry taking one of K values and evolving independently. [sent-199, score-0.142]

71 Entry d evolves at rate λd and has equilibrium distribution vector qd . [sent-200, score-0.099]
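
One common parameterization of such an independent-sites mutation process (an assumption here, not necessarily the paper's exact choice) lets entry d keep its value with probability exp(−λd τ) over a duration τ and otherwise resample it from qd, giving the transition matrix sketched below.

```python
import numpy as np

def transition_matrix(lam_d, q_d, tau):
    # Entry d keeps its current value with probability exp(-lam_d * tau) over duration tau,
    # otherwise it is resampled from the equilibrium distribution q_d (length-K vector).
    q_d = np.asarray(q_d, dtype=float)
    K = q_d.shape[0]
    stay = np.exp(-lam_d * tau)
    return stay * np.eye(K) + (1.0 - stay) * np.tile(q_d, (K, 1))  # rows: old value, cols: new
```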

72 Representing the message for entry d from ρi to its parent as a vector M^d_ρi = [M^d1_ρi, . . . , M^dK_ρi]. [sent-202, score-0.087]

73 In the Brownian case, we place an inverse Wishart prior on Λ, and the MAP posterior estimate of Λ is available in a standard closed form. [sent-208, score-0.084]

74 In Figure 2 we compare the various SMC algorithms and Greedy-Rate1 on a range of synthetic data sets drawn from the Brownian diffusion coalescent process itself (Λ = ID) to investigate the effects of various parameters on the efficacy of the algorithms. [sent-212, score-0.719]

75 However, the posterior space also increases, and SMC-PriorPrior, which simply samples from the prior over genealogies, does not improve as much. [sent-217, score-0.167]

76 We compare the performance of Greedy-Rate1 to two other hierarchical clustering algorithms: average-linkage and Bayesian hierarchical clustering (BHC) [6]. [sent-223, score-0.35]

77 Figure 2: Predictive performance of the algorithms (SMC-PostPost, SMC-PriorPost, SMC-PriorPrior, Greedy-Rate1) as we vary (a) the number of dimensions D, (b) observations n, (c) the mutation rate λ (Λ = λID), and (d) the number of samples S. [sent-248, score-0.135]

78 We present purity scores [6], subtree scores (#{interior nodes with all leaves of the same class}/(n − #classes)) and leave-one-out accuracies (all scores between 0 and 1; higher is better). [sent-290, score-0.122]
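
From that definition, the subtree score can be computed with a single pass over the tree. A minimal sketch, assuming the tree is given as nested (left, right) tuples with class labels at the leaves (an illustrative representation, not the paper's):

```python
def subtree_score(tree, n, n_classes):
    # tree: nested (left, right) tuples; a leaf is just its class label.
    pure = 0

    def classes_below(node):
        nonlocal pure
        if not isinstance(node, tuple):       # leaf
            return {node}
        found = classes_below(node[0]) | classes_below(node[1])
        if len(found) == 1:                   # interior node whose leaves all share one class
            pure += 1
        return found

    classes_below(tree)
    return pure / (n - n_classes)
```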

79 Experiments not presented here show that all greedy algorithms perform about the same and that performance improves with hyperparameter updates. [sent-292, score-0.132]

80 Based on the Indo-European subset of this data for which at most 30 features are unknown (48 languages total), we recover the coalescent tree shown in Figure 3(a). [sent-298, score-0.745]

81 Finally, we compare the trees generated by Greedy-Rate1 with trees generated by either average-linkage or BHC, using the same evaluation criteria as for MNIST and SPAMBASE, using language genus as classes. [sent-324, score-0.251]

82 The results are in Table 5, where we can see that the coalescent significantly outperforms the other methods. [sent-325, score-0.581]

83 The data was preprocessed so that only words occurring in at least 100 abstracts were retained. [sent-361, score-0.079]

84 In the supplemental material, we depict the top levels of the coalescent tree. [sent-364, score-0.581]

85 Here, we use the tree to generate a flat clustering. [sent-365, score-0.107]

86 To do so, we use the log likelihood ratio at each branch in the coalescent to determine if a split should occur. [sent-366, score-0.581]
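
A hedged sketch of such a cut is below: descend from the root and split a branch whenever the log likelihood ratio favours treating the two children separately. The `llr` callback and the zero threshold are illustrative assumptions rather than the paper's exact criterion.

```python
def flat_clustering(node, llr, threshold=0.0):
    # node: nested (left, right) tuples with leaf labels at the leaves (assumed layout).
    if not isinstance(node, tuple):
        return [[node]]                                    # a lone leaf forms its own cluster
    left, right = node
    if llr(left, right) > threshold:                       # evidence favours splitting here
        return (flat_clustering(left, llr, threshold)
                + flat_clustering(right, llr, threshold))
    return [leaves(node)]                                  # keep the whole subtree together

def leaves(node):
    if not isinstance(node, tuple):
        return [node]
    return leaves(node[0]) + leaves(node[1])
```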

87 Note that the clustering is able to tease apart Bayesian learning (cluster 5) and non-Bayesian learning (cluster 7), both of which have Mike Jordan as their top author! [sent-371, score-0.092]

88 6 Discussion: We described a new model for Bayesian agglomerative clustering. [sent-372, score-0.173]

89 We used Kingman’s coalescent as our prior over trees, and derived efficient and easily implementable greedy and SMC inference algorithms for the model. [sent-373, score-0.743]

90 We showed empirically that our model gives better performance than other agglomerative clustering algorithms, and gives good results on applications to document modeling and phylolinguistics. [sent-374, score-0.265]

91 Our model is most similar in spirit to the Dirichlet diffusion tree of [2]. [sent-375, score-0.171]

92 While [2] uses a fragmentation process for trees, our prior uses the reverse—a coalescent process instead. [sent-377, score-0.705]

93 This allows us to develop simpler inference algorithms than those in [2] (we have not compared our model against the Dirichlet diffusion tree due to the complexity of implementing it). [sent-378, score-0.2]

94 It will be interesting to consider the possibility of developing similar agglomerative style algorithms for [2]. [sent-379, score-0.202]

95 [3] also describes a hierarchical clustering model involving a prior over trees, but his prior is not infinitely exchangeable. [sent-380, score-0.243]

96 Another related work is the Bayesian hierarchical clustering of [6], which uses an agglomerative procedure returning a tree-structured approximate posterior for a Dirichlet process mixture model. [sent-382, score-0.55]

97 Among the greedy algorithms we see that there are no discernible differences in quality of approximation; thus we recommend Greedy-Rate1. [sent-386, score-0.095]

98 We know the answer for at least a simple case: if the Markov process is a mutation process with mutation rate α/2 and new states are drawn i.i.d. [sent-394, score-0.234]

99 Another issue is that of consistency—does the posterior over random distributions converge to the true distribution as the number of observations grows? [sent-398, score-0.084]

100 Finally, it would be interesting to generalize our approach to varying mutation rates, and to non-binary trees by using generalizations of Kingman’s coalescent called Λ-coalescents [15]. [sent-399, score-0.746]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('coalescent', 0.581), ('ri', 0.25), ('genealogy', 0.249), ('kingman', 0.216), ('li', 0.207), ('agglomerative', 0.173), ('smc', 0.158), ('germanic', 0.133), ('indic', 0.133), ('tli', 0.133), ('tri', 0.133), ('ti', 0.12), ('lineages', 0.116), ('slavic', 0.116), ('tree', 0.107), ('brownian', 0.101), ('bhc', 0.1), ('iranian', 0.1), ('trees', 0.093), ('romance', 0.092), ('clustering', 0.092), ('hierarchical', 0.083), ('celtic', 0.083), ('coalesce', 0.083), ('genealogies', 0.083), ('individuals', 0.076), ('mutation', 0.072), ('armenian', 0.066), ('spambase', 0.066), ('greedy', 0.066), ('diffusion', 0.064), ('duration', 0.061), ('languages', 0.057), ('exchangeable', 0.053), ('qd', 0.053), ('abstracts', 0.053), ('coalescents', 0.05), ('kti', 0.05), ('wals', 0.05), ('posterior', 0.05), ('markov', 0.047), ('subtree', 0.046), ('evolves', 0.046), ('wi', 0.046), ('process', 0.045), ('hierarchically', 0.044), ('purity', 0.043), ('mnist', 0.043), ('dirichlet', 0.043), ('genetics', 0.04), ('backwards', 0.038), ('pair', 0.038), ('bayesian', 0.037), ('hyperparameter', 0.037), ('ith', 0.037), ('language', 0.036), ('sejnowski', 0.035), ('koch', 0.035), ('carlo', 0.035), ('monte', 0.035), ('observations', 0.034), ('prior', 0.034), ('leaves', 0.033), ('albanian', 0.033), ('baltic', 0.033), ('bower', 0.033), ('implementable', 0.033), ('ious', 0.033), ('kst', 0.033), ('llr', 0.033), ('phylolinguistics', 0.033), ('typological', 0.033), ('evolving', 0.033), ('entry', 0.033), ('multinomial', 0.031), ('particles', 0.031), ('sequential', 0.03), ('latent', 0.029), ('merge', 0.029), ('daum', 0.029), ('genus', 0.029), ('greek', 0.029), ('message', 0.029), ('algorithms', 0.029), ('cluster', 0.028), ('predictive', 0.027), ('nitely', 0.027), ('tn', 0.027), ('dk', 0.027), ('alive', 0.026), ('occuring', 0.026), ('yb', 0.026), ('exp', 0.026), ('draw', 0.026), ('parent', 0.025), ('atlas', 0.025), ('subtrees', 0.025), ('countably', 0.025), ('voltage', 0.025), ('ik', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents

Author: Yee W. Teh, Daniel J. Hsu, Hal Daume

Abstract: We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman’s coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over the state-of-the-art, and demonstrate our approach in document clustering and phylolinguistics. 1

2 0.091383658 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs

Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu

Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1

3 0.086580597 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process

Author: Charlie Frogner, Avi Pfeffer

Abstract: Dynamic Bayesian networks are structured representations of stochastic processes. Despite their structure, exact inference in DBNs is generally intractable. One approach to approximate inference involves grouping the variables in the process into smaller factors and keeping independent beliefs over these factors. In this paper we present several techniques for decomposing a dynamic Bayesian network automatically to enable factored inference. We examine a number of features of a DBN that capture different types of dependencies that will cause error in factored inference. An empirical comparison shows that the most useful of these is a heuristic that estimates the mutual information introduced between factors by one step of belief propagation. In addition to features computed over entire factors, for efficiency we explored scores computed over pairs of variables. We present search methods that use these features, pairwise and not, to find a factorization, and we compare their results on several datasets. Automatic factorization extends the applicability of factored inference to large, complex models that are undesirable to factor by hand. Moreover, tests on real DBNs show that automatic factorization can achieve significantly lower error in some cases. 1

4 0.082088441 197 nips-2007-The Infinite Markov Model

Author: Daichi Mochihashi, Eiichiro Sumita

Abstract: We present a nonparametric Bayesian method of estimating variable order Markov processes up to a theoretically infinite order. By extending a stick-breaking prior, which is usually defined on a unit interval, “vertically” to the trees of infinite depth associated with a hierarchical Chinese restaurant process, our model directly infers the hidden orders of Markov dependencies from which each symbol originated. Experiments on character and word sequences in natural language showed that the model has a comparative performance with an exponentially large full-order model, while computationally much efficient in both time and space. We expect that this basic model will also extend to the variable order hierarchical clustering of general data. 1

5 0.08089038 116 nips-2007-Learning the structure of manifolds using random projections

Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma

Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1

6 0.059547316 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

7 0.059525806 9 nips-2007-A Probabilistic Approach to Language Change

8 0.059178378 214 nips-2007-Variational inference for Markov jump processes

9 0.058024324 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering

10 0.057148471 167 nips-2007-Regulator Discovery from Gene Expression Time Series of Malaria Parasites: a Hierachical Approach

11 0.05684863 61 nips-2007-Convex Clustering with Exemplar-Based Models

12 0.056734271 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC

13 0.055571325 103 nips-2007-Inferring Elapsed Time from Stochastic Neural Processes

14 0.053805221 213 nips-2007-Variational Inference for Diffusion Processes

15 0.05323948 80 nips-2007-Ensemble Clustering using Semidefinite Programming

16 0.052431442 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis

17 0.050977919 105 nips-2007-Infinite State Bayes-Nets for Structured Domains

18 0.050587408 46 nips-2007-Cluster Stability for Finite Samples

19 0.050107528 97 nips-2007-Hidden Common Cause Relations in Relational Learning

20 0.050086387 203 nips-2007-The rat as particle filter


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.167), (1, 0.011), (2, -0.03), (3, -0.063), (4, -0.039), (5, -0.074), (6, -0.021), (7, 0.003), (8, -0.02), (9, -0.041), (10, -0.077), (11, 0.076), (12, 0.064), (13, -0.111), (14, -0.018), (15, 0.06), (16, -0.016), (17, 0.003), (18, 0.052), (19, -0.03), (20, 0.096), (21, 0.047), (22, -0.031), (23, 0.039), (24, 0.135), (25, -0.007), (26, -0.063), (27, 0.047), (28, -0.082), (29, -0.018), (30, 0.001), (31, 0.077), (32, 0.024), (33, -0.085), (34, 0.059), (35, -0.006), (36, -0.005), (37, 0.035), (38, -0.008), (39, -0.084), (40, -0.043), (41, 0.058), (42, -0.058), (43, 0.11), (44, 0.004), (45, 0.102), (46, 0.045), (47, 0.02), (48, 0.091), (49, -0.159)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92387223 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents

Author: Yee W. Teh, Daniel J. Hsu, Hal Daume

Abstract: We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman’s coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over the state-of-the-art, and demonstrate our approach in document clustering and phylolinguistics. 1

2 0.62821668 167 nips-2007-Regulator Discovery from Gene Expression Time Series of Malaria Parasites: a Hierachical Approach

Author: José M. Hernández-lobato, Tjeerd Dijkstra, Tom Heskes

Abstract: We introduce a hierarchical Bayesian model for the discovery of putative regulators from gene expression data only. The hierarchy incorporates the knowledge that there are just a few regulators that by themselves only regulate a handful of genes. This is implemented through a so-called spike-and-slab prior, a mixture of Gaussians with different widths, with mixing weights from a hierarchical Bernoulli model. For efficient inference we implemented expectation propagation. Running the model on a malaria parasite data set, we found four genes with significant homology to transcription factors in an amoebe, one RNA regulator and three genes of unknown function (out of the top ten genes considered).

3 0.59855944 78 nips-2007-Efficient Principled Learning of Thin Junction Trees

Author: Anton Chechetka, Carlos Guestrin

Abstract: We present the first truly polynomial algorithm for PAC-learning the structure of bounded-treewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speed ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixed-size subsets of variables, if the underlying distribution can be approximated by a bounded-treewidth junction tree. 1

4 0.58812594 197 nips-2007-The Infinite Markov Model

Author: Daichi Mochihashi, Eiichiro Sumita

Abstract: We present a nonparametric Bayesian method of estimating variable order Markov processes up to a theoretically infinite order. By extending a stick-breaking prior, which is usually defined on a unit interval, “vertically” to the trees of infinite depth associated with a hierarchical Chinese restaurant process, our model directly infers the hidden orders of Markov dependencies from which each symbol originated. Experiments on character and word sequences in natural language showed that the model has a comparative performance with an exponentially large full-order model, while computationally much efficient in both time and space. We expect that this basic model will also extend to the variable order hierarchical clustering of general data. 1

5 0.51274037 214 nips-2007-Variational inference for Markov jump processes

Author: Manfred Opper, Guido Sanguinetti

Abstract: Markov jump processes play an important role in a large number of application domains. However, realistic systems are analytically intractable and they have traditionally been analysed using simulation based techniques, which do not provide a framework for statistical inference. We propose a mean field approximation to perform posterior inference and parameter estimation. The approximation allows a practical solution to the inference problem, while still retaining a good degree of accuracy. We illustrate our approach on two biologically motivated systems.

6 0.49646145 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta

7 0.49363035 116 nips-2007-Learning the structure of manifolds using random projections

8 0.4798581 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs

9 0.4635922 16 nips-2007-A learning framework for nearest neighbor search

10 0.45359391 105 nips-2007-Infinite State Bayes-Nets for Structured Domains

11 0.43751949 196 nips-2007-The Infinite Gamma-Poisson Feature Model

12 0.41962942 44 nips-2007-Catching Up Faster in Bayesian Model Selection and Model Averaging

13 0.39277256 171 nips-2007-Scan Strategies for Meteorological Radars

14 0.38773653 27 nips-2007-Anytime Induction of Cost-sensitive Trees

15 0.38649434 9 nips-2007-A Probabilistic Approach to Language Change

16 0.38160673 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process

17 0.37760711 72 nips-2007-Discriminative Log-Linear Grammars with Latent Variables

18 0.37725401 139 nips-2007-Nearest-Neighbor-Based Active Learning for Rare Category Detection

19 0.36888289 193 nips-2007-The Distribution Family of Similarity Distances

20 0.35965368 17 nips-2007-A neural network implementing optimal state estimation based on dynamic spike train decoding


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.043), (8, 0.313), (13, 0.039), (16, 0.029), (18, 0.035), (21, 0.076), (31, 0.022), (34, 0.027), (35, 0.027), (47, 0.077), (49, 0.011), (83, 0.116), (85, 0.032), (87, 0.025), (90, 0.047)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.73379707 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents

Author: Yee W. Teh, Daniel J. Hsu, Hal Daume

Abstract: We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman’s coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over the state-of-the-art, and demonstrate our approach in document clustering and phylolinguistics. 1

2 0.65372705 86 nips-2007-Exponential Family Predictive Representations of State

Author: David Wingate, Satinder S. Baveja

Abstract: In order to represent state in controlled, partially observable, stochastic dynamical systems, some sort of sufficient statistic for history is necessary. Predictive representations of state (PSRs) capture state as statistics of the future. We introduce a new model of such systems called the “Exponential family PSR,” which defines as state the time-varying parameters of an exponential family distribution which models n sequential observations in the future. This choice of state representation explicitly connects PSRs to state-of-the-art probabilistic modeling, which allows us to take advantage of current efforts in high-dimensional density estimation, and in particular, graphical models and maximum entropy models. We present a parameter learning algorithm based on maximum likelihood, and we show how a variety of current approximate inference methods apply. We evaluate the quality of our model with reinforcement learning by directly evaluating the control performance of the model. 1

3 0.49669942 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

Author: Kai Yu, Wei Chu

Abstract: This paper aims to model relational data on edges of networks. We describe appropriate Gaussian Processes (GPs) for directed, undirected, and bipartite networks. The inter-dependencies of edges can be effectively modeled by adapting the GP hyper-parameters. The framework suggests an intimate connection between link prediction and transfer learning, which were traditionally two separate research topics. We develop an efficient learning algorithm that can handle a large number of observations. The experimental results on several real-world data sets verify superior learning capacity. 1

4 0.49565908 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC

Author: Matthew Hoffman, Arnaud Doucet, Nando D. Freitas, Ajay Jasra

Abstract: A recently proposed formulation of the stochastic planning and control problem as one of parameter estimation for suitable artificial statistical models has led to the adoption of inference algorithms for this notoriously hard problem. At the algorithmic level, the focus has been on developing Expectation-Maximization (EM) algorithms. In this paper, we begin by making the crucial observation that the stochastic control problem can be reinterpreted as one of trans-dimensional inference. With this new interpretation, we are able to propose a novel reversible jump Markov chain Monte Carlo (MCMC) algorithm that is more efficient than its EM counterparts. Moreover, it enables us to implement full Bayesian policy search, without the need for gradients and with one single Markov chain. The new approach involves sampling directly from a distribution that is proportional to the reward and, consequently, performs better than classic simulations methods in situations where the reward is a rare event.

5 0.49499866 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data

Author: Michael Ross, Andrew Cohen

Abstract: This paper describes a new model for human visual classification that enables the recovery of image features that explain human subjects’ performance on different visual classification tasks. Unlike previous methods, this algorithm does not model their performance with a single linear classifier operating on raw image pixels. Instead, it represents classification as the combination of multiple feature detectors. This approach extracts more information about human visual classification than previous methods and provides a foundation for further exploration. 1

6 0.49285576 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images

7 0.49256808 63 nips-2007-Convex Relaxations of Latent Variable Training

8 0.49254233 18 nips-2007-A probabilistic model for generating realistic lip movements from speech

9 0.49091485 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model

10 0.48809826 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations

11 0.48724476 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation

12 0.48703143 156 nips-2007-Predictive Matrix-Variate t Models

13 0.48610154 69 nips-2007-Discriminative Batch Mode Active Learning

14 0.48534766 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images

15 0.48479092 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models

16 0.48396823 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

17 0.48394796 97 nips-2007-Hidden Common Cause Relations in Relational Learning

18 0.48315862 115 nips-2007-Learning the 2-D Topology of Images

19 0.48246172 125 nips-2007-Markov Chain Monte Carlo with People

20 0.48174027 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition