nips nips2008 nips2008-18 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dilan Gorur, Yee W. Teh
Abstract: We propose an efficient sequential Monte Carlo inference scheme for the recently proposed coalescent clustering model [1]. Our algorithm has a quadratic runtime while those in [1] are cubic. In experiments, we were surprised to find that in addition to being more efficient, it is also a better sequential Monte Carlo sampler than the best in [1], when measured in terms of variance of estimated likelihood and effective sample size.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose an efficient sequential Monte Carlo inference scheme for the recently proposed coalescent clustering model [1]. [sent-7, score-0.48]
2 In experiments, we were surprised to find that in addition to being more efficient, it is also a better sequential Monte Carlo sampler than the best in [1], when measured in terms of variance of estimated likelihood and effective sample size. [sent-9, score-0.118]
3 1 Introduction Algorithms for automatically discovering hierarchical structure from data play an important role in machine learning. [sent-10, score-0.037]
4 In many cases the data itself has an underlying hierarchical structure whose discovery is of interest, examples include phylogenies in biology, object taxonomies in vision or cognition, and parse trees in linguistics. [sent-11, score-0.037]
5 In other cases, even when the data is not hierarchically structured, such structures are still useful simply as a statistical tool to efficiently pool information across the data at different scales; this is the starting point of hierarchical modelling in statistics. [sent-12, score-0.037]
6 Many hierarchical clustering algorithms have been proposed in the past for discovering hierarchies. [sent-13, score-0.081]
7 In this paper we are interested in a Bayesian approach to hierarchical clustering [2, 3, 1]. [sent-14, score-0.081]
8 Unfortunately, inference in Bayesian models of hierarchical clustering is often very complex to implement, and computationally expensive as well. [sent-16, score-0.081]
9 In this paper we build upon the work of [1] who proposed a Bayesian hierarchical clustering model based on Kingman’s coalescent [4, 5]. [sent-17, score-0.477]
10 [1] proposed both greedy and sequential Monte Carlo (SMC) based agglomerative clustering algorithms for inferring hierarchical clusterings, which are simpler to implement than Markov chain Monte Carlo methods. [sent-18, score-0.165]
11 We propose a new SMC based algorithm for inference in the coalescent clustering of [1]. [sent-21, score-0.44]
12 The algorithm is based upon a different perspective on Kingman’s coalescent than that in [1], where the computations required to consider whether to merge each pair of clusters at each iteration are not discarded in subsequent iterations. [sent-22, score-0.434]
13 Kingman’s coalescent originated in the population genetics literature, and there has been significant interest there in inference, including Markov chain Monte Carlo based approaches [6] and SMC approaches [7, 8]. [sent-25, score-0.396]
14 While our method and that of [1] integrate out the mutations on the coalescent tree and sample the coalescent times, [7, 8] integrate out the coalescent times and sample mutations instead. [sent-27, score-1.347]
15 Because of this difference, our method and that of [1] will be more efficient for higher dimensional data, as well as in other cases where the state space is too large and sampling mutations would be inefficient. [sent-28, score-0.038]
16 In the next section, we review Kingman’s coalescent and the existing SMC algorithms for inference on this model. [sent-29, score-0.396]
17 2 Hierarchical Clustering using Kingman’s Coalescent
Kingman’s coalescent [4, 5] describes the family relationship between a set of haploid individuals by constructing the genealogy backwards in time. [sent-32, score-0.511]
18 Ancestral lines coalesce when the individuals share a common ancestor, and the genealogy is a binary tree rooted at the common ancestor of all the individuals under consideration. [sent-33, score-0.294]
19 We briefly review the coalescent and the associated clustering model as presented in [1] before presenting a different formulation more suitable for our proposed algorithm. [sent-34, score-0.44]
20 There are n−1 coalescent events in π; we order these events with i = 1 being the most recent one, and i = n − 1 the last event, when all ancestral lines have coalesced. [sent-36, score-0.396]
21 Let Ai be the set of ancestors right after coalescent event i, and A0 be the full set of individuals at the present time T0 = 0. [sent-38, score-0.473]
22 To draw a sample π from Kingman’s coalescent we sample the coalescent events one at a time starting from the present (eq. 1). [sent-39, score-0.836]
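Concretely, under Kingman's coalescent every pair of lineages coalesces independently at rate 1, so with n − i + 1 lineages remaining the waiting time to event i is exponential with rate (n−i+1 choose 2). The following is a minimal sketch of drawing the coalescent times in this standard formulation; the function name and return format are ours, not the authors'.

import random
from math import comb

def sample_coalescent_times(n, rng=None):
    # Draw T_1 > T_2 > ... > T_{n-1}, going backwards in time from T_0 = 0.
    rng = rng or random.Random(0)
    times, T = [], 0.0
    for i in range(1, n):
        rate = comb(n - i + 1, 2)      # number of pairs that could coalesce next
        T -= rng.expovariate(rate)     # exponential waiting time at that rate
        times.append(T)
    return times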
23 The coalescent can be used as a prior over binary trees in a model where we have a tree-structured likelihood function for observations at the leaves. [sent-42, score-0.427]
24 For this purpose, we will give a different perspective on constructing the coalescent in the following, and describe our sequential Monte Carlo algorithm in Section 3. [sent-50, score-0.436]
25 2.1 A regenerative race process
In this section we describe a different formulation of the coalescent, based on the fact that each stage of the coalescent can be interpreted as a race between the (n−i+1 choose 2) pairs of individuals to coalesce. [sent-52, score-1.496]
26 Each pair proposes a coalescent time; the pair with the most recent coalescent time “wins” the race and gets to coalesce, at which point the next stage starts with (n−i choose 2) pairs in the race. [sent-53, score-1.209]
27 Naïvely this race process would require a total of O(n³) pairs to propose coalescent times. [sent-54, score-0.66]
28 We show that using the regenerative (memoryless) property of exponential distributions allows us to reduce this to O(n2 ). [sent-55, score-0.051]
Algorithm 1 A regenerative race process for constructing the coalescent
inputs: number of individuals n; set starting time T0 = 0 and A0 the set of n individuals
for all pairs of existing individuals ρl, ρr ∈ A0 do
  propose coalescent time tlr using eq. (4)
end for
for all coalescence events i = 1 : n − 1 do
  find the pair to coalesce (ρli, ρri) using eq. (5)
  set coalescent time Ti = tli ri and update Ai = Ai−1 − {ρli, ρri} + {ρi}
  remove pairs with ρl ∈ {ρli, ρri}, ρr ∈ Ai−1 \ {ρli, ρri}
  for all new pairs with ρl = ρi, ρr ∈ Ai \ {ρi} do
    propose coalescent time using eq. (4)
  end for
end for
32 At stage i of the coalescent we have n − i + 1 individuals in Ai−1, and (n−i+1 choose 2) pairs in the race to coalesce. [sent-60, score-0.55]
33 Each pair ρl, ρr ∈ Ai−1, ρl ≠ ρr, proposes a coalescent time
tlr | Ti−1 ∼ Ti−1 − Exp(1),   (4)
that is, by subtracting from the last coalescent time a waiting time drawn from an exponential distribution of rate 1. [sent-61, score-1.481]
34 The pair (ρli, ρri) with the most recent coalescent time wins the race:
(ρli, ρri) = argmax_{(ρl,ρr): ρl,ρr ∈ Ai−1, ρl ≠ ρr} tlr,   (5)
and coalesces into a new individual ρi at time Ti = tli ri. [sent-62, score-1.201]
35 At this point stage i + 1 of the race begins, with some pairs dropping out of the race (specifically those with one half of the pair being either ρli or ρri ) and new ones entering (specifically those formed by pairing the new individual ρi with an existing one). [sent-63, score-0.614]
36 Among the pairs (ρl, ρr) that neither dropped out nor just entered the race, consider the distribution of tlr conditioned on the fact that tlr < Ti (since (ρl, ρr) did not win the race at stage i). [sent-64, score-1.115]
37 Using the memoryless property of the exponential distribution, we see that tlr |Ti ∼ Ti − Exp(1), thus eq. [sent-65, score-0.387]
38 (4) still holds and we need not redraw tlr for the stage i + 1 race. [sent-66, score-0.464]
39 In other words, once tlr has been drawn, it can be reused for subsequent stages of the race until it either wins a race or drops out. [sent-67, score-0.895]
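Putting Algorithm 1, eqs. (4)-(5), and this reuse property together, the construction can be sketched as follows. This is a prior-only sketch with our own names and data structures (a heap of proposed times with lazy removal of pairs that drop out); the SMC algorithm of Section 3 replaces the Exp(1) draws with data-driven proposals.

import heapq
import random

def sample_coalescent_race(n, rng=None):
    # Prior-only sketch of Algorithm 1: each pair proposes t_lr = T_{i-1} - Exp(1)
    # once when it is created (eq. 4); the most recent proposed time among live
    # pairs wins each stage (eq. 5).  Stale heap entries are skipped lazily.
    rng = rng or random.Random(0)
    T = 0.0                        # T_0 = 0; coalescent times are negative
    alive = set(range(n))          # A_0: the n present-day individuals
    heap = []                      # max-heap over proposed times via negation
    events, next_id = [], n

    def propose(l, r):
        t = T - rng.expovariate(1.0)           # eq. (4), drawn only once per pair
        heapq.heappush(heap, (-t, l, r))

    for l in range(n):
        for r in range(l + 1, n):
            propose(l, r)

    for _ in range(n - 1):
        while True:                            # eq. (5): most recent live proposal wins
            neg_t, l, r = heapq.heappop(heap)
            if l in alive and r in alive:
                break                          # otherwise the pair has dropped out
        T = -neg_t
        alive -= {l, r}
        new = next_id; next_id += 1
        events.append((T, l, r, new))          # (T_i, rho_li, rho_ri, rho_i)
        for other in alive:                    # only new pairs need fresh proposals
            propose(new, other)
        alive.add(new)
    return events

Only O(n²) proposals are ever made in total, and each heap operation costs O(log n), which matches the quadratic (up to a log factor) runtime claimed in the abstract.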
40 We obtain the probability of the coalescent π as a product, over the i = 1, . . . , n − 1 stages of the race, of the probability of each event "ρli, ρri wins stage i and coalesces at time Ti" given more recent stages. [sent-69, score-0.396] [sent-72, score-0.138]
42 The probability at stage i is simply the probability that tli ri = Ti , and that all other proposed coalescent times tlr < Ti , conditioned on the fact that the proposed coalescent times tlr for all pairs at stage i are all less than Ti−1 . [sent-73, score-2.068]
43 Each pair that participated in the race has corresponding terms in eq. [sent-75, score-0.273]
44 (7), starting at the stage when the pair entered the race, and ending with the stage when the pair either dropped out or won. [sent-76, score-0.268]
45 Eq. (7) simplifies to
p(π) = ∏_{i=1}^{n−1} p(tli ri = Ti) · ∏_{ρl∈{ρli,ρri}, ρr∈Ai−1\{ρli,ρri}} p(tlr < Ti),   (8)
where the second product runs only over those pairs that dropped out after stage i. [sent-78, score-0.302]
46 The first term is the probability of pair (ρli , ρri ) coalescing at time Ti given its entrance time, and the second term is the probability of pair (ρl , ρr ) dropping out of the race at time Ti given its entrance time. [sent-79, score-0.388]
47 Combining with eq. (2) we have
p(x, π) = Z0(x) ∏_{i=1}^{n−1} Zρi(xi|θi) p(tli ri = Ti) · ∏_{ρl∈{ρli,ρri}, ρr∈Ai−1\{ρli,ρri}} p(tlr < Ti).   (9) [sent-84, score-0.196]
48 3 Efficient SMC Inference on the Coalescent
Our sequential Monte Carlo algorithm for posterior inference is directly inspired by the regenerative race process described above. [sent-85, score-0.326]
49 In fact the algorithm is structurally exactly as in Algorithm 1, but with each pair ρl , ρr proposing a coalescent time from a proposal distribution tlr ∼ Qlr instead of from eq. [sent-86, score-0.882]
50 The idea is that the proposal distribution Qlr is constructed taking into account the observed data, so that Algorithm 1 produces better approximate samples from the posterior. [sent-88, score-0.061]
51 The density of π under the proposal has the same form as eqs. (6)-(8), and is
q(π) = ∏_{i=1}^{n−1} qli ri(tli ri = Ti) · ∏_{ρl∈{ρli,ρri}, ρr∈Ai−1\{ρli,ρri}} qlr(tlr < Ti),   (10)
where qlr is the density of Qlr. [sent-90, score-0.743]
52 Since eq. (10) can be computed sequentially, the weight w associated with each sample π can be computed “on the fly” as the coalescent tree is constructed:
w0 = Z0(x),
wi = wi−1 · [ Zρi(xi|θi) p(tli ri = Ti) / qli ri(tli ri = Ti) ] · ∏_{ρl∈{ρli,ρri}, ρr∈Ai−1\{ρli,ρri}} [ p(tlr < Ti) / qlr(tlr < Ti) ].   (11) [sent-93, score-1.068]
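A sketch of this recursion in log space follows. The likelihood normalizer Z and the proposal densities q must come from the model, so the arguments here are placeholders rather than the authors' implementation; the prior terms, conditioned on a pair's entrance time T_entry (the coalescent time at which the pair was created), follow from the Exp(1) proposals of eq. (4).

def log_prior_density(t, T_entry):
    # density of t_lr = T_entry - Exp(1) evaluated at t (requires t < T_entry)
    return -(T_entry - t)

def log_prior_survival(T_i, T_entry):
    # log p(t_lr < T_i) = log P(Exp(1) > T_entry - T_i)
    return -(T_entry - T_i)

def update_log_weight(log_w_prev, log_Z_i, log_p_win, log_q_win, dropped_terms):
    # eq. (11): w_i = w_{i-1} * Z_i * p_win/q_win * product over dropped pairs of p/q.
    # dropped_terms is a list of (log_p_drop, log_q_drop) pairs for the pairs
    # leaving the race at stage i.
    log_w = log_w_prev + log_Z_i + log_p_win - log_q_win
    for log_p, log_q in dropped_terms:
        log_w += log_p - log_q
    return log_w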
53 Finally we address the choice of proposal distribution Qlr to use. [sent-94, score-0.225]
54 The proposal distribution in [1] also has a form similar to eq. [sent-102, score-0.061]
55 (12), but with the exponential rate being (n−i+1 choose 2) instead, if the proposal was made in stage i of the race. [sent-103, score-0.138]
56 This dependence means that at each stage of the race the coalescent time proposal distribution needs to be recomputed for each pair, leading to an O(n³) computation time. [sent-104, score-0.769]
57 On the other hand, similar to the prior process, we need to propose a coalescent time for each pair only once when it is first created. [sent-105, score-0.434]
58 The additional log n factor comes about because a priority queue needs to be maintained to determine the winner of each stage efficiently, but this is negligible compared to m. [sent-115, score-0.077]
59 3.1 Independent-Sites Parent-Independent Likelihood Model
In our experiments we have only considered coalescent clustering of discrete data, though our approach can be applied more generally. [sent-117, score-0.44]
60 We use the independent-sites parent-independent mutation model over multinomial vectors in [1] as our likelihood model. [sent-119, score-0.066]
61 Specifically, this model assumes that each point on the tree is associated with a D dimensional multinomial vector, and each entry of this vector on each branch of the tree evolves independently (thus independent-sites), forward in time, and with mutations occurring at rate λd on entry d. [sent-120, score-0.172]
62 When a mutation occurs, a new value for the entry is drawn from a distribution φd , independently of the previous value at that entry (thus parentindependent). [sent-121, score-0.091]
63 When a coalescent event is encountered, the mutation process evolves independently down both branches. [sent-122, score-0.431]
64 The message for entry d from node ρi on the tree to its parent is a vector Mρi^d = [Mρi^d1, . . . , Mρi^dK]. [sent-124, score-0.067]
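To illustrate why the independent-sites, parent-independent structure keeps the likelihood computations cheap, here is a sketch of one per-entry message update. The rank-one form of the branch transition follows from the model description above, but the normalization convention below is our own simplification, not necessarily the exact recursion used in [1].

import numpy as np

def branch_message(M_child, lam_d, phi_d, dt):
    # Over a branch of length dt the value is retained with prob exp(-lam_d*dt)
    # and otherwise redrawn from phi_d, so the transition is the identity plus a
    # rank-one term; each entry's message update needs only O(K) work.
    stay = np.exp(-lam_d * dt)
    return stay * M_child + (1.0 - stay) * float(np.dot(phi_d, M_child))

def coalesce_messages(M_left, M_right):
    # At a coalescent event the two incoming messages multiply elementwise;
    # the local normalizer is what feeds into the Z terms of the weights.
    M = M_left * M_right
    Z = M.sum()
    return M / Z, Z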
65 Specifically, we use a piecewise linear log q̃lr(tlr), which can be easily sampled from, and for which the normalization term is easy to compute. [sent-129, score-0.164]
66 Note that log Zρlr (xlr |tlr , ρl , ρr , θi−1 ), as a function of tlr , is concave if the term inside the parentheses in eq. [sent-131, score-0.408]
67 Using the upper and lower envelopes developed for adaptive rejection sampling [9], we can construct piecewise linear upper and lower envelopes for log qlr (tlr ) by upper and lower bounding the concave and convex parts separately. [sent-135, score-0.304]
68 The upper and lower envelopes give exact bounds on the approximation error introduced, and we can efficiently improve the envelopes until a given desired approximation error is achieved. [sent-136, score-0.094]
69 Finally, we used the upper bound as our approximation log q̃lr(tlr). [sent-137, score-0.164]
70 Note that the same issue arises in the proposal distribution for SMC-PostPost, and we used the same piecewise linear approximation. [sent-138, score-0.061]
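The envelope idea can be sketched as follows: tangent lines to a concave log-density give a piecewise linear upper bound, and the exponentiated bound is a piecewise exponential density that can be normalized and sampled exactly. This is a generic tangent-line sketch with our own names; the scheme in the text additionally maintains lower bounds and refines the envelopes adaptively as in [9].

import numpy as np

def tangent_envelope(f, df, xs):
    # Tangents to the concave log-density f at sorted abscissae xs (assumes
    # strictly decreasing slopes); consecutive tangents intersect at the
    # breakpoints z.  Segment j of the envelope is the line b[j] + m[j]*x.
    xs = np.asarray(xs, dtype=float)
    m = df(xs)
    b = f(xs) - m * xs
    z = (b[:-1] - b[1:]) / (m[1:] - m[:-1])
    return z, b, m

def _segment_mass(b, m, lo, hi):
    if abs(m) < 1e-12:
        return np.exp(b) * (hi - lo)
    return (np.exp(b + m * hi) - np.exp(b + m * lo)) / m

def sample_envelope(z, b, m, lo, hi, rng):
    # Draw one sample from the normalized piecewise exponential exp(b_j + m_j*x).
    edges = np.concatenate(([lo], z, [hi]))
    mass = np.array([_segment_mass(b[j], m[j], edges[j], edges[j + 1])
                     for j in range(len(b))])
    j = rng.choice(len(b), p=mass / mass.sum())
    u, e0, e1 = rng.random(), edges[j], edges[j + 1]
    if abs(m[j]) < 1e-12:
        return e0 + u * (e1 - e0)
    return np.log((1 - u) * np.exp(m[j] * e0) + u * np.exp(m[j] * e1)) / m[j]

For instance, with f(x) = -(x + 1.5)**2 / 2 and df(x) = -(x + 1.5) on the interval [-8, 0], a handful of tangent points already gives a proposal that can be sampled exactly.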
71 4 Experiments
The reduced computational cost of inference makes it possible to do Bayesian inference for coalescent models on larger datasets. [sent-140, score-0.082]
72 An important question is how many particles we need in practice. [sent-143, score-0.117]
73 There is overlap between the features of the data points; however, the data does not obey a tree structure, which results in a multimodal posterior. [sent-146, score-0.039]
74 We ran both algorithms without using resampling, for comparison of the proposal distributions. [sent-153, score-0.121]
75 A sample tree from the SMC1 algorithm demonstrates that the algorithm could capture the similarity structure. [sent-157, score-0.061]
76 The true covariance of the data (a) and the distance on the tree learned by the SMC1 algorithm averaged over particles (b) are shown; the overall structure was correctly captured. [sent-158, score-0.156]
77 From this figure, we can conclude that the computationally cheaper algorithm SMC1 is also more efficient in the number of particles, as it gives more accurate answers with fewer particles. [sent-162, score-0.117]
78 The variance of both algorithms shrinks and the effective sample size increases as the number of particles increases. [sent-165, score-0.164]
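The effective sample size referred to throughout is the usual importance-sampling estimate computed from the particle weights; a small helper implementing the standard definition (names ours):

import numpy as np

def effective_sample_size(log_w):
    # ESS = (sum_k w_k)^2 / sum_k w_k^2, computed stably from log-weights.
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())
    return float(w.sum() ** 2 / (w ** 2).sum())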
79 A quantity of interest in genealogical studies is the time to the most recent common ancestor (MRCA), which is the time of the last coalescence event. [sent-166, score-0.104]
80 Similar to the variance behaviour in the likelihood, with a small number of particles SMC-PostPost has higher variance than SMC1. [sent-169, score-0.117]
81 The mean time for each step of coalescence together with its variance for 7250 particles for both algorithms is depicted in Figure 3. [sent-171, score-0.199]
82 It is interesting that the first few coalescence times of SMC1 are shorter than those for SMC-PostPost. [sent-172, score-0.082]
83 If there are only a few particles that come from a high probability region, the weights of those particles would be much larger than the rest, resulting in a low effective sample size. [sent-175, score-0.316]
Figure 3 (legend: smc1, smcPostPost): Times for each coalescence step averaged over 7250 particles.
84 There is a slight difference in the mean coalescence time. [sent-177, score-0.082]
85 It is interesting that the SMC1 algorithm proposes shorter times for the initial coalescence events. [sent-178, score-0.082]
87 Here, we note that for the synthetic dataset, the effective sample size of SMC-PostPost is very poor, and that of SMC1 is much higher, see Figure 2. [sent-181, score-0.047]
88 5 Discussion
We described an efficient Sequential Monte Carlo algorithm for inference on hierarchical clustering models that use Kingman’s coalescent as a prior. [sent-182, score-0.477]
89 Our method makes use of a regenerative perspective to construct the coalescent tree. [sent-183, score-0.447]
90 Unfortunately, on this dataset, the effective sample size of both algorithms is close to one. [sent-189, score-0.047]
91 A usual method to circumvent the low effective sample size problem in sequential Monte Carlo algorithms is to do resampling, that is, to detect the particles that will not contribute much to the posterior from the partial samples and prune them away, multiplying the promising samples. [sent-190, score-0.226]
92 We need to decide at what point to prune away samples, and how to select which samples to prune away. [sent-192, score-0.044]
93 As shown by [11], different problems may require different resampling algorithms. [sent-193, score-0.06]
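As one concrete illustration of the generic ingredient being discussed (the text leaves the choice and timing of resampling open), systematic resampling can be sketched as follows; the helper below is a common scheme among many, not something prescribed by the paper.

import numpy as np

def systematic_resample(log_w, rng):
    # Returns indices of the particles to replicate; after resampling the
    # weights are reset to uniform.
    w = np.exp(np.asarray(log_w, dtype=float) - np.max(log_w))
    w /= w.sum()
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(w), positions)
    return np.minimum(idx, n - 1)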
94 Furthermore, in the recursive calculation of the weights in SMC1, we include the effect of a pair only when it either coalesces or ceases to exist, for the sake of saving computation. [sent-197, score-0.079]
95 Therefore the partial weights are even less informative about the state of the sample, and the effective sample size cannot really give a full picture of whether the current sample is good or not. [sent-198, score-0.091]
96 In fact, we did observe oscillations in the effective sample size calculated on the weights along the iterations, i.e. starting off with a high value, decreasing to virtually 1 and increasing later before termination, which also indicates that it is not clear which of the particles will be more effective eventually. [sent-199, score-0.047] [sent-201, score-0.142]
98 An open question is how to incorporate a resampling algorithm to improve the efficiency. [sent-202, score-0.06]
99 (a),(b) Samples from a run with 7750 particles without resampling. [sent-215, score-0.117]
100 Note that the weight of (a) is almost one, which means that the contribution from the rest of the particles is infinitesimal, although the tree structure in (b) also seems to capture the similarities between languages. [sent-218, score-0.156]
wordName wordTfidf (topN-words)
[('coalescent', 0.396), ('tlr', 0.387), ('indic', 0.282), ('germanic', 0.247), ('slavic', 0.247), ('race', 0.235), ('ri', 0.196), ('romance', 0.185), ('ti', 0.179), ('celtic', 0.176), ('iranian', 0.176), ('qlr', 0.164), ('armenian', 0.123), ('smc', 0.123), ('tli', 0.123), ('particles', 0.117), ('lr', 0.111), ('li', 0.096), ('coalescence', 0.082), ('individuals', 0.077), ('stage', 0.077), ('albanian', 0.07), ('baltic', 0.07), ('kingman', 0.07), ('tc', 0.062), ('proposal', 0.061), ('resampling', 0.06), ('xlr', 0.059), ('ai', 0.058), ('greek', 0.056), ('carlo', 0.052), ('regenerative', 0.051), ('monte', 0.05), ('envelopes', 0.047), ('clustering', 0.044), ('coalesce', 0.041), ('sequential', 0.04), ('tree', 0.039), ('pair', 0.038), ('wins', 0.038), ('genealogy', 0.038), ('mutations', 0.038), ('hierarchical', 0.037), ('bengali', 0.035), ('breton', 0.035), ('bulgarian', 0.035), ('catalan', 0.035), ('cornish', 0.035), ('gaelic', 0.035), ('icelandic', 0.035), ('irish', 0.035), ('kashmiri', 0.035), ('kurdish', 0.035), ('latvian', 0.035), ('lithuanian', 0.035), ('maithili', 0.035), ('marathi', 0.035), ('nepali', 0.035), ('norwegian', 0.035), ('ossetic', 0.035), ('panjabi', 0.035), ('pashto', 0.035), ('persian', 0.035), ('polish', 0.035), ('romanian', 0.035), ('russian', 0.035), ('serbian', 0.035), ('sinhala', 0.035), ('slovene', 0.035), ('tajik', 0.035), ('ukrainian', 0.035), ('mutation', 0.035), ('particle', 0.033), ('likelihood', 0.031), ('coalescing', 0.031), ('danish', 0.031), ('hindi', 0.031), ('italian', 0.031), ('portuguese', 0.031), ('swedish', 0.031), ('welsh', 0.031), ('pairs', 0.029), ('dutch', 0.028), ('entry', 0.028), ('dk', 0.027), ('spanish', 0.026), ('effective', 0.025), ('tl', 0.025), ('french', 0.025), ('rejection', 0.025), ('czech', 0.024), ('coalesces', 0.023), ('entrance', 0.023), ('qli', 0.023), ('sample', 0.022), ('ancestor', 0.022), ('prune', 0.022), ('german', 0.022), ('concave', 0.021), ('mrca', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 18 nips-2008-An Efficient Sequential Monte Carlo Algorithm for Coalescent Clustering
Author: Dilan Gorur, Yee W. Teh
Abstract: We propose an efficient sequential Monte Carlo inference scheme for the recently proposed coalescent clustering model [1]. Our algorithm has a quadratic runtime while those in [1] are cubic. In experiments, we were surprised to find that in addition to being more efficient, it is also a better sequential Monte Carlo sampler than the best in [1], when measured in terms of variance of estimated likelihood and effective sample size.
2 0.20534314 235 nips-2008-The Infinite Hierarchical Factor Regression Model
Author: Piyush Rai, Hal Daume
Abstract: We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman’s coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis. 1
3 0.10376346 11 nips-2008-A spatially varying two-sample recombinant coalescent, with applications to HIV escape response
Author: Alexander Braunstein, Zhi Wei, Shane T. Jensen, Jon D. Mcauliffe
Abstract: Statistical evolutionary models provide an important mechanism for describing and understanding the escape response of a viral population under a particular therapy. We present a new hierarchical model that incorporates spatially varying mutation and recombination rates at the nucleotide level. It also maintains separate parameters for treatment and control groups, which allows us to estimate treatment effects explicitly. We use the model to investigate the sequence evolution of HIV populations exposed to a recently developed antisense gene therapy, as well as a more conventional drug therapy. The detection of biologically relevant and plausible signals in both therapy studies demonstrates the effectiveness of the method. 1
4 0.10238631 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
Author: Xuming He, Richard S. Zemel
Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure. 1
5 0.069481581 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data
Author: Zhengdong Lu, Jeffrey Kaye, Todd K. Leen
Abstract: We develop new techniques for time series classification based on hierarchical Bayesian generative models (called mixed-effect models) and the Fisher kernel derived from them. A key advantage of the new formulation is that one can compute the Fisher information matrix despite varying sequence lengths and varying sampling intervals. This avoids the commonly-used ad hoc replacement of the Fisher information matrix with the identity which destroys the geometric invariance of the kernel. Our construction retains the geometric invariance, resulting in a kernel that is properly invariant under change of coordinates in the model parameter space. Experiments on detecting cognitive decline show that classifiers based on the proposed kernel out-perform those based on generative models and other feature extraction routines, and on Fisher kernels that use the identity in place of the Fisher information.
6 0.051874813 177 nips-2008-Particle Filter-based Policy Gradient in POMDPs
7 0.045264628 152 nips-2008-Non-stationary dynamic Bayesian networks
8 0.039811369 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
9 0.038843796 202 nips-2008-Robust Regression and Lasso
10 0.03870054 139 nips-2008-Modeling the effects of memory on human online sentence processing with particle filters
11 0.036793333 136 nips-2008-Model selection and velocity estimation using novel priors for motion patterns
12 0.035449453 233 nips-2008-The Gaussian Process Density Sampler
13 0.03407646 228 nips-2008-Support Vector Machines with a Reject Option
14 0.031767756 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models
15 0.031117406 230 nips-2008-Temporal Difference Based Actor Critic Learning - Convergence and Neural Implementation
16 0.030033162 118 nips-2008-Learning Transformational Invariants from Natural Movies
17 0.029533261 4 nips-2008-A Scalable Hierarchical Distributed Language Model
18 0.028654879 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
19 0.028641008 117 nips-2008-Learning Taxonomies by Dependence Maximization
20 0.028041102 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context
topicId topicWeight
[(0, -0.094), (1, -0.003), (2, 0.034), (3, -0.012), (4, 0.034), (5, -0.049), (6, 0.011), (7, 0.052), (8, 0.015), (9, -0.03), (10, -0.034), (11, 0.018), (12, 0.108), (13, -0.051), (14, 0.04), (15, 0.077), (16, 0.009), (17, -0.071), (18, 0.018), (19, 0.001), (20, -0.033), (21, 0.117), (22, -0.137), (23, 0.001), (24, -0.155), (25, -0.039), (26, -0.048), (27, -0.031), (28, 0.062), (29, 0.078), (30, 0.079), (31, 0.102), (32, 0.165), (33, 0.025), (34, 0.031), (35, 0.017), (36, -0.091), (37, -0.134), (38, -0.025), (39, -0.0), (40, -0.058), (41, 0.081), (42, 0.051), (43, 0.02), (44, -0.071), (45, 0.013), (46, 0.093), (47, 0.004), (48, -0.164), (49, -0.056)]
simIndex simValue paperId paperTitle
same-paper 1 0.93512052 18 nips-2008-An Efficient Sequential Monte Carlo Algorithm for Coalescent Clustering
Author: Dilan Gorur, Yee W. Teh
Abstract: We propose an efficient sequential Monte Carlo inference scheme for the recently proposed coalescent clustering model [1]. Our algorithm has a quadratic runtime while those in [1] are cubic. In experiments, we were surprised to find that in addition to being more efficient, it is also a better sequential Monte Carlo sampler than the best in [1], when measured in terms of variance of estimated likelihood and effective sample size.
2 0.65977371 235 nips-2008-The Infinite Hierarchical Factor Regression Model
Author: Piyush Rai, Hal Daume
Abstract: We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman’s coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis. 1
3 0.64035457 11 nips-2008-A spatially varying two-sample recombinant coalescent, with applications to HIV escape response
Author: Alexander Braunstein, Zhi Wei, Shane T. Jensen, Jon D. Mcauliffe
Abstract: Statistical evolutionary models provide an important mechanism for describing and understanding the escape response of a viral population under a particular therapy. We present a new hierarchical model that incorporates spatially varying mutation and recombination rates at the nucleotide level. It also maintains separate parameters for treatment and control groups, which allows us to estimate treatment effects explicitly. We use the model to investigate the sequence evolution of HIV populations exposed to a recently developed antisense gene therapy, as well as a more conventional drug therapy. The detection of biologically relevant and plausible signals in both therapy studies demonstrates the effectiveness of the method. 1
4 0.46288866 152 nips-2008-Non-stationary dynamic Bayesian networks
Author: Joshua W. Robinson, Alexander J. Hartemink
Abstract: A principled mechanism for identifying conditional dependencies in time-series data is provided through structure learning of dynamic Bayesian networks (DBNs). An important assumption of DBN structure learning is that the data are generated by a stationary process—an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical models called non-stationary dynamic Bayesian networks, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. 1
5 0.43633983 9 nips-2008-A mixture model for the evolution of gene expression in non-homogeneous datasets
Author: Gerald Quon, Yee W. Teh, Esther Chan, Timothy Hughes, Michael Brudno, Quaid D. Morris
Abstract: We address the challenge of assessing conservation of gene expression in complex, non-homogeneous datasets. Recent studies have demonstrated the success of probabilistic models in studying the evolution of gene expression in simple eukaryotic organisms such as yeast, for which measurements are typically scalar and independent. Models capable of studying expression evolution in much more complex organisms such as vertebrates are particularly important given the medical and scientific interest in species such as human and mouse. We present Brownian Factor Phylogenetic Analysis, a statistical model that makes a number of significant extensions to previous models to enable characterization of changes in expression among highly complex organisms. We demonstrate the efficacy of our method on a microarray dataset profiling diverse tissues from multiple vertebrate species. We anticipate that the model will be invaluable in the study of gene expression patterns in other diverse organisms as well, such as worms and insects. 1
6 0.38317209 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
7 0.36324769 236 nips-2008-The Mondrian Process
8 0.35109451 234 nips-2008-The Infinite Factorial Hidden Markov Model
9 0.31542674 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data
10 0.30145139 115 nips-2008-Learning Bounded Treewidth Bayesian Networks
11 0.29132837 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
12 0.28756288 188 nips-2008-QUIC-SVD: Fast SVD Using Cosine Trees
13 0.27744448 82 nips-2008-Fast Computation of Posterior Mode in Multi-Level Hierarchical Models
14 0.27689311 33 nips-2008-Bayesian Model of Behaviour in Economic Games
15 0.25917467 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
16 0.25871274 233 nips-2008-The Gaussian Process Density Sampler
17 0.24426153 4 nips-2008-A Scalable Hierarchical Distributed Language Model
18 0.24401796 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models
19 0.2402236 186 nips-2008-Probabilistic detection of short events, with application to critical care monitoring
20 0.23728988 83 nips-2008-Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method
topicId topicWeight
[(6, 0.025), (7, 0.035), (12, 0.037), (28, 0.115), (57, 0.063), (59, 0.016), (66, 0.496), (71, 0.022), (77, 0.02), (78, 0.012), (83, 0.048)]
simIndex simValue paperId paperTitle
same-paper 1 0.73495245 18 nips-2008-An Efficient Sequential Monte Carlo Algorithm for Coalescent Clustering
Author: Dilan Gorur, Yee W. Teh
Abstract: We propose an efficient sequential Monte Carlo inference scheme for the recently proposed coalescent clustering model [1]. Our algorithm has a quadratic runtime while those in [1] are cubic. In experiments, we were surprised to find that in addition to being more efficient, it is also a better sequential Monte Carlo sampler than the best in [1], when measured in terms of variance of estimated likelihood and effective sample size.
2 0.49149057 95 nips-2008-Grouping Contours Via a Related Image
Author: Praveen Srinivasan, Liming Wang, Jianbo Shi
Abstract: Contours have been established in the biological and computer vision literature as a compact yet descriptive representation of object shape. While individual contours provide structure, they lack the large spatial support of region segments (which lack internal structure). We present a method for further grouping of contours in an image using their relationship to the contours of a second, related image. Stereo, motion, and similarity all provide cues that can aid this task; contours that have similar transformations relating them to their matching contours in the second image likely belong to a single group. To find matches for contours, we rely only on shape, which applies directly to all three modalities without modification, in contrast to the specialized approaches developed for each independently. Visually salient contours are extracted in each image, along with a set of candidate transformations for aligning subsets of them. For each transformation, groups of contours with matching shape across the two images are identified to provide a context for evaluating matches of individual contour points across the images. The resulting contexts of contours are used to perform a final grouping on contours in the original image while simultaneously finding matches in the related image, again by shape matching. We demonstrate grouping results on image pairs consisting of stereo, motion, and similar images. Our method also produces qualitatively better results against a baseline method that does not use the inferred contexts. 1
3 0.44812933 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
Author: Aarti Singh, Robert Nowak, Xiaojin Zhu
Abstract: Empirical evidence shows that in favorable situations semi-supervised learning (SSL) algorithms can capitalize on the abundance of unlabeled training data to improve the performance of a learning task, in the sense that fewer labeled training data are needed to achieve a target error bound. However, in other situations unlabeled data do not seem to help. Recent attempts at theoretically characterizing SSL gains only provide a partial and sometimes apparently conflicting explanations of whether, and to what extent, unlabeled data can help. In this paper, we attempt to bridge the gap between the practice and theory of semi-supervised learning. We develop a finite sample analysis that characterizes the value of unlabeled data and quantifies the performance improvement of SSL compared to supervised learning. We show that there are large classes of problems for which SSL can significantly outperform supervised learning, in finite sample regimes and sometimes also in terms of error convergence rates.
4 0.30616283 164 nips-2008-On the Generalization Ability of Online Strongly Convex Programming Algorithms
Author: Sham M. Kakade, Ambuj Tewari
Abstract: This paper examines the generalization properties of online convex programming algorithms when the loss function is Lipschitz and strongly convex. Our main result is a sharp bound, that holds with high probability, on the excess risk of the output of an online algorithm in terms of the average regret. This allows one to use recent algorithms with logarithmic cumulative regret guarantees to achieve fast convergence rates for the excess risk with high probability. As a corollary, we characterize the convergence rate of P EGASOS (with high probability), a recently proposed method for solving the SVM optimization problem. 1
5 0.29168364 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
Author: Xuming He, Richard S. Zemel
Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure. 1
6 0.28740636 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
7 0.2856575 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
8 0.28523621 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
9 0.28417453 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
10 0.28376833 4 nips-2008-A Scalable Hierarchical Distributed Language Model
11 0.28327069 118 nips-2008-Learning Transformational Invariants from Natural Movies
12 0.28293294 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
13 0.28288579 66 nips-2008-Dynamic visual attention: searching for coding length increments
14 0.28212786 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
15 0.28191406 235 nips-2008-The Infinite Hierarchical Factor Regression Model
16 0.28181759 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
17 0.28175008 113 nips-2008-Kernelized Sorting
18 0.28143612 231 nips-2008-Temporal Dynamics of Cognitive Control
19 0.28141379 138 nips-2008-Modeling human function learning with Gaussian processes
20 0.28105628 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features