nips nips2012 nips2012-315 knowledge-graph by maker-knowledge-mining

315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models


Source: pdf

Author: Nicholas Foti, Sinead Williamson

Abstract: A number of dependent nonparametric processes have been proposed to model non-stationary data with unknown latent dimensionality. However, the inference algorithms are often slow and unwieldy, and are in general highly specific to a given model formulation. In this paper, we describe a large class of dependent nonparametric processes, including several existing models, and present a slice sampler that allows efficient inference across this class of models. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A number of dependent nonparametric processes have been proposed to model non-stationary data with unknown latent dimensionality. [sent-7, score-0.243]

2 In this paper, we describe a large class of dependent nonparametric processes, including several existing models, and present a slice sampler that allows efficient inference across this class of models. [sent-9, score-1.047]

3 Recently, there has been increasing interest in dependent nonparametric processes [1], that extend existing nonparametric distributions to non-stationary data. [sent-14, score-0.351]

4 While a nonparametric process is a distribution over a single measure, a dependent nonparametric process is a distribution over a collection of measures, which may be associated with values in a covariate space. [sent-15, score-0.562]

5 The key property of a dependent nonparametric process is that the measure at each covariate value is marginally distributed according to a known nonparametric process. [sent-16, score-0.544]

6 A number of dependent nonparametric processes have been developed in the literature ([2] §6). [sent-17, score-0.243]

7 For example, the single-p DDP [1] defines a collection of Dirichlet processes with common atom sizes but variable atom locations. [sent-18, score-0.422]

8 The Spatial Normalized Gamma Process (SNGP) [4] defines a gamma process on an augmented space, such that at each covariate location a subset of the atoms are available. [sent-20, score-0.654]

9 This creates a dependent gamma process that can be normalized to obtain a dependent Dirichlet process. [sent-21, score-0.436]

10 The kernel beta process (KBP) [5] defines a beta process on an augmented space, and at each covariate location modulates the atom sizes using a collection of kernels, to create a collection of dependent beta processes. [sent-22, score-0.952]

11 While there are many similarities between existing dependent nonparametric processes, most of the inference schemes that have been proposed are highly specific, and cannot be generally applied without significant modification. [sent-24, score-0.237]

12 First, in Section 2 we describe a general class of dependent nonparametric processes, based on defining completely random measures on an extended space. [sent-26, score-0.297]

13 Second, we develop a slice sampler that is applicable to all the dependent probability measures in this framework. [sent-28, score-0.925]

14 We compare our slice sampler to existing inference algorithms and show that it achieves superior performance. [sent-29, score-0.83]

15 2 Constructing dependent nonparametric models using kernels In this section, we describe a general class of dependent completely random measures that includes the kernel beta process as a special case. [sent-31, score-0.616]

16 We then describe the class of dependent normalized random measures obtained by normalizing these dependent completely random measures, and show that the SNGP lies in this framework. [sent-32, score-0.382]

17 Commonly used examples of CRMs include the gamma process, the generalized gamma process, the beta process, and the stable process. [sent-35, score-0.384]
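
As a point of reference (these are standard textbook forms, not expressions quoted from this paper, and the parametrizations may differ from the authors'), the Lévy intensities ν0(dπ) usually associated with these CRMs are:

```latex
% Standard Lévy intensities \nu_0(d\pi); parametrizations vary across the literature.
\nu_0^{\mathrm{GaP}}(d\pi) = \alpha\,\pi^{-1}e^{-\pi}\,d\pi                                      % gamma process
\nu_0^{\mathrm{GGP}}(d\pi) = \frac{\alpha}{\Gamma(1-\sigma)}\,\pi^{-1-\sigma}e^{-\tau\pi}\,d\pi  % generalized gamma process
\nu_0^{\mathrm{BP}}(d\pi)  = c\,\pi^{-1}(1-\pi)^{c-1}\,d\pi,\qquad \pi\in(0,1)                   % beta process
\nu_0^{\mathrm{StP}}(d\pi) = \frac{\sigma}{\Gamma(1-\sigma)}\,\pi^{-1-\sigma}\,d\pi              % stable process
```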

18 While the construction herein applies for any such Lévy measure, we focus on the class of Lévy measures that factorize as ν(dµ, dθ, dπ) = R0(dµ)H0(dθ)ν0(dπ). [sent-41, score-0.223]

19 This corresponds to the class of homogeneous CRMs, where the size of an atom is independent of its location in Θ × X , and covers most CRMs encountered in the literature. [sent-42, score-0.225]

20 Though any such kernel may be used, for concreteness we only consider a box kernel and a square exponential kernel, defined as • Box kernel: K(x, µ) = 1(||x − µ|| < W), where we call W the width. [sent-45, score-0.54]

21 , for ||·|| a dissimilarity measure. Using the setup above we define a kernel-weighted CRM (KCRM) at a fixed covariate x ∈ X and for A measurable as Bx(A) = Σ_{m=1}^{∞} K(x, µm) πm δθm(A) (1), which is seen to be a CRM on Θ by the mapping theorem for Poisson processes [8]. [sent-47, score-0.227]
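
As an illustration of Eq. 1 (a minimal sketch, not code from the paper: the finite Ga(α/M, 1) approximation of the gamma-process atoms, the uniform base measure R0, the Gaussian H0, and all function names are assumptions made here):

```python
import numpy as np

def box_kernel(x, mu, width=1.0):
    """Box kernel: 1 if ||x - mu|| < width, else 0."""
    return float(np.linalg.norm(np.atleast_1d(x) - np.atleast_1d(mu)) < width)

def sq_exp_kernel(x, mu, psi=1.0):
    """Square exponential kernel: influence decays smoothly with distance."""
    d = np.linalg.norm(np.atleast_1d(x) - np.atleast_1d(mu))
    return np.exp(-d ** 2 / psi)

def sample_raw_atoms(M, alpha, rng=None):
    """Crude M-atom approximation of a gamma CRM on the augmented space:
    raw sizes pi_m ~ Ga(alpha/M, 1), covariate locations mu_m ~ R0 (uniform here),
    atom locations theta_m ~ H0 (standard normal here)."""
    rng = rng or np.random.default_rng()
    pi = rng.gamma(alpha / M, 1.0, size=M)       # raw atom sizes
    mu = rng.uniform(0.0, 1.0, size=M)           # covariate locations
    theta = rng.normal(size=M)                   # atom locations in Theta
    return pi, mu, theta

def kernelized_sizes(x, pi, mu, kernel=box_kernel, **kw):
    """Atom weights of B_x(.) = sum_m K(x, mu_m) pi_m delta_{theta_m}(.)  (Eq. 1)."""
    return np.array([kernel(x, m, **kw) * p for p, m in zip(pi, mu)])
```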

22 (A CRM may also include a deterministic continuous component.) Taking ν0 to be the Lévy measure of a beta process [10] results in the KBP. [sent-61, score-0.23]

23 Alternatively, taking ν0 as the Lévy measure of a gamma process, νGaP [11], and K(·, ·) as the box kernel, we recover the unnormalized form of the SNGP. [sent-62, score-0.492]

24 The most commonly used example of an NRM is the Dirichlet process, which can be obtained as a normalized gamma process [11]. [sent-66, score-0.272]

25 Other CRMs yield NRMs with different properties – for example a normalized generalized gamma process can have heavier tails than a Dirichlet process [13]. [sent-67, score-0.322]

26 Analogously, covariate-dependent NRMs can be used as priors for mixture models where the probability of being associated with a mixture component varies with the covariate [4, 14]. [sent-73, score-0.273]
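
A minimal sketch of how such a covariate-dependent NRM acts as a mixture prior (reusing the hypothetical helpers from the sketch above; with a gamma-process ν0 this would correspond to a truncated normalized KGaP):

```python
def mixture_weights(x, pi, mu, kernel=box_kernel, **kw):
    """Normalized kernelized sizes: P_x(m) = K(x, mu_m) pi_m / B_x(Theta)."""
    w = kernelized_sizes(x, pi, mu, kernel, **kw)
    total = w.sum()
    if total == 0.0:
        raise ValueError("no atoms have support at covariate x under this kernel")
    return w / total

# An observation with covariate x = 0.3 is assigned to component m with probability
# mixture_weights(0.3, pi, mu)[m]; component m then contributes a likelihood term
# such as Normal(y | theta_m, 1/phi_m), so the component probabilities vary with x.
```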

27 For concreteness, we limit ourselves to a kernel gamma process (KGaP) which we denote as B ∼ KGaP(K, R0 , H0 , νGaP ), although the slice sampler can be adapted to any normalized KCRM. [sent-74, score-1.154]

28 Specifically, we observe data {(xj, yj)}, j = 1, …, N, where xj ∈ X denotes the covariate of observation j and yj ∈ Rd denotes the quantities we wish to model. [sent-75, score-0.202]

29 Let x∗g denote the gth unique covariate value among all the xj, which induces a partition on the observations so that observation j belongs to group g if xj = x∗g. [sent-76, score-0.202]

30 3 A slice sampler for dependent NRMs The slice sampler of [16] allows us to perform inference in arbitrary NRMs. [sent-82, score-1.665]

31 We extend this slice sampler to perform inference in the class of dependent NRMs described in Sec. [sent-83, score-0.918]

32 The slice sampler can be used with any underlying CRM, but for simplicity we concentrate on an underlying gamma process, as described in Eq. [sent-86, score-0.924]

33 In the supplement we also derive a Rao-Blackwellized estimator of the predictive density for unobserved data using the output from the slice sampler. [sent-88, score-0.626]

34 Analogously to [16] we introduce a set of auxiliary slice variables – one for each data point. [sent-90, score-0.487]

35 Each data point can only belong to clusters corresponding to atoms larger than its slice variable. [sent-91, score-0.753]

36 The set of slice variables thus defines a minimum atom size that needs to be represented, ensuring a finite number of instantiated atoms. [sent-92, score-0.668]

37 Note that, in this case, an atom will exhibit different sizes at different covariate locations. [sent-94, score-0.382]

38 We refer to these sizes as the kernelized atom sizes, K(x∗g, µ)π, obtained by applying a kernel K, evaluated at location x∗g, to the raw atom π. [sent-95, score-0.647]

39 Following [16], we introduce a local slice variable ug,i. [sent-96, score-0.458]

40 … Eq. 5, we need to evaluate BTg, the total mass of the unnormalized CRM at each covariate value. [sent-100, score-0.225]

41 This involves summing over an infinite number of atoms – which we do not wish to represent. [sent-101, score-0.236]

42 Therefore, if we instantiate all atoms with raw size greater than L, we will include all atoms associated with occupied clusters. [sent-104, score-0.555]

43 For any value of L, there will be a finite number M of atoms above this threshold. [sent-105, score-0.236]

44 From these M raw atoms, we can obtain the kernelized atoms above the slice corresponding to a given data point. [sent-106, score-0.82]
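
The bookkeeping in the preceding sentences can be sketched as follows; the uniform distribution of the slice variables follows the usual slice-sampling construction of [16], while the variable names and data layout are assumptions made for illustration rather than the paper's exact algorithm:

```python
def draw_slice_variables(assignments, group, x_star, pi, mu, rng, kernel=box_kernel):
    """u_{g,i} ~ Uniform(0, K(x*_g, mu_{z_i}) * pi_{z_i}) for each observation i."""
    u = np.empty(len(assignments))
    for i, z in enumerate(assignments):
        kern_size = kernel(x_star[group[i]], mu[z]) * pi[z]
        u[i] = rng.uniform(0.0, kern_size)
    return u

def candidate_clusters(u, group, x_star, pi, mu, kernel=box_kernel):
    """Atoms with raw size above L = min(u) are instantiated; observation i may only
    join clusters whose kernelized size at its covariate exceeds its slice u_i."""
    L = u.min()
    active = np.where(pi > L)[0]                 # the M instantiated raw atoms
    candidates = []
    for i in range(len(u)):
        k = np.array([kernel(x_star[group[i]], mu[m]) * pi[m] for m in active])
        candidates.append(active[k > u[i]])
    return L, active, candidates
```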

45 We must obtain the remaining mass by marginalizing over all kernelized atoms that are below the slice (see the supplement). [sent-107, score-0.808]

46 a) the mass due to atoms that are not instantiated (i.e. [sent-109, score-0.321]

47 whose kernelized value is below the slice at all covariate locations) and, b. [sent-111, score-0.705]

48 atoms whose kernelized value is above the slice at at least one covariate location). [sent-114, score-0.941]

49 As we show in the supplement, the first term, a, corresponds to atoms (π, µ) where π < L, the mass of which can be written as Σ_{µ∗∈X} R0(µ∗) ∫_0^L (1 − exp(−V^T Kµ∗ π)) ν0(dπ) (6), where V = (V1, …). [sent-115, score-0.312]

50 This can be evaluated numerically for many CRMs including gamma and generalized gamma processes [16]. [sent-119, score-0.36]
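
For instance, for a gamma-process ν0(dπ) = α π^−1 e^−π dπ, the contribution of a single candidate location µ∗ to term a in Eq. 6 reduces to a one-dimensional integral that standard quadrature handles; this is a hedged sketch assuming that form of ν0 and writing v = V^T Kµ∗ for the collapsed kernel weights at that location:

```python
import numpy as np
from scipy.integrate import quad

def mass_below_slice(v, L, alpha=1.0):
    """alpha * int_0^L (1 - exp(-v*pi)) * pi^{-1} * exp(-pi) d(pi) for one location."""
    def integrand(p):
        if p < 1e-12:
            # removable singularity: (1 - exp(-v*p)) / p -> v as p -> 0
            return v * np.exp(-p)
        return (1.0 - np.exp(-v * p)) * np.exp(-p) / p
    val, _err = quad(integrand, 0.0, L)
    return alpha * val
```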

51 The second term, b, consists of realized atoms {(πk , µk )} such that K(x∗ , µk )πk < L at covariate x∗ . [sent-120, score-0.415]

52 For box kernels, term b vanishes, and we have found that even for the square exponential kernel, ignoring this term yields good results. [sent-122, score-0.314]

53 We parametrize the gamma distribution so that X ∼ Ga(a, b) has mean a/b and variance a/b². If X were not bounded, there would be a third term consisting of raw atoms > L that, when kernelized, fall below the slice everywhere. [sent-139, score-0.976]

54 There will also be a number of atoms with raw size πm > L that do not have associated data. [sent-145, score-0.294]

55 The number of such atoms is Poisson distributed with mean α ∫_A exp(−V^T Kµ π) ν0(dπ) R0(dµ), where A = {(µ, π) : K(x∗g, µ)π > L for some g}, and which can be computed using the approach described for Eq. [sent-146, score-0.236]
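
A sketch of how that Poisson mean could be evaluated under the same assumptions as above (a gamma-process ν0, a finite set of candidate locations with R0 probabilities, and kernels bounded by 1); decomposing A into per-location thresholds L / max_g K(x∗g, µ∗) is the illustrative device used here, not a formula from the paper:

```python
def expected_new_atoms(V, L, locations, r0_probs, x_star, alpha=1.0, kernel=box_kernel):
    """Poisson mean of the number of raw atoms above L with no data attached."""
    total = 0.0
    for mu_star, r0 in zip(locations, r0_probs):
        k = np.array([kernel(xg, mu_star) for xg in x_star])   # kernel at each group
        if k.max() == 0.0:
            continue                      # this location never clears the threshold
        lower = L / k.max()               # smallest raw size in A at this location
        v = float(np.dot(V, k))
        integrand = lambda p, v=v: np.exp(-v * p) * np.exp(-p) / p
        val, _err = quad(integrand, lower, np.inf)
        total += r0 * val
    return alpha * total
```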

56 4 Experiments We evaluate the performance of the proposed slice sampler in the setting of covariate dependent density estimation. [sent-151, score-1.076]

57 We use both synthetic and real data sets in our experiments and compare the slice sampler to a Gibbs sampler for a finite approximation to the model (see the supplement for details of the model and sampler) and to the original SNGP sampler. [sent-154, score-1.174]

58 We assess the mixing characteristics of the sampler using the integrated autocorrelation time τ of the number of clusters used by the sampler at each iteration after a burn-in period, and by the predictive quality of the collected samples on held-out data. [sent-155, score-0.841]
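
A standard way to estimate the integrated autocorrelation time τ from the trace of cluster counts (a common MCMC diagnostic; the simple truncate-at-first-negative-autocorrelation rule below is an assumption, not necessarily the windowing the authors used):

```python
import numpy as np

def integrated_autocorrelation_time(trace, max_lag=None):
    """tau = 1 + 2 * sum_k rho_k, summing sample autocorrelations until they turn negative."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = x.var()
    if var == 0.0:
        return 1.0
    max_lag = max_lag or n // 2
    tau = 1.0
    for k in range(1, max_lag):
        rho = np.dot(x[:-k], x[k:]) / ((n - k) * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return tau

# usage: tau = integrated_autocorrelation_time(num_clusters_per_iteration)
```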

59 … Eq. 14 for the slice sampler to achieve fair comparisons (see the supplement for the results using the Rao-Blackwellized estimator). [sent-210, score-0.83]

60 Since the SNGP is a special case of the normalized KGaP, we compare the finite and slice samplers, which are both conditional samplers, to the original marginal sampler proposed in [4]. [sent-222, score-0.834]

61 The implementation of the SNGP sampler we used also only allows for fixed component variances, so we fix all φk = 1/10, the true data generating precision. [sent-224, score-0.31]

62 We use the true kernel function that was used to generate the data as the kernel for the normalized KGaP model. [sent-225, score-0.294]

63 We ran the slice sampler for 10,000 burn-in iterations and subsequently collected 5,000 samples. [sent-226, score-0.768]

64 We truncated the finite version of the model to 100 atoms and ran the sampler for 5,000 burn-in iterations and collected 5,000 samples. [sent-227, score-0.546]

65 The SNGP sampler was run for 2,000 burn-in iterations and 5,000 samples were collected. [sent-228, score-0.31]

66 This might seem surprising, since the SNGP sampler uses sophisticated split-merge moves to improve mixing, which have no analogue in the slice sampler. [sent-232, score-0.797]

67 While mixing performance is comparable, the average time per 100 iterations for the slice sampler was ∼10 seconds, for the SNGP sampler ∼30 seconds, and for the finite sampler ∼200 seconds. [sent-234, score-1.413]

68 Even with only 100 atoms, the finite sampler is much more expensive than the slice and SNGP samplers. [sent-235, score-1.004]

69 We also observe (Figure 1) that both the slice and finite samplers use essentially the true number of components underlying the data and that the SNGP sampler uses on average twice as many components. [sent-236, score-0.887]

70 The finite sampler finds a posterior mode with 13 clusters and rarely makes small moves from that mode. [sent-237, score-0.423]

71 The slice sampler explores modes with 10-17 clusters, but never makes large jumps away from this region. [sent-238, score-0.768]

72 The SNGP sampler explores the largest number of used clusters, ranging from 23–40; however, it has not explored regions that use fewer clusters. [sent-239, score-0.369]

73 Figure 1 also depicts the distribution of the variable truncation level L over all samples in the slice sampler. [sent-240, score-0.506]

74 This suggests that a finite model that discards atoms with πk < 10^−18 introduces negligible truncation error. [sent-241, score-0.284]

75 However, this value of L corresponds to ≈ 10^18 atoms in the finite model, which is computationally intractable. [sent-242, score-0.236]

76 In Figure 2 (Left) we plot estimates of the predictive density at each time stamp for the slice (a), finite (b) and SNGP (c) samplers. [sent-244, score-0.644]

77 However, the finite sampler seems unable to discard unneeded components. [sent-246, score-0.31]

78 The slice and SNGP samplers seem to both provide reasonable explanations for the distribution, with the slice sampler tending to provide smoother estimates. [sent-251, score-1.32]

79 2 Real data As well as providing an alternative inference method for existing models, our slice sampler can be used in a range of models that fall under the general class of KNRMs. [sent-253, score-0.831]

80 To demonstrate this, we use the finite and slice versions of our sampler to learn two kernel DPs, one using a box kernel, K(x, µ) = 1(|x − µ| < 0. [sent-254, score-0.996]

81 The kernel was chosen to be somewhat comparable to the box kernel; however, this kernel allows the influence of an atom to diminish gradually as opposed to being constant. [sent-258, score-0.513]

82 We compare to the SNGP sampler for the box kernel model, but note that this sampler is not applicable to the exponential kernel model. [sent-259, score-0.989]

83 We compare these approaches on two real-world datasets: • Cosmic microwave background radiation (CMB) [20]: TT power spectrum measurements, η, from the cosmic microwave background radiation (CMB) at various ‘multipole moments’, denoted M. [sent-260, score-0.241]

84 For the CMB data, we consider only the first 600 multipole moments, where the variance is approximately constant, allowing us to compare the SNGP sampler to the other algorithms. [sent-267, score-0.383]

85 Sampling the cluster means and assignments is the slowest step for the SNGP sampler, taking about 3 seconds. [sent-275, score-0.31]

86 [Figure residue: axes labeled 'multipole moment' and 'TT power spectrum', with legend entries for the finite and slice samplers.] [sent-282, score-0.936]

87 Figure 2: Left: Predictive density at each time stamp for synthetic data using the slice (a), finite (b) and SNGP (c) samplers. [sent-286, score-0.667]

88 Middle: Mean and 95% CI of predictive distribution for all three samplers on CMB data using the box kernel. [sent-288, score-0.292]

89 For the motorcycle data, there was no regime of constant variance, so we only compare the slice and finite truncation samplers. [sent-290, score-0.597]

90 We see in Table 1 that the box kernel and the square exponential kernel produce similar results on the CMB data. [sent-293, score-0.402]

91 For the motorcycle data we see a noticeable difference between using the box and square exponential kernels, where using the latter improves the held-out predictive likelihood and results in both samplers using fewer components on average. [sent-295, score-0.494]

92 Looking at the mean and 95% CI of the predictive distribution (middle) we see that when using the box kernel the SNGP actually fits the data the best. [sent-297, score-0.312]

93 This is most likely due to the fact that the SNGP is using more atoms than the slice or finite samplers. [sent-298, score-0.694]

94 We show that the square exponential kernel (right) gives much smoother estimates and appears to fit the data better, using the same number of atoms as were learned with the box kernel (see Table 1). [sent-299, score-0.638]

95 We note that the slice sampler took ∼ 20 seconds per 100 iterations while the finite sampler used ∼ 150 seconds. [sent-300, score-1.078]

96 5 Conclusion We presented the class of normalized kernel CRMs, a type of dependent normalized random measure. [sent-301, score-0.374]

97 We developed a slice sampler to perform inference on the infinite dimensional measure and compared this method with samplers for a finite approximation and for the SNGP. [sent-303, score-0.916]

98 We found that the slice sampler yields samples with competitive predictive accuracy at a fraction of the computational cost. [sent-304, score-0.852]

99 Incorporating reversible-jump moves [22] such as split-merge proposals should allow the slice sampler to explore larger regions of the parameter space with a limited decrease in computational efficiency. [sent-306, score-0.797]

100 A similar methodology may yield efficient inference algorithms for KCRMs such as the KBP, extending the existing slice sampler for the Indian Buffet Process [23]. [sent-307, score-0.81]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('slice', 0.458), ('sngp', 0.417), ('sampler', 0.31), ('atoms', 0.236), ('covariate', 0.179), ('atom', 0.171), ('crm', 0.163), ('vg', 0.161), ('gamma', 0.156), ('nrms', 0.146), ('crms', 0.145), ('cmb', 0.127), ('kernel', 0.114), ('box', 0.114), ('dependent', 0.107), ('samplers', 0.094), ('kgap', 0.091), ('motorcycle', 0.091), ('nonparametric', 0.088), ('bx', 0.085), ('predictive', 0.084), ('stamp', 0.08), ('vy', 0.076), ('kcrm', 0.073), ('multipole', 0.073), ('nrm', 0.073), ('beta', 0.072), ('kernelized', 0.068), ('normalized', 0.066), ('supplement', 0.062), ('lijoi', 0.059), ('xg', 0.059), ('clusters', 0.059), ('raw', 0.058), ('kbp', 0.055), ('microwave', 0.055), ('ounrm', 0.055), ('autocorrelation', 0.053), ('process', 0.05), ('measures', 0.05), ('finite', 0.049), ('processes', 0.048), ('truncation', 0.048), ('stamps', 0.048), ('mixture', 0.047), ('dirichlet', 0.047), ('mass', 0.046), ('poisson', 0.041), ('bt', 0.041), ('instantiated', 0.039), ('nite', 0.037), ('grif', 0.037), ('cosmic', 0.036), ('kcrms', 0.036), ('knrm', 0.036), ('radiation', 0.036), ('standardize', 0.036), ('vgng', 0.036), ('warwick', 0.036), ('synthetic', 0.034), ('square', 0.033), ('location', 0.033), ('sizes', 0.032), ('ga', 0.032), ('ddp', 0.032), ('sinead', 0.032), ('measure', 0.032), ('completely', 0.031), ('exp', 0.03), ('tt', 0.03), ('locations', 0.029), ('moves', 0.029), ('tj', 0.029), ('auxiliary', 0.029), ('jrss', 0.028), ('na', 0.027), ('exponential', 0.027), ('allocations', 0.026), ('wm', 0.026), ('carlo', 0.026), ('kernels', 0.026), ('components', 0.025), ('occupied', 0.025), ('posterior', 0.025), ('pr', 0.025), ('mixing', 0.025), ('pg', 0.024), ('concreteness', 0.024), ('mcmc', 0.024), ('ng', 0.024), ('observation', 0.023), ('spectrum', 0.023), ('inference', 0.022), ('buffet', 0.022), ('density', 0.022), ('class', 0.021), ('analogously', 0.021), ('monte', 0.02), ('existing', 0.02), ('indian', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

Author: Nicholas Foti, Sinead Williamson

Abstract: A number of dependent nonparametric processes have been proposed to model non-stationary data with unknown latent dimensionality. However, the inference algorithms are often slow and unwieldy, and are in general highly specific to a given model formulation. In this paper, we describe a large class of dependent nonparametric processes, including several existing models, and present a slice sampler that allows efficient inference across this class of models. 1

2 0.17952155 60 nips-2012-Bayesian nonparametric models for ranked data

Author: Francois Caron, Yee W. Teh

Abstract: We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model, and apply it to the New York Times lists of weekly bestselling books. 1

3 0.16739686 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes

Author: Dahua Lin, John W. Fisher

Abstract: Mixture distributions are often used to model complex data. In this paper, we develop a new method that jointly estimates mixture models over multiple data sets by exploiting the statistical dependencies between them. Specifically, we introduce a set of latent Dirichlet processes as sources of component models (atoms), and for each data set, we construct a nonparametric mixture model by combining sub-sampled versions of the latent DPs. Each mixture model may acquire atoms from different latent DPs, while each atom may be shared by multiple mixtures. This multi-to-multi association distinguishes the proposed method from previous ones that require the model structure to be a tree or a chain, allowing more flexible designs. We also derive a sampling algorithm that jointly infers the model parameters and present experiments on both document analysis and image modeling. 1

4 0.14533404 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

Author: Deepak Venugopal, Vibhav Gogate

Abstract: First-order probabilistic models combine the power of first-order logic, the de facto tool for handling relational structure, with probabilistic graphical models, the de facto tool for handling uncertainty. Lifted probabilistic inference algorithms for them have been the subject of much recent research. The main idea in these algorithms is to improve the accuracy and scalability of existing graphical models’ inference algorithms by exploiting symmetry in the first-order representation. In this paper, we consider blocked Gibbs sampling, an advanced MCMC scheme, and lift it to the first-order level. We propose to achieve this by partitioning the first-order atoms in the model into a set of disjoint clusters such that exact lifted inference is polynomial in each cluster given an assignment to all other atoms not in the cluster. We propose an approach for constructing the clusters and show how it can be used to trade accuracy with computational complexity in a principled manner. Our experimental evaluation shows that lifted Gibbs sampling is superior to the propositional algorithm in terms of accuracy, scalability and convergence.

5 0.1252206 59 nips-2012-Bayesian nonparametric models for bipartite graphs

Author: Francois Caron

Abstract: We develop a novel Bayesian nonparametric model for random bipartite graphs. The model is based on the theory of completely random measures and is able to handle a potentially infinite number of nodes. We show that the model has appealing properties and in particular it may exhibit a power-law behavior. We derive a posterior characterization, a generative process for network growth, and a simple Gibbs sampler for posterior simulation. Our model is shown to be well fitted to several real-world social networks. 1

6 0.11866777 41 nips-2012-Ancestor Sampling for Particle Gibbs

7 0.1102744 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

8 0.10773414 126 nips-2012-FastEx: Hash Clustering with Exponential Families

9 0.10071028 205 nips-2012-MCMC for continuous-time discrete-state systems

10 0.099252947 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

11 0.09809877 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks

12 0.093845777 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

13 0.092379406 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

14 0.086503431 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data

15 0.086108647 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

16 0.084426835 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

17 0.082168549 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

18 0.070559695 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

19 0.069935106 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates

20 0.068167157 294 nips-2012-Repulsive Mixtures


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.154), (1, 0.041), (2, 0.003), (3, 0.036), (4, -0.178), (5, -0.086), (6, 0.01), (7, 0.044), (8, 0.104), (9, -0.002), (10, -0.0), (11, -0.015), (12, 0.007), (13, -0.129), (14, 0.028), (15, -0.183), (16, -0.048), (17, -0.114), (18, 0.012), (19, -0.082), (20, 0.109), (21, -0.007), (22, 0.022), (23, -0.074), (24, -0.122), (25, -0.001), (26, -0.075), (27, -0.068), (28, -0.067), (29, 0.008), (30, -0.062), (31, 0.05), (32, 0.025), (33, 0.044), (34, 0.094), (35, 0.061), (36, -0.012), (37, -0.006), (38, 0.009), (39, -0.146), (40, -0.19), (41, -0.093), (42, -0.026), (43, 0.017), (44, -0.135), (45, -0.036), (46, 0.014), (47, 0.016), (48, -0.035), (49, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94508374 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

Author: Nicholas Foti, Sinead Williamson

Abstract: A number of dependent nonparametric processes have been proposed to model non-stationary data with unknown latent dimensionality. However, the inference algorithms are often slow and unwieldy, and are in general highly specific to a given model formulation. In this paper, we describe a large class of dependent nonparametric processes, including several existing models, and present a slice sampler that allows efficient inference across this class of models. 1

2 0.69076347 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

Author: Deepak Venugopal, Vibhav Gogate

Abstract: First-order probabilistic models combine the power of first-order logic, the de facto tool for handling relational structure, with probabilistic graphical models, the de facto tool for handling uncertainty. Lifted probabilistic inference algorithms for them have been the subject of much recent research. The main idea in these algorithms is to improve the accuracy and scalability of existing graphical models’ inference algorithms by exploiting symmetry in the first-order representation. In this paper, we consider blocked Gibbs sampling, an advanced MCMC scheme, and lift it to the first-order level. We propose to achieve this by partitioning the first-order atoms in the model into a set of disjoint clusters such that exact lifted inference is polynomial in each cluster given an assignment to all other atoms not in the cluster. We propose an approach for constructing the clusters and show how it can be used to trade accuracy with computational complexity in a principled manner. Our experimental evaluation shows that lifted Gibbs sampling is superior to the propositional algorithm in terms of accuracy, scalability and convergence.

3 0.68209982 59 nips-2012-Bayesian nonparametric models for bipartite graphs

Author: Francois Caron

Abstract: We develop a novel Bayesian nonparametric model for random bipartite graphs. The model is based on the theory of completely random measures and is able to handle a potentially infinite number of nodes. We show that the model has appealing properties and in particular it may exhibit a power-law behavior. We derive a posterior characterization, a generative process for network growth, and a simple Gibbs sampler for posterior simulation. Our model is shown to be well fitted to several real-world social networks. 1

4 0.6665675 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes

Author: Dahua Lin, John W. Fisher

Abstract: Mixture distributions are often used to model complex data. In this paper, we develop a new method that jointly estimates mixture models over multiple data sets by exploiting the statistical dependencies between them. Specifically, we introduce a set of latent Dirichlet processes as sources of component models (atoms), and for each data set, we construct a nonparametric mixture model by combining sub-sampled versions of the latent DPs. Each mixture model may acquire atoms from different latent DPs, while each atom may be shared by multiple mixtures. This multi-to-multi association distinguishes the proposed method from previous ones that require the model structure to be a tree or a chain, allowing more flexible designs. We also derive a sampling algorithm that jointly infers the model parameters and present experiments on both document analysis and image modeling. 1

5 0.6433382 60 nips-2012-Bayesian nonparametric models for ranked data

Author: Francois Caron, Yee W. Teh

Abstract: We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model, and apply it to the New York Times lists of weekly bestselling books. 1

6 0.54675007 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data

7 0.53308719 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

8 0.51142484 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

9 0.50733417 205 nips-2012-MCMC for continuous-time discrete-state systems

10 0.49514028 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process

11 0.47392005 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates

12 0.4738715 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

13 0.46072754 41 nips-2012-Ancestor Sampling for Particle Gibbs

14 0.45674062 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

15 0.44295049 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

16 0.44038871 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes

17 0.43917996 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

18 0.43528366 294 nips-2012-Repulsive Mixtures

19 0.41987517 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

20 0.41242504 126 nips-2012-FastEx: Hash Clustering with Exponential Families


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.032), (21, 0.019), (38, 0.066), (39, 0.108), (42, 0.016), (54, 0.015), (55, 0.016), (63, 0.028), (74, 0.042), (76, 0.191), (80, 0.306), (92, 0.043)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9390744 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

Author: Ashish Kapoor, Raajay Viswanathan, Prateek Jain

Abstract: In this paper, we present a Bayesian framework for multilabel classification using compressed sensing. The key idea in compressed sensing for multilabel classification is to first project the label vector to a lower dimensional space using a random transformation and then learn regression functions over these projections. Our approach considers both of these components in a single probabilistic model, thereby jointly optimizing over compression as well as learning tasks. We then derive an efficient variational inference scheme that provides joint posterior distribution over all the unobserved labels. The two key benefits of the model are that a) it can naturally handle datasets that have missing labels and b) it can also measure uncertainty in prediction. The uncertainty estimate provided by the model allows for active learning paradigms where an oracle provides information about labels that promise to be maximally informative for the prediction task. Our experiments show significant boost over prior methods in terms of prediction performance over benchmark datasets, both in the fully labeled and the missing labels case. Finally, we also highlight various useful active learning scenarios that are enabled by the probabilistic model. 1

2 0.93658781 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks

Author: Tivadar Papai, Henry Kautz, Daniel Stefankovic

Abstract: Markov logic is a widely used tool in statistical relational learning, which uses a weighted first-order logic knowledge base to specify a Markov random field (MRF) or a conditional random field (CRF). In many applications, a Markov logic network (MLN) is trained in one domain, but used in a different one. This paper focuses on dynamic Markov logic networks, where the size of the discretized time-domain typically varies between training and testing. It has been previously pointed out that the marginal probabilities of truth assignments to ground atoms can change if one extends or reduces the domains of predicates in an MLN. We show that in addition to this problem, the standard way of unrolling a Markov logic theory into a MRF may result in time-inhomogeneity of the underlying Markov chain. Furthermore, even if these representational problems are not significant for a given domain, we show that the more practical problem of generating samples in a sequential conditional random field for the next slice relying on the samples from the previous slice has high computational cost in the general case, due to the need to estimate a normalization factor for each sample. We propose a new discriminative model, slice normalized dynamic Markov logic networks (SN-DMLN), that suffers from none of these issues. It supports efficient online inference, and can directly model influences between variables within a time slice that do not have a causal direction, in contrast with fully directed models (e.g., DBNs). Experimental results show an improvement in accuracy over previous approaches to online inference in dynamic Markov logic networks. 1

3 0.93242103 100 nips-2012-Discriminative Learning of Sum-Product Networks

Author: Robert Gens, Pedro Domingos

Abstract: Sum-product networks are a new deep architecture that can perform fast, exact inference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the first discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractability of the latter. We show that the class of tractable discriminative SPNs is broader than the class of tractable generative ones, and propose an efficient backpropagation-style algorithm for computing the gradient of the conditional log likelihood. Standard gradient descent suffers from the diffusion problem, but networks with many layers can be learned reliably using “hard” gradient descent, where marginal inference is replaced by MPE inference (i.e., inferring the most probable state of the non-evidence variables). The resulting updates have a simple and intuitive form. We test discriminative SPNs on standard image classification tasks. We obtain the best results to date on the CIFAR-10 dataset, using fewer features than prior methods with an SPN architecture that learns local image structure discriminatively. We also report the highest published test accuracy on STL-10 even though we only use the labeled portion of the dataset. 1

4 0.93223834 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

Author: James Scott, Jonathan W. Pillow

Abstract: Characterizing the information carried by neural populations in the brain requires accurate statistical models of neural spike responses. The negative-binomial distribution provides a convenient model for over-dispersed spike counts, that is, responses with greater-than-Poisson variability. Here we describe a powerful data-augmentation framework for fully Bayesian inference in neural models with negative-binomial spiking. Our approach relies on a recently described latentvariable representation of the negative-binomial distribution, which equates it to a Polya-gamma mixture of normals. This framework provides a tractable, conditionally Gaussian representation of the posterior that can be used to design efficient EM and Gibbs sampling based algorithms for inference in regression and dynamic factor models. We apply the model to neural data from primate retina and show that it substantially outperforms Poisson regression on held-out data, and reveals latent structure underlying spike count correlations in simultaneously recorded spike trains. 1

5 0.92429906 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

Author: Marc Deisenroth, Shakir Mohamed

Abstract: Rich and complex time-series data, such as those generated from engineering systems, financial markets, videos, or neural recordings are now a common feature of modern data analysis. Explaining the phenomena underlying these diverse data sets requires flexible and accurate models. In this paper, we promote Gaussian process dynamical systems as a rich model class that is appropriate for such an analysis. We present a new approximate message-passing algorithm for Bayesian state estimation and inference in Gaussian process dynamical systems, a nonparametric probabilistic generalization of commonly used state-space models. We derive our message-passing algorithm using Expectation Propagation and provide a unifying perspective on message passing in general state-space models. We show that existing Gaussian filters and smoothers appear as special cases within our inference framework, and that these existing approaches can be improved upon using iterated message passing. Using both synthetic and real-world data, we demonstrate that iterated message passing can improve inference in a wide range of tasks in Bayesian state estimation, thus leading to improved predictions and more effective decision making. 1

6 0.91005224 204 nips-2012-MAP Inference in Chains using Column Generation

7 0.90944952 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses

8 0.8893609 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)

9 0.88886607 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

same-paper 10 0.88216794 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

11 0.8646636 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

12 0.86333501 197 nips-2012-Learning with Recursive Perceptual Representations

13 0.8587535 171 nips-2012-Latent Coincidence Analysis: A Hidden Variable Model for Distance Metric Learning

14 0.85577881 200 nips-2012-Local Supervised Learning through Space Partitioning

15 0.85534853 279 nips-2012-Projection Retrieval for Classification

16 0.83388609 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

17 0.83063716 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification

18 0.82955593 168 nips-2012-Kernel Latent SVM for Visual Recognition

19 0.82718909 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space

20 0.8256858 280 nips-2012-Proper losses for learning from partial labels