nips nips2012 nips2012-89 knowledge-graph by maker-knowledge-mining

89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes


Source: pdf

Author: Dahua Lin, John W. Fisher

Abstract: Mixture distributions are often used to model complex data. In this paper, we develop a new method that jointly estimates mixture models over multiple data sets by exploiting the statistical dependencies between them. Specifically, we introduce a set of latent Dirichlet processes as sources of component models (atoms), and for each data set, we construct a nonparametric mixture model by combining sub-sampled versions of the latent DPs. Each mixture model may acquire atoms from different latent DPs, while each atom may be shared by multiple mixtures. This multi-to-multi association distinguishes the proposed method from previous ones that require the model structure to be a tree or a chain, allowing more flexible designs. We also derive a sampling algorithm that jointly infers the model parameters and present experiments on both document analysis and image modeling. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Specifically, we introduce a set of latent Dirichlet processes as sources of component models (atoms), and for each data set, we construct a nonparametric mixture model by combining sub-sampled versions of the latent DPs. [sent-6, score-0.55]

2 Each mixture model may acquire atoms from different latent DPs, while each atom may be shared by multiple mixtures. [sent-7, score-0.765]

3 This assumption does not hold in the cases with multiple groups of data, where samples in different groups are generally not exchangeable. [sent-15, score-0.202]

4 Our primary goal here is to describe multiple groups of data through coupled mixture models. [sent-28, score-0.224]

5 Sharing statistical properties across different groups allows for more reliable model estimation, especially when the observed samples in each group are limited or noisy. [sent-29, score-0.188]

6 (2) The marginal distribution of atoms for each group remains a DP. [sent-32, score-0.232]

7 For example, the prior weight of a common atom can vary across groups. [sent-34, score-0.401]

8 Specifically, we express mixture models for each group as a stochastic combination over a set of latent DPs. [sent-40, score-0.319]

9 The multi-to-multi association between data groups and latent DPs provides much greater flexibility to model configurations, as opposed to prior work (we provide a detailed comparison in section 3. [sent-41, score-0.304]

10 p(θi | θ/i ) = Σ_{k=1}^{K/i} ( m/i (k) / (α + (n − 1)) ) δφk + ( α / (α + (n − 1)) ) B. (3) Here, θ/i denotes all component parameters except θi , K/i denotes the number of distinct atoms among them, and m/i (k) denotes the number of occurrences of the atom φk . [sent-61, score-0.512]

11 Then, with a probability proportional to m/i (k)f (xi ; φk ), we set θi = φk , and with a probability proportional to αf (xi ; B), we draw a new atom from B(·|xi ), which is the posterior parameter distribution given xi . [sent-70, score-0.365]
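
To make this restaurant-style update concrete, here is a minimal Python sketch of one such draw; the Gaussian likelihood and base-measure marginal used in the toy call are illustrative assumptions, not the paper's choices.

import numpy as np

def sample_component(x_i, atoms, counts, alpha, likelihood, marginal, rng):
    # One collapsed-Gibbs draw for theta_i under the predictive rule (Eq. 3):
    # existing atom phi_k with prob. proportional to m_{/i}(k) f(x_i; phi_k),
    # new atom with prob. proportional to alpha f(x_i; B).
    weights = np.array([m * likelihood(x_i, phi) for m, phi in zip(counts, atoms)]
                       + [alpha * marginal(x_i)])
    k = rng.choice(len(weights), p=weights / weights.sum())
    return k if k < len(atoms) else -1      # -1 signals "draw a fresh atom from B(.|x_i)"

# Toy usage (assumed unit-variance Gaussian likelihood, N(0, 1) base measure).
rng = np.random.default_rng(0)
norm_pdf = lambda x, mu, var: np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
choice = sample_component(x_i=1.3, atoms=[0.9, -2.0], counts=[4, 2], alpha=1.0,
                          likelihood=lambda x, phi: norm_pdf(x, phi, 1.0),
                          marginal=lambda x: norm_pdf(x, 0.0, 2.0), rng=rng)
print(choice)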

12 Figure 2: The reformulated model for Gibbs sampling contains latent DPs, groups of data, and atoms. [sent-87, score-1.023]

13 Each sample xti is assigned a label zti that associates it with an atom φzti . [sent-88, score-0.746]

14 To generate zti , we draw a latent DP (from Mult(ct )) and choose a label therefrom. [sent-89, score-0.407]

15 In sampling, Hs is integrated out, resulting in mutual dependency between zti , as in the Chinese restaurant process. [sent-90, score-0.196]

16 Figure 1: This shows the graphical model of the coupled DP formulation on a case with four groups and two latent DPs. [sent-91, score-0.355]

17 Each mixture model Dt inherits atoms from Hs with a probability qts , resulting in Eq. [sent-92, score-0.498]

18 Given a sub-sampling probability q, one draws a binary value rk with Pr(rk = 1) = q for each atom φk to decide whether to retain it, resulting in a DP: Sq (D) ≜ Σ_{k: rk=1} πk δφk ∼ DP(αqB). [sent-97, score-0.347]

19 Given D = Σ_{k=1}^{∞} πk δφk ∼ DP(αB), perturbing the location of each atom following a probabilistic transition kernel T also yields a new DP, given by T (D) ≜ Σ_{k=1}^{∞} πk δT (φk ) . [sent-100, score-0.325]
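
The two operations can be sketched on a truncated stick-breaking approximation of D; the truncation level, the Gaussian base measure, the renormalization of retained weights, and the random-walk kernel standing in for T are all assumptions made for illustration only.

import numpy as np

rng = np.random.default_rng(1)

def truncated_dp(alpha, base_sampler, K=50):
    # Truncated stick-breaking draw D ~ DP(alpha B): returns (weights, atoms).
    v = rng.beta(1.0, alpha, size=K)
    weights = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return weights, base_sampler(K)

def subsample(weights, atoms, q):
    # S_q(D): retain each atom independently with probability q (the retained
    # weights are renormalized here); the result is distributed as DP(alpha q B).
    keep = rng.random(len(atoms)) < q
    w = weights[keep]
    return (w / w.sum() if w.size else w), atoms[keep]

def perturb(weights, atoms, kernel):
    # T(D): move each atom through a transition kernel T(phi_k, .); weights unchanged.
    return weights, np.array([kernel(phi) for phi in atoms])

w, phi = truncated_dp(alpha=2.0, base_sampler=lambda K: rng.normal(0.0, 3.0, size=K))
w2, phi2 = subsample(w, phi, q=0.5)
w3, phi3 = perturb(w2, phi2, kernel=lambda p: p + rng.normal(0.0, 0.1))
print(f"{len(phi)} atoms -> {len(phi2)} retained after sub-sampling")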

20 3 Coupled Nonparametric Mixture Models Our primary goal is to develop a joint formulation over group-wise DP mixture models where components are shared across different groups and the weights and parameters of shared components vary across groups. [sent-102, score-0.314]

21 The generative formulation is then described as follows: First, generate ML latent DPs independently, as Hs ∼ DP (αs B), for s = 1, . [sent-106, score-0.23]

22 (6) Second, generate M dependent DPs, each for a group of data, by combining the sub-sampled versions of the latent DPs through stochastic convex combination. [sent-110, score-0.302]

23 (7) Intuitively, for each group of data (say the t-th), we choose a subset of atoms from each latent source and bring them together to generate Dt . [sent-121, score-0.438]

24 Here, qts is the prior probability that an atom in Hs will be inherited by Dt . [sent-122, score-0.673]

25 In particular, when the atom φk is inherited by Dt , the atom parameter would be an adapted version drawn from Tt (φk , ·) rather than φk itself. [sent-126, score-0.716]
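
Putting Eqs. (6) and (7) together, the sketch below assembles group-level mixtures from truncated latent DPs; drawing the convex-combination weights from Dir(αs qts ) is an assumption chosen to match βt = Σs αs qts (the exact form is in Eq. (7) of the paper), and all numeric settings are toy values.

import numpy as np

rng = np.random.default_rng(2)

def toy_latent_dp(alpha, K=30):
    # Truncated stick-breaking draw H_s ~ DP(alpha_s B), Gaussian base measure assumed.
    v = rng.beta(1.0, alpha, size=K)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return w, rng.normal(0.0, 3.0, size=K)

def coupled_group_dps(latent_dps, alphas, q):
    # For each group t: sub-sample every H_s with probability q[t, s], then take a
    # stochastic convex combination of the retained pieces to form D_t.
    M, _ = q.shape
    groups = []
    for t in range(M):
        parts = []
        for s, (w, phi) in enumerate(latent_dps):
            keep = rng.random(len(phi)) < q[t, s]
            if keep.any():
                parts.append((s, w[keep] / w[keep].sum(), phi[keep]))
        if not parts:
            groups.append((np.array([]), np.array([])))
            continue
        c = rng.dirichlet([alphas[s] * q[t, s] for s, _, _ in parts])   # assumed Dir(alpha_s q_ts)
        w_t = np.concatenate([c_j * w_j for c_j, (_, w_j, _) in zip(c, parts)])
        phi_t = np.concatenate([phi_j for _, _, phi_j in parts])
        groups.append((w_t, phi_t))
    return groups

alphas = [2.0, 2.0]
H = [toy_latent_dp(a) for a in alphas]
Q = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])   # 3 groups, 2 latent DPs
D = coupled_group_dps(H, alphas, Q)
print([phi.size for _, phi in D])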

26 (8) Here, xt,i is the i-th data sample in the t-th group, and θt,i is the associated atom parameter. [sent-134, score-0.346]

27 (7) has Dt ∼ DP(βt B), with βt = Σ_{s=1}^{ML} αs qts . [sent-138, score-0.259]

28 Two models are strongly coupled, if there exists a subset of latent DPs, from which both inherit atoms with high probabilities, while their coupling is much weaker if the associated q-values are set differently. [sent-146, score-0.47]

29 Generally, higher values of αs lead to more atoms being associated with the data, resulting in finer clusters. [sent-150, score-0.187]

30 Another important factor is ML , the number of latent DPs. [sent-151, score-0.18]

31 A large number of latent DPs provides fine-grained control of the model configuration at the cost of increased complexity. [sent-152, score-0.18]

32 Our model allows the mixture model for each group to inherit from multiple sources, making it applicable to more general contexts. [sent-159, score-0.192]

33 Though motivated differently, this construction can be reduced to a formulation in the form Dt = Σ_{j∈Rt} ctj Hj , where Rt is the subset of latent DPs used for Dt . [sent-167, score-0.266]

34 Consequently, the combination coefficients have to satisfy (ctj )j∈Rt ∼ Dir((αj )j∈Rt ), implying that the relative weights of two latent sources are restricted to be the same in all groups that inherit from both. [sent-172, score-0.37]

35 In contrast, the approach here allows the weights of latent DPs to vary across groups. [sent-173, score-0.233]

36 Also, SNΓP doesn’t allow atom parameters to vary across groups. [sent-174, score-0.378]

37 (2) Each group maintains a distribution over the latent DPs to choose from, which reflects the different contributions of these sources. [sent-178, score-0.246]

38 In particular, each group maintains indicators of whether particular atoms are inherited, and as a consequence, the ones that are deemed irrelevant are put out of scope. [sent-180, score-0.232]

39 (4) As there are multiple latent DPs, for each atom, there is uncertainty about where it comes from. [sent-181, score-0.18]

40 We have a specific step that takes this into account, which allows reassigning an atom to different sources. [sent-182, score-0.325]

41 Recall that there are M groups of data and ML latent DPs linking them. [sent-184, score-0.281]

42 Note here that the index k is a globally unique identifier of the atom, which would not be changed during atom relocation. [sent-190, score-0.325]

43 An atom may correspond to multiple data samples. [sent-191, score-0.325]

44 Instead of instantiating the parameter θti for each data sample xti , we attach to xti an indicator zti that associates the sample with a particular atom. [sent-192, score-0.681]

45 To facilitate the sampling process, for each atom φk , we maintain an indicator sk specifying the latent DP that contains it, and a set of counters {mtk }, where mtk equals the number of associated data samples in the t-th group. [sent-194, score-0.669]

46 We also maintain a set Is for Hs (the s-th latent DP), which contains the indices of all atoms therein. [sent-195, score-0.346]
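
A minimal sketch of the bookkeeping these quantities imply; the class and method names are invented for illustration and are not taken from the paper.

from collections import defaultdict

class SamplerState:
    # Bookkeeping for the Gibbs sampler: the source DP s_k of each atom, the counters m_tk,
    # and the index sets I_s of atoms attached to each latent DP.
    def __init__(self):
        self.source = {}                 # s_k: index of the latent DP that holds atom k
        self.m = defaultdict(int)        # m_tk: (t, k) -> number of samples in group t using atom k
        self.I = defaultdict(set)        # I_s: indices of the atoms in latent DP H_s

    def add_atom(self, k, s):
        self.source[k] = s
        self.I[s].add(k)

    def move_sample(self, t, k_old, k_new):
        # Relabel one sample in group t from atom k_old to k_new, keeping m_tk consistent.
        if k_old is not None:
            self.m[(t, k_old)] -= 1
        self.m[(t, k_new)] += 1

    def relocate_atom(self, k, s_new):
        # Move atom k from its current latent DP to H_{s_new} (the shift of k between I_s sets).
        self.I[self.source[k]].discard(k)
        self.source[k] = s_new
        self.I[s_new].add(k)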

47 It consists of four steps: (1) Generate latent DPs: for each s = 1, . [sent-198, score-0.18]

48 (3) Decide inheritance: for each atom φk , we draw a binary variable rtk with Pr(rtk = 1) = qtsk to indicate whether φk is inherited by the t-th group. [sent-209, score-0.644]

49 Here sk is the index of the latent DP which φk is from. [sent-210, score-0.208]

50 (4) Generate data: to generate xti , we first choose a latent DP by drawing u ∼ Mult(ct1 , . [sent-211, score-0.466]

51 , ctML ), then draw an atom from Hu , using it to produce xti . [sent-214, score-0.625]
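
Combining the quoted steps, here is a hedged sketch of the reformulated generative process; step (2) is not quoted above, so drawing the group weights ct from a symmetric Dirichlet is an assumption, as are the Gaussian base measure and emission.

import numpy as np

rng = np.random.default_rng(3)

def generate(M, ML, alphas, q, n_per_group, K=30):
    # (1) latent DPs H_s (truncated stick-breaking, Gaussian base measure assumed)
    H = []
    for s in range(ML):
        v = rng.beta(1.0, alphas[s], size=K)
        H.append((v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1])),
                  rng.normal(0.0, 3.0, size=K)))
    # (2) group weights c_t over latent sources -- not quoted above; symmetric Dirichlet assumed
    C = rng.dirichlet(np.ones(ML), size=M)
    # (3) inheritance indicators r_tk with Pr(r_tk = 1) = q[t, s_k]
    R = np.stack([[rng.random(K) < q[t, s] for s in range(ML)] for t in range(M)])
    # (4) data: pick a latent DP u ~ Mult(c_t), draw an inherited atom from H_u, emit x_ti
    data = []
    for t in range(M):
        xs = []
        for _ in range(n_per_group):
            u = rng.choice(ML, p=C[t])
            w, phi = H[u]
            p = w * R[t, u]
            if p.sum() == 0:          # guard for the rare case where nothing was inherited
                p = w
            k = rng.choice(K, p=p / p.sum())
            xs.append(rng.normal(phi[k], 1.0))    # emission f(x; phi_k), Gaussian assumed
        data.append(np.array(xs))
    return H, C, R, data

H, C, R, data = generate(M=3, ML=2, alphas=[2.0, 2.0],
                         q=np.array([[0.9, 0.2], [0.5, 0.5], [0.2, 0.9]]), n_per_group=20)
print([x.shape for x in data])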

52 Based on this formulation, we derive the following Gibbs sampling steps to update the atom parameters and other hidden variables. [sent-215, score-0.349]

53 Recall that each data sample xti is associated with a label variable zti that indicates the atom accounting for xti . [sent-217, score-1.027]

54 To draw zti , we first have to choose a particular latent DP as the source (we denote the index of this DP by uti ). [sent-218, score-0.488]

55 Let z/ti denote all labels except zti , and rt denote the inheritance indicators. [sent-219, score-0.279]

56 Then, we get the likelihood of xti (with Hs integrated out) as p(xti |uti = s, rt , z/ti ) = ( Σ_{k∈Is: rtk=1} m∗k/ti f (xti ; φk ) + qts αs f (xti ; B) ) / ( wst/i + qts αs ). (10) [sent-220, score-0.815]

57 Here, m∗k/ti is the total number of samples associated with φk in all groups (except for xti ), wst/i = Σ_{k∈Is: rtk=1} m∗k/ti , f (xti ; φk ) is the pdf at xti w. [sent-221, score-0.642]

58 (ct1 , . . . , ctML ) are the group-specific prior over latent sources. [sent-230, score-0.203]

59 Once a latent DP is chosen (using the formula above), we can then draw a particular atom. [sent-231, score-0.22]

60 This is similar to the Chinese restaurant process: with a probability proportional to m∗k/ti f (xti ; φk ), we set zti = k, and with a probability proportional to qts αs f (xti ; B), we draw a new atom from B(·|xi ). [sent-232, score-0.82]

61 Only the atoms that are contained in Hs and have rtk = 1 (inherited by Dt ) can be drawn at this step. [sent-233, score-0.379]
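
A hedged sketch of this zti update for a single sample; the likelihood f and base-measure marginal f(·; B) are supplied by the caller, and weighting Eq. (10) by the prior cts to pick uti is an assumption about a detail not fully quoted above.

import numpy as np

def sample_z(x, t, I, r, m_excl, atoms, alphas, q, c_t, f, f_base, rng):
    # I[s]     : indices of atoms currently in H_s
    # r[(t,k)] : inheritance indicator r_tk
    # m_excl[k]: m*_{k/ti}, count of samples attached to atom k excluding x itself
    ML = len(alphas)
    stats = []
    for s in range(ML):
        ks = [k for k in I[s] if r.get((t, k), 0) == 1]
        m = np.array([m_excl[k] for k in ks], dtype=float)
        fk = np.array([f(x, atoms[k]) for k in ks])
        lik = (np.dot(m, fk) + q[t, s] * alphas[s] * f_base(x)) / (m.sum() + q[t, s] * alphas[s])  # Eq. (10)
        stats.append((ks, m, fk, lik))
    # choose the source u_ti; weighting Eq. (10) by c_ts is assumed here
    p_u = np.array([c_t[s] * stats[s][3] for s in range(ML)])
    u = rng.choice(ML, p=p_u / p_u.sum())
    ks, m, fk, _ = stats[u]
    # restaurant-style choice within H_u, restricted to atoms with r_tk = 1
    w = np.append(m * fk, q[t, u] * alphas[u] * f_base(x))
    j = rng.choice(len(w), p=w / w.sum())
    return (u, ks[j]) if j < len(ks) else (u, None)   # None signals a fresh atom from B(.|x)

A full sampler would then update mtk , ws , and Is after each accepted change, as the following sentence notes.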

62 We have to modify relevant quantities accordingly, such as mtk , ws , and Is , when a label zti is changed. [sent-234, score-0.252]

63 Moreover, when a new atom φk is created, it will be initially assigned to the latent DP that generates it (i. [sent-235, score-0.505]

64 If an atom φk is associated with some data in the t-th group, then we know for sure that it is inherited by Dt , and thus we can set rtk = 1. [sent-239, score-0.625]

65 However, if φk is not observed in the t-th group, this does not imply rtk = 0. [sent-240, score-0.213]

66 For such an atom (suppose it is from Hs ), we have Pr(rtk = 1|others) / Pr(rtk = 0|others) = ( qts · p(zt |rtk = 1, others) ) / ( (1 − qts ) · p(zt |rtk = 0, others) ) = ( qts / (1 − qts ) ) · γ(τs/t , nt ) / γ(τs/t + m∗k/t , nt ). (12) [sent-241, score-0.878]

67 Here, τs/t = qts αs + Σ_{k′∈Is−{k}} m∗k′/t and m∗k′/t is the number of samples associated with k′ in all other groups (excluding the ones in the t-th group). [sent-242, score-0.934]
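
A sketch of this odds-ratio update; the γ(·, ·) function is not defined in the excerpt above, so it is passed in as an argument, and the placeholder used in the toy call is only a guess.

import numpy as np
from math import lgamma

def sample_r(t, k, s_k, q, tau, m_star, n_t, gamma_fn, rng):
    # r_tk update for an atom phi_k with no samples in group t (otherwise r_tk = 1 deterministically).
    # gamma_fn(a, n) must be the gamma(., .) function the paper defines; it is not reproduced
    # in the excerpt above, so it is supplied by the caller rather than hard-coded.
    q_ts = q[t, s_k]
    odds = (q_ts / (1.0 - q_ts)) * gamma_fn(tau, n_t) / gamma_fn(tau + m_star, n_t)
    return int(rng.random() < odds / (1.0 + odds))

# Toy call; the gamma_fn below, Gamma(a) / Gamma(a + n), is only a placeholder guess.
guess_gamma = lambda a, n: np.exp(lgamma(a) - lgamma(a + n))
rng = np.random.default_rng(4)
print(sample_r(t=0, k=3, s_k=1, q=np.array([[0.5, 0.7]]), tau=2.5, m_star=4, n_t=20,
               gamma_fn=guess_gamma, rng=rng))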

68 (ct1 , . . . , ctML ) reflect the relative contribution of each latent DP to the t-th group. [sent-253, score-0.18]

69 In this model, each atom is almost surely from a unique latent DP (i. [sent-265, score-0.505]

70 This leads to an important question: How do we assign atoms to latent DPs? [sent-268, score-0.346]

71 Initially, an atom is assigned to the latent DP from which it is generated. [sent-269, score-0.505]

72 Here, we treat the assignment of each atom as a variable. [sent-271, score-0.325]

73 Consider an atom φk , with sk indicating its corresponding source DP. [sent-272, score-0.353]

74 Then, we have p(sk = j|others) ∝ Π_{t: rtk=1} qtj · Π_{t: rtk=0} (1 − qtj ). (14) [sent-273, score-0.518]

75 When an atom φk that was in Hs is reassigned to Hs′ , we have to move the index k from Is to Is′ . [sent-274, score-0.325]
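
A sketch of the atom-relocation draw; the product form below is how the (partially garbled) Eq. (14) above is read here, so treat it as an assumption rather than the paper's exact expression.

import numpy as np

def resample_source(r_k, q, rng):
    # r_k[t] = r_tk for the atom under consideration; q[t, j] are inheritance probabilities.
    # p(s_k = j | others) is taken proportional to prod_t q_tj^{r_tk} (1 - q_tj)^{1 - r_tk}.
    M, ML = q.shape
    log_p = np.array([(r_k * np.log(q[:, j]) + (1 - r_k) * np.log(1.0 - q[:, j])).sum()
                      for j in range(ML)])
    p = np.exp(log_p - log_p.max())
    return rng.choice(ML, p=p / p.sum())

rng = np.random.default_rng(5)
Q = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.9]])          # 3 groups, 2 latent DPs (toy values)
j = resample_source(r_k=np.array([1.0, 1.0, 0.0]), q=Q, rng=rng)
print(j)   # after the draw, the index k would move from I_s to I_s' if j differs from s_k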

76 To exploit the statistical dependency between groups, we further introduce a set of latent DPs to link these mixtures, as described above. [sent-284, score-0.18]

77 We perform experiments on several configurations, with different ways to connect between latent sources and data groups, as illustrated in Figure 3. [sent-288, score-0.216]

78 (1) Single Latent DP (S-LDP): there is only one latent DP connecting to all groups, with q-values set to 0. [sent-289, score-0.18]

79 Though with a structure similar to HDP, the formulation is actually different: HDP generates group-specific mixtures by using the latent DP as the base measure, while our model involves explicit sub-sampling. [sent-291, score-0.204]

80 (2) Multi Latent DP (M-LDP): there are two types of latent DPs – local and global ones. [sent-292, score-0.205]

81 The local latent DPs are introduced to help sharing statistical strength among the groups close to each other, so as to capture the intuition that papers published in consecutive years are more likely to share topics than those published in distant years. [sent-293, score-0.482]

82 The inheritance probability from a local latent DP Hs to Dt is set as qts = exp(−|t − s|/σ). [sent-294, score-0.545]

83 Recognizing that some topics may be shared across the entire corpus, we also introduce a global latent DP, from which every group inherits atoms with the same probability, allowing distant groups to be connected. [sent-295, score-0.655]

84 For comparison, we also consider another setting of q-values under the M-LDP structure: to set qts = I(|t − s| ≤ σ), that is to connect Dt and Hs only when |t − s| ≤ σ, with qts = 1. [sent-297, score-0.518]
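
A small helper that builds both q-matrix variants described here; the inheritance probability attached to the global latent DP is a placeholder, since the exact value is not quoted above.

import numpy as np

def build_q(M, sigma, q_global=0.5, hard_window=False):
    # Local latent DPs: q_ts = exp(-|t - s| / sigma), or I(|t - s| <= sigma) in the alternative
    # setting; one extra column is appended for the global latent DP shared by all groups.
    t = np.arange(M)[:, None]
    s = np.arange(M)[None, :]
    q_local = ((np.abs(t - s) <= sigma).astype(float) if hard_window
               else np.exp(-np.abs(t - s) / sigma))
    return np.hstack([q_local, np.full((M, 1), q_global)])

Q_soft = build_q(M=12, sigma=2.0)
Q_hard = build_q(M=12, sigma=2.0, hard_window=True)
print(Q_soft.shape, Q_soft[0, :4].round(3), Q_hard[0, :4])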

85 We place a weak prior over αs for each latent DP, as αs ∼ Gamma(0. [sent-304, score-0.203]

86 This suggests that the sharing of statistical strength through local latent DPs improves the reliability of the estimation, especially when the training data are limited. [sent-316, score-0.248]

87 For example, papers published in consecutive years tend to share many topics; however, the topics may not be as similar when comparing papers published recently to those from a decade ago. [sent-319, score-0.185]

88 A set of local latent DPs may capture such relations more effectively than a single global one. [sent-320, score-0.205]

89 This design encourages each latent DP to be locally focused, while allowing the atoms therein to be shared across the entire corpus. [sent-323, score-0.388]

90 The SNΓP, instead, provides no mechanism to vary the contributions of the latent DPs, and has to impose a hard limit on their spans to achieve locality. [sent-325, score-0.212]

91 Whereas this issue could be addressed through multiple levels of latent nodes with different spans, it would increase the complexity, and thus the risk of overfitting. [sent-326, score-0.18]

92 For M-LDP, recall that we set qts = exp(−|t − s|/σ). [sent-327, score-0.259]

93 Optimal performance is attained when the choice of σ balances the need to share atoms and the desire to keep the latent DPs locally focused. [sent-330, score-0.346]

94 5, and a set of local latent DPs, each for a category. [sent-352, score-0.205]

95 The prior probability of inheriting from the corresponding latent DP is 1. [sent-353, score-0.203]

96 Whereas no prior knowledge about the similarity between categories is assumed, the latent DPs incorporated in this way still provide a mechanism for local coupling. [sent-356, score-0.25]

97 This is due to the more flexible way to configure local coupling that allows the weights of latent DPs to vary. [sent-360, score-0.255]

98 6 Conclusion We have presented a principled approach to modeling grouped data, where mixture models for different groups are coupled via a set of latent DPs. [sent-361, score-0.431]

99 The proposed framework allows each mixture model to inherit from multiple latent DPs, and each latent DP to contribute differently to different groups, thus providing great flexibility for model design. [sent-362, score-0.486]

100 Construction of dependent Dirichlet processes based on Poisson processes. [sent-427, score-0.202]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dps', 0.465), ('dp', 0.363), ('atom', 0.325), ('xti', 0.26), ('qts', 0.259), ('rtk', 0.213), ('latent', 0.18), ('hs', 0.167), ('atoms', 0.166), ('zti', 0.161), ('ldp', 0.122), ('dt', 0.116), ('dirichlet', 0.112), ('uti', 0.107), ('groups', 0.101), ('hdp', 0.092), ('mtk', 0.091), ('ml', 0.087), ('inheritance', 0.081), ('ctml', 0.076), ('mixture', 0.073), ('inherited', 0.066), ('group', 0.066), ('hdps', 0.061), ('dir', 0.059), ('gamma', 0.059), ('perplexity', 0.056), ('sn', 0.054), ('docs', 0.054), ('sngp', 0.054), ('inherit', 0.053), ('coupling', 0.05), ('coupled', 0.05), ('ct', 0.049), ('topics', 0.047), ('boardwalk', 0.046), ('qtml', 0.046), ('swamp', 0.046), ('scene', 0.045), ('coast', 0.04), ('cts', 0.04), ('ocean', 0.04), ('draw', 0.04), ('published', 0.037), ('rt', 0.037), ('train', 0.036), ('sources', 0.036), ('restaurant', 0.035), ('nt', 0.035), ('xk', 0.034), ('papers', 0.032), ('vary', 0.032), ('chinese', 0.032), ('construction', 0.032), ('processes', 0.031), ('ctj', 0.03), ('dahua', 0.03), ('snowy', 0.03), ('sqts', 0.03), ('dependent', 0.03), ('david', 0.029), ('nonparametric', 0.029), ('poisson', 0.029), ('sk', 0.028), ('grouped', 0.027), ('ddp', 0.027), ('topic', 0.026), ('cvpr', 0.026), ('generate', 0.026), ('others', 0.026), ('tt', 0.025), ('zt', 0.025), ('halves', 0.025), ('local', 0.025), ('document', 0.024), ('design', 0.024), ('sampling', 0.024), ('formulation', 0.024), ('sharing', 0.023), ('prior', 0.023), ('categories', 0.022), ('csail', 0.022), ('rk', 0.022), ('scenes', 0.022), ('lin', 0.022), ('across', 0.021), ('countably', 0.021), ('shared', 0.021), ('component', 0.021), ('pr', 0.021), ('associated', 0.021), ('training', 0.02), ('mult', 0.02), ('sq', 0.02), ('whye', 0.02), ('yee', 0.02), ('sun', 0.02), ('image', 0.02), ('chung', 0.02), ('erik', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes

Author: Dahua Lin, John W. Fisher

Abstract: Mixture distributions are often used to model complex data. In this paper, we develop a new method that jointly estimates mixture models over multiple data sets by exploiting the statistical dependencies between them. Specifically, we introduce a set of latent Dirichlet processes as sources of component models (atoms), and for each data set, we construct a nonparametric mixture model by combining sub-sampled versions of the latent DPs. Each mixture model may acquire atoms from different latent DPs, while each atom may be shared by multiple mixtures. This multi-to-multi association distinguishes the proposed method from previous ones that require the model structure to be a tree or a chain, allowing more flexible designs. We also derive a sampling algorithm that jointly infers the model parameters and present experiments on both document analysis and image modeling. 1

2 0.17354874 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

Author: Ke Jiang, Brian Kulis, Michael I. Jordan

Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the kmeans and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that features the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis. 1

3 0.16739686 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

Author: Nicholas Foti, Sinead Williamson

Abstract: A number of dependent nonparametric processes have been proposed to model non-stationary data with unknown latent dimensionality. However, the inference algorithms are often slow and unwieldy, and are in general highly specific to a given model formulation. In this paper, we describe a large class of dependent nonparametric processes, including several existing models, and present a slice sampler that allows efficient inference across this class of models. 1

4 0.15611005 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

Author: Chong Wang, David M. Blei

Abstract: We present a truncation-free stochastic variational inference algorithm for Bayesian nonparametric models. While traditional variational inference algorithms require truncations for the model or the variational distribution, our method adapts model complexity on the fly. We studied our method with Dirichlet process mixture models and hierarchical Dirichlet process topic models on two large data sets. Our method performs better than previous stochastic variational inference algorithms. 1

5 0.14846376 60 nips-2012-Bayesian nonparametric models for ranked data

Author: Francois Caron, Yee W. Teh

Abstract: We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model, and apply it to the New York Times lists of weekly bestselling books. 1

6 0.14514104 364 nips-2012-Weighted Likelihood Policy Search with Model Selection

7 0.14056598 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

8 0.12828776 57 nips-2012-Bayesian estimation of discrete entropy with mixtures of stick-breaking priors

9 0.11113959 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

10 0.10657474 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

11 0.085829526 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

12 0.083199114 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models

13 0.082363285 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

14 0.079369575 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

15 0.075220555 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

16 0.072396196 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

17 0.068327792 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

18 0.064493388 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

19 0.064212188 126 nips-2012-FastEx: Hash Clustering with Exponential Families

20 0.060493052 192 nips-2012-Learning the Dependency Structure of Latent Factors


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.16), (1, 0.04), (2, -0.024), (3, 0.013), (4, -0.189), (5, -0.061), (6, 0.005), (7, -0.026), (8, 0.132), (9, -0.022), (10, 0.092), (11, 0.078), (12, 0.054), (13, -0.112), (14, 0.031), (15, -0.091), (16, 0.0), (17, -0.001), (18, -0.019), (19, 0.021), (20, 0.029), (21, 0.015), (22, 0.078), (23, -0.04), (24, -0.054), (25, 0.011), (26, -0.061), (27, -0.035), (28, -0.021), (29, 0.097), (30, -0.057), (31, 0.052), (32, 0.04), (33, -0.05), (34, 0.069), (35, -0.03), (36, 0.029), (37, -0.008), (38, 0.08), (39, -0.115), (40, -0.18), (41, -0.07), (42, 0.016), (43, 0.123), (44, -0.008), (45, -0.068), (46, -0.063), (47, 0.033), (48, -0.088), (49, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93308711 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes

Author: Dahua Lin, John W. Fisher

Abstract: Mixture distributions are often used to model complex data. In this paper, we develop a new method that jointly estimates mixture models over multiple data sets by exploiting the statistical dependencies between them. Specifically, we introduce a set of latent Dirichlet processes as sources of component models (atoms), and for each data set, we construct a nonparametric mixture model by combining sub-sampled versions of the latent DPs. Each mixture model may acquire atoms from different latent DPs, while each atom may be shared by multiple mixtures. This multi-to-multi association distinguishes the proposed method from previous ones that require the model structure to be a tree or a chain, allowing more flexible designs. We also derive a sampling algorithm that jointly infers the model parameters and present experiments on both document analysis and image modeling. 1

2 0.71748465 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

Author: Nicholas Foti, Sinead Williamson

Abstract: A number of dependent nonparametric processes have been proposed to model non-stationary data with unknown latent dimensionality. However, the inference algorithms are often slow and unwieldy, and are in general highly specific to a given model formulation. In this paper, we describe a large class of dependent nonparametric processes, including several existing models, and present a slice sampler that allows efficient inference across this class of models. 1

3 0.67436773 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

Author: Mingyuan Zhou, Lawrence Carin

Abstract: By developing data augmentation methods unique to the negative binomial (NB) distribution, we unite seemingly disjoint count and mixture models under the NB process framework. We develop fundamental properties of the models and derive efficient Gibbs sampling inference. We show that the gamma-NB process can be reduced to the hierarchical Dirichlet process with normalization, highlighting its unique theoretical, structural and computational advantages. A variety of NB processes with distinct sharing mechanisms are constructed and applied to topic modeling, with connections to existing algorithms, showing the importance of inferring both the NB dispersion and probability parameters. 1

4 0.58941621 60 nips-2012-Bayesian nonparametric models for ranked data

Author: Francois Caron, Yee W. Teh

Abstract: We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model, and apply it to the New York Times lists of weekly bestselling books. 1

5 0.58669525 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

Author: Ke Jiang, Brian Kulis, Michael I. Jordan

Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the kmeans and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that features the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis. 1

6 0.58332533 59 nips-2012-Bayesian nonparametric models for bipartite graphs

7 0.58069688 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

8 0.52034223 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

9 0.48691693 220 nips-2012-Monte Carlo Methods for Maximum Margin Supervised Topic Models

10 0.4821319 57 nips-2012-Bayesian estimation of discrete entropy with mixtures of stick-breaking priors

11 0.4781692 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models

12 0.4579168 294 nips-2012-Repulsive Mixtures

13 0.45729637 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

14 0.45504919 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

15 0.44764709 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

16 0.4320544 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates

17 0.42547095 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

18 0.41345158 26 nips-2012-A nonparametric variable clustering model

19 0.40435997 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

20 0.38935295 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.076), (21, 0.032), (38, 0.099), (39, 0.024), (42, 0.033), (54, 0.023), (55, 0.012), (63, 0.303), (74, 0.054), (76, 0.103), (80, 0.103), (92, 0.044)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75459242 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes

Author: Dahua Lin, John W. Fisher

Abstract: Mixture distributions are often used to model complex data. In this paper, we develop a new method that jointly estimates mixture models over multiple data sets by exploiting the statistical dependencies between them. Specifically, we introduce a set of latent Dirichlet processes as sources of component models (atoms), and for each data set, we construct a nonparametric mixture model by combining sub-sampled versions of the latent DPs. Each mixture model may acquire atoms from different latent DPs, while each atom may be shared by multiple mixtures. This multi-to-multi association distinguishes the proposed method from previous ones that require the model structure to be a tree or a chain, allowing more flexible designs. We also derive a sampling algorithm that jointly infers the model parameters and present experiments on both document analysis and image modeling. 1

2 0.70740747 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity

Author: Chen Chen, Junzhou Huang

Abstract: In Compressive Sensing Magnetic Resonance Imaging (CS-MRI), one can reconstruct a MR image with good quality from only a small number of measurements. This can significantly reduce MR scanning time. According to structured sparsity theory, the measurements can be further reduced to O(K + log n) for tree-sparse data instead of O(K + K log n) for standard K-sparse data with length n. However, few of existing algorithms have utilized this for CS-MRI, while most of them model the problem with total variation and wavelet sparse regularization. On the other side, some algorithms have been proposed for tree sparse regularization, but few of them have validated the benefit of wavelet tree structure in CS-MRI. In this paper, we propose a fast convex optimization algorithm to improve CS-MRI. Wavelet sparsity, gradient sparsity and tree sparsity are all considered in our model for real MR images. The original complex problem is decomposed into three simpler subproblems then each of the subproblems can be efficiently solved with an iterative scheme. Numerous experiments have been conducted and show that the proposed algorithm outperforms the state-of-the-art CS-MRI algorithms, and gain better reconstructions results on real MR images than general tree based solvers or algorithms. 1

3 0.62467957 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

Author: Ke Jiang, Brian Kulis, Michael I. Jordan

Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the kmeans and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that features the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis. 1

4 0.55710268 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

Author: Mingyuan Zhou, Lawrence Carin

Abstract: By developing data augmentation methods unique to the negative binomial (NB) distribution, we unite seemingly disjoint count and mixture models under the NB process framework. We develop fundamental properties of the models and derive efficient Gibbs sampling inference. We show that the gamma-NB process can be reduced to the hierarchical Dirichlet process with normalization, highlighting its unique theoretical, structural and computational advantages. A variety of NB processes with distinct sharing mechanisms are constructed and applied to topic modeling, with connections to existing algorithms, showing the importance of inferring both the NB dispersion and probability parameters. 1

5 0.54477805 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

Author: Chong Wang, David M. Blei

Abstract: We present a truncation-free stochastic variational inference algorithm for Bayesian nonparametric models. While traditional variational inference algorithms require truncations for the model or the variational distribution, our method adapts model complexity on the fly. We studied our method with Dirichlet process mixture models and hierarchical Dirichlet process topic models on two large data sets. Our method performs better than previous stochastic variational inference algorithms. 1

6 0.54274493 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

7 0.54246718 142 nips-2012-Generalization Bounds for Domain Adaptation

8 0.54007316 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

9 0.5383091 192 nips-2012-Learning the Dependency Structure of Latent Factors

10 0.53735715 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

11 0.53727382 197 nips-2012-Learning with Recursive Perceptual Representations

12 0.53714806 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

13 0.53567123 200 nips-2012-Local Supervised Learning through Space Partitioning

14 0.53498513 168 nips-2012-Kernel Latent SVM for Visual Recognition

15 0.53476167 65 nips-2012-Cardinality Restricted Boltzmann Machines

16 0.53461707 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

17 0.53410244 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

18 0.53327143 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

19 0.53317904 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

20 0.53226173 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines