nips nips2012 nips2012-355 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chong Wang, David M. Blei
Abstract: We present a truncation-free stochastic variational inference algorithm for Bayesian nonparametric models. While traditional variational inference algorithms require truncations for the model or the variational distribution, our method adapts model complexity on the fly. We studied our method with Dirichlet process mixture models and hierarchical Dirichlet process topic models on two large data sets. Our method performs better than previous stochastic variational inference algorithms. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a truncation-free stochastic variational inference algorithm for Bayesian nonparametric models. [sent-6, score-0.608]
2 While traditional variational inference algorithms require truncations for the model or the variational distribution, our method adapts model complexity on the fly. [sent-7, score-0.841]
3 We studied our method with Dirichlet process mixture models and hierarchical Dirichlet process topic models on two large data sets. [sent-8, score-0.414]
4 Our method performs better than previous stochastic variational inference algorithms. [sent-9, score-0.595]
5 BNP models use posterior inference to adapt the model complexity to the data. [sent-11, score-0.196]
6 For example, as more data are observed, Dirichlet process (DP) mixture models [2] can create new mixture components and hierarchical Dirichlet process (HDP) topic models [3] can create new topics. [sent-12, score-0.527]
7 The most widely-used approaches are Markov chain Monte Carlo (MCMC) [4] and variational inference [5]. [sent-14, score-0.453]
8 The alternative is variational inference, which finds the member of a simplified family of distributions that best approximates the true posterior [5, 10]. [sent-18, score-0.367]
9 This is generally faster than MCMC, and recent innovations let us use stochastic optimization to approximate posteriors with massive and streaming data [11, 12, 13]. [sent-19, score-0.204]
10 Unlike MCMC, however, variational inference algorithms for BNP models do not operate in an unbounded latent space. [sent-20, score-0.579]
11 Rather, they truncate the model or the variational distribution to a maximum model complexity [13, 14, 15, 16, 17, 18]. [sent-21, score-0.33]
12 In this paper, we develop a truncation-free stochastic variational inference algorithm for BNP models. [sent-23, score-0.568]
13 It is also unknown how to apply these to the stochastic variational inference setting we consider. [sent-28, score-0.568]
14 In particular, we present a new general inference algorithm, locally collapsed variational inference. [sent-29, score-0.729]
15 We demonstrate our algorithm on DP mixture models and HDP topic models with two large data sets, showing improved performance over truncated algorithms. [sent-31, score-0.298]
16 2 Truncation-free stochastic variational inference for BNP models Although our goal is to develop an efficient stochastic variational inference algorithm for BNP models, it is more succinct to describe our algorithm for a wider class of hierarchical Bayesian models [19]. [sent-32, score-1.243]
17 Let the global hidden variables be β with prior p(β | η) (η is the hyperparameter) and the local variables for each data sample be zi (hidden) and xi (observed) for i = 1, . . . , n. [sent-36, score-0.606]
18 The joint distribution of all variables (hidden and observed) factorizes as p(β, z1:n, x1:n | η) = p(β | η) ∏_{i=1}^n p(xi, zi | β) = p(β | η) ∏_{i=1}^n p(xi | zi, β) p(zi | β). [sent-40, score-0.725]
19 For convenience, we assume global variables β are continuous and local variables zi are discrete. [sent-42, score-0.488]
20 This class includes, for example, mixture models [20, 21], mixed-membership models [3, 22], latent factor models [23, 24] and tree-based hierarchical models [25]. [sent-46, score-0.317]
21 The mixture components θ are distributions over the vocabulary, and the mixture proportions π are represented with a stick-breaking process [26]. [sent-49, score-0.351]
22 The global variables β = (π, θ) comprise the proportions and components, and the local variables zi are the mixture assignments for each document xi. [sent-50, score-0.761]
23 For each document xi , (a) Draw mixture assignment zi ∼ Mult(π). [sent-54, score-0.584]
24 (b) For each word position j, draw the word xij ∼ Mult(θzi). [sent-55, score-0.212]
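To make this generative process concrete, here is a minimal simulation sketch in Python/NumPy (not from the paper; the truncation level T used only for simulation, the concentration a, and the Dirichlet parameter eta are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

V, T = 10, 50        # vocabulary size; T is a truncation used only for this simulation
a, eta = 1.0, 0.5    # DP concentration and topic Dirichlet parameter (assumed values)

# Stick-breaking construction of the mixture proportions pi.
pi_bar = rng.beta(1.0, a, size=T)
pi = pi_bar * np.concatenate(([1.0], np.cumprod(1.0 - pi_bar)[:-1]))
pi /= pi.sum()       # renormalize the truncated sticks

# Mixture components theta_k: distributions over the vocabulary.
theta = rng.dirichlet(np.full(V, eta), size=T)

def draw_document(n_words):
    """(a) Draw z_i ~ Mult(pi); (b) draw each word x_ij ~ Mult(theta_{z_i})."""
    z = rng.choice(T, p=pi)
    words = rng.choice(V, size=n_words, p=theta[z])
    return z, words

docs = [draw_document(20) for _ in range(5)]
```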
25 1 Variational inference In variational inference we try to find a distribution in a simple family that is close to the true posterior. [sent-64, score-0.576]
26 We describe the mean-field approach, the simplest variational inference algorithm [5]. [sent-65, score-0.453]
27 The variational distribution factorizes as q(β, z1:n) = q(β) ∏_{i=1}^n q(zi). (2) We call q(β) the global variational distribution and q(zi) the local variational distribution. [sent-67, score-0.745]
28 We want to minimize the KL-divergence between this variational distribution and the true posterior. [sent-68, score-0.33]
29 Under the standard variational theory [5], this is equivalent to maximizing a lower bound of the log marginal likelihood of the observed data x1:n . [sent-69, score-0.406]
30 We obtain this bound with Jensen’s inequality, log p(x1:n | η) = log ∑_{z1:n} ∫ p(x1:n, z1:n, β | η) dβ ≥ Eq[log p(β) − log q(β) + ∑_{i=1}^n (log p(xi, zi | β) − log q(zi))] ≜ L(q). [sent-70, score-0.469]
31 Figure 1: Graphical model for hierarchical Bayesian models with global hidden variables β and local hidden and observed variables zi and xi, i = 1, . . . , n. [sent-75, score-0.784]
32 [Figure 2 residue: plot axes, legend (mean-field vs. our method), and panel titles such as B: q(theta1) = Dirichlet(1, 1, . . . ), showing the log odds of the cluster assignment.]
33 The locally collapsed approach recovers the correct cluster assignment in both cases. [sent-95, score-0.276]
34 Algorithm 1 (mean-field): 2: for iter = 1 to M do 3: for i = 1 to n do 4: Set local variational distribution q(zi) ∝ exp{Eq(β)[log p(xi, zi | β)]}. [sent-99, score-0.739]
35 5: end for 6: Set global variational distribution q(β) ∝ exp{Eq(z1:n)[log p(x1:n, z1:n, β)]}. [sent-100, score-0.412]
36 Algorithm 2 (locally collapsed): 2: for iter = 1 to M do 3: for i = 1 to n do 4: Set local distribution q(zi) ∝ Eq(β)[p(xi, zi | β)]. [sent-103, score-0.409]
37 5: Sample from q(zi) to form its empirical estimate q̂(zi). 6: end for 7: Set global variational distribution q(β) ∝ exp{Eq̂(z1:n)[log p(x1:n, z1:n, β)]}. [sent-105, score-0.412]
38 Optimizing L(q) under the factorization in Eq. 2 (with the optimal conditions given in [27]) gives q(β) ∝ exp{Eq(z1:n)[log p(x1:n, z1:n, β | η)]} (4) and q(zi) ∝ exp{Eq(β)[log p(xi, zi | β)]} (5). [sent-111, score-0.377]
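As a concrete illustration of the local update in Eq. 5 for the DP mixture of Section 2, here is a minimal sketch (an assumed helper, not the authors' code; lam, u, v denote the variational Dirichlet and Beta parameters of a truncated q(β)):

```python
import numpy as np
from scipy.special import digamma

def meanfield_local_update(x_counts, lam, u, v):
    """Eq. 5 for the DP mixture: q(z_i = k) ∝ exp{E[log pi_k] + sum_w x_iw E[log theta_kw]}.

    x_counts : (V,) word counts of document i.
    lam      : (K, V) Dirichlet parameters of q(theta_k).
    u, v     : (K,)  Beta parameters of q(pi_bar_k), with K a fixed truncation.
    """
    elog_theta = digamma(lam) - digamma(lam.sum(axis=1, keepdims=True))
    elog_stick = digamma(u) - digamma(u + v)            # E[log pi_bar_k]
    elog_rest = digamma(v) - digamma(u + v)             # E[log (1 - pi_bar_k)]
    elog_pi = elog_stick + np.concatenate(([0.0], np.cumsum(elog_rest)[:-1]))
    score = elog_pi + elog_theta @ x_counts
    q = np.exp(score - score.max())
    return q / q.sum()
```

Note that this update is only defined over a fixed truncation K, which is exactly the limitation the locally collapsed update introduced below removes.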
39 The factorization into global and local variables ensures that the local updates only depend on the global factors, which facilitates speed-ups like parallel [28] and stochastic variational inference [11, 12, 13, 29]. [sent-113, score-0.765]
40 In BNP models, however, the value of zi is potentially unbounded (e.g., the mixture assignment in a DP mixture can take infinitely many values). [sent-114, score-0.402]
41 Thus we need to truncate the variational distribution [13, 14]. [sent-117, score-0.33]
42 Truncation is necessary in variational inference because of the mathematical structure of BNP models. [sent-118, score-0.453]
43 Moreover, it is difficult to grow the truncation in mean-field variational inference even in an ad-hoc way because it tends to underestimate posterior variance [30, 31]. [sent-119, score-0.62]
44 2 Locally collapsed variational inference We now describe locally collapsed variational inference, which mitigates the problem of underestimating posterior variance in mean-field variational inference. [sent-122, score-1.658]
45 The difference between traditional mean-field variational inference and our algorithm lies in the update of the local distribution q(zi ). [sent-125, score-0.518]
46 In our algorithm, it is q(zi) ∝ Eq(β)[p(xi, zi | β)], (6) as opposed to the mean-field update in Eq. 5. [sent-126, score-0.38]
47 Because we collapse out the global variational distribution q(β) locally, we call this method locally collapsed variational inference. [sent-128, score-1.014]
48 In our implementation, we use a collapsed Gibbs sampler to sample from Equation 6. [sent-131, score-0.227]
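For the same DP mixture, a sketch of the locally collapsed step of Eq. 6 might look as follows (an illustrative sketch, not the authors' implementation; it assumes zi is a single discrete variable, so it can be sampled directly rather than with a full collapsed Gibbs sweep, which would be needed for richer local structure such as the HDP):

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)

def collapsed_local_update(x_counts, lam, u, v, n_samples=1):
    """Eq. 6 for the DP mixture: q(z_i = k) ∝ E_q(beta)[p(x_i, z_i = k | beta)],
    which is a Dirichlet-multinomial term times the expected stick weight E[pi_k].
    Returns the empirical estimate qhat(z_i) built from samples of this distribution."""
    n_i = x_counts.sum()
    log_lik = (gammaln(lam.sum(axis=1)) - gammaln(lam.sum(axis=1) + n_i)
               + (gammaln(lam + x_counts) - gammaln(lam)).sum(axis=1))
    e_stick = u / (u + v)                                # E[pi_bar_k]
    log_pi = np.log(e_stick) + np.concatenate(([0.0], np.cumsum(np.log(1.0 - e_stick))[:-1]))
    score = log_lik + log_pi
    q = np.exp(score - score.max())
    q /= q.sum()
    samples = rng.choice(len(q), size=n_samples, p=q)    # a single sample already suffices
    qhat = np.bincount(samples, minlength=len(q)).astype(float)
    return qhat / qhat.sum()
```

By Jensen’s inequality, exp{Eq(β)[log p(xi, zi | β)]} ≤ Eq(β)[p(xi, zi | β)], and the gap is largest for components whose q is still diffuse; this is the variance-underestimation effect that the toy comparison below (Figure 2) illustrates.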
49 To give an intuitive comparison of locally collapsed (Algorithm 2) and mean-field (Algorithm 1) variational inference, we consider a toy document clustering problem with vocabulary size V = 10. [sent-140, score-0.723]
50 Figure 2 shows the difference between mean-field and locally collapsed variational inference. [sent-165, score-0.606]
51 Alas, as with some other adaptations of variational inference, we do not yet have an airtight justification [32, 33, 34]. [sent-170, score-0.361]
52 Our algorithm is closely related to collapsed variational inference (CVI) [15, 16, 36, 32, 33]. [sent-176, score-0.652]
53 CVI applies variational inference to the marginalized model, integrating out the global hidden variable β. [sent-177, score-0.56]
54 In CVI, however, the optimization for each local variable zi depends on all other local variables, which makes it difficult to apply at large scale. [sent-179, score-0.417]
55 Our algorithm is akin to applying CVI for the intermediate model that treats q(β) as a prior and considers a single data point xi with its hidden structure zi . [sent-180, score-0.467]
56 Our algorithm is also related to a recently proposed hybrid approach that uses Gibbs sampling inside stochastic variational inference to take advantage of the sparsity of text documents in topic modeling [37]. [sent-182, score-0.741]
57 That approach is based on the mean-field update in Eq. 5, where all local hidden topic variables (for a document) are grouped together and the optimal q(zi) is approximated by a Gibbs sampler. [sent-184, score-0.242]
58 We now extend our algorithm to stochastic variational inference, allowing us to fit approximate posteriors to massive data sets. [sent-187, score-0.499]
59 Also assume the global variational distribution q(β | λ) is in the same family as the prior p(β | η). [sent-191, score-0.381]
60 The term t̄(xi, zi) is defined as t̄(xi, zi) ≜ [t(xi, zi); 1]. [sent-194, score-1.047]
61 A common strategy used in stochastic variational inference [12, 13] is to use a small batch of samples at each update. [sent-206, score-0.645]
62 3 Truncation-free stochastic variational inference for BNP models We have described locally collapsed variational inference in a general setting. [sent-211, score-1.333]
63 Our main interests in this paper are BNP models, and we now show how this approach leads to truncation-free variational algorithms. [sent-212, score-0.33]
64 The variational distribution for the global hidden variables—the mixture components θ and stick proportions π̄—is q(θ, π̄ | λ, u, v) = ∏_k q(θk | λk) q(π̄k | uk, vk), where λk is the Dirichlet parameter and (uk, vk) is the Beta parameter. [sent-218, score-0.831]
65 The sufficient statistic term t(xi, zi) defined in Eq. 9 can be summarized as t(xi, zi)_{λkw} = 1[zi = k] ∑_j 1[xij = w]; t(xi, zi)_{uk} = 1[zi = k]; t(xi, zi)_{vk} = ∑_{ℓ=k+1} 1[zi = ℓ], where 1[·] is the indicator function. [sent-219, score-0.349] [sent-220, score-1.047]
67 We use Eq. 11 to update the Dirichlet parameter λ and Beta parameters (u, v): λkw ← λkw + ρt(−λkw + η + n q̂(zi = k) ∑_j 1[xij = w]); uk ← uk + ρt(−uk + 1 + n q̂(zi = k)); vk ← vk + ρt(−vk + a + n ∑_{ℓ=k+1} q̂(zi = ℓ)). [sent-222, score-0.311]
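A minimal sketch of this stochastic step is below (illustrative only; the minibatch scaling n/B follows the common small-batch strategy mentioned earlier, and the hyperparameter values eta and a are placeholders):

```python
import numpy as np

def stochastic_update(lam, u, v, X, qhat, rho_t, n, eta=0.5, a=1.0):
    """One noisy natural-gradient step (Eq. 11) for the DP mixture's global parameters.

    lam   : (K, V) Dirichlet parameters of q(theta_k).
    u, v  : (K,)  Beta parameters of q(pi_bar_k).
    X     : (B, V) word-count vectors of a minibatch of B documents.
    qhat  : (B, K) empirical local distributions qhat(z_i = k) from the local step.
    rho_t : step size; n : total number of documents; eta, a : placeholder hyperparameters.
    """
    B = X.shape[0]
    scale = n / B                                        # minibatch -> full-corpus scaling
    counts_k = qhat.sum(axis=0)                          # expected documents per component
    tail_k = counts_k[::-1].cumsum()[::-1] - counts_k    # mass on components l > k

    lam_new = lam + rho_t * (-lam + eta + scale * (qhat.T @ X))
    u_new = u + rho_t * (-u + 1.0 + scale * counts_k)
    v_new = v + rho_t * (-v + a + scale * tail_k)
    return lam_new, u_new, v_new
```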
68 Since the mixture assignment zi is the only local hidden variable, we obtain the analytical form of q(zi) using Eq. 6: q(zi = k) ∝ ∫ p(xi | θk) p(zi = k | π) q(θk | λk) q(π̄) dθk dπ̄ = [Γ(∑_w λkw) / Γ(∑_w λkw + |xi|)] ∏_w [Γ(λkw + ∑_j 1[xij = w]) / Γ(λkw)] · [uk/(uk + vk)] ∏_{ℓ=1}^{k−1} vℓ/(uℓ + vℓ), where |xi| is the document length and Γ(·) is the Gamma function. [sent-231, score-0.506] [sent-232, score-0.2]
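The following sketch shows how this analytic form yields a truncation-free local step: the candidate set always includes one fresh component with prior parameters Dirichlet(η) and Beta(1, a), and the truncation grows only when a sample lands on it. This is an illustrative sketch under those assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)

def truncation_free_local_step(x_counts, lam, u, v, eta=0.5, a=1.0):
    """Score the K instantiated components plus one fresh component drawn from the prior;
    if the single sample lands on the fresh component, the number of components grows."""
    V = x_counts.shape[0]
    lam_ext = np.vstack([lam, np.full(V, eta)])          # fresh component: Dirichlet(eta)
    u_ext = np.append(u, 1.0)                            # fresh stick: Beta(1, a)
    v_ext = np.append(v, a)

    n_i = x_counts.sum()
    log_lik = (gammaln(lam_ext.sum(axis=1)) - gammaln(lam_ext.sum(axis=1) + n_i)
               + (gammaln(lam_ext + x_counts) - gammaln(lam_ext)).sum(axis=1))
    e_stick = u_ext / (u_ext + v_ext)
    log_pi = np.log(e_stick) + np.concatenate(([0.0], np.cumsum(np.log(1.0 - e_stick))[:-1]))

    score = log_lik + log_pi
    q = np.exp(score - score.max())
    q /= q.sum()

    z = rng.choice(len(q), p=q)
    if z == lam.shape[0]:                # the fresh component was chosen: grow the model
        lam, u, v = lam_ext, u_ext, v_ext
    qhat = np.zeros(lam.shape[0])
    qhat[z] = 1.0
    return qhat, lam, u, v
```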
70 This is analogous to the collapsed Gibbs sampling algorithm in DP mixture models [6]—whether or not a new mixture component is explored is decided by a single sample. [sent-239, score-0.469]
71 Locally collapsed variational inference is powerful enough to trigger this exploration. [sent-240, score-0.729]
72 In contrast, the approach presented here grows the truncation as a natural consequence of the inference algorithm and is easily adapted to stochastic inference. [sent-250, score-0.367]
73 3 Experiments We evaluate our methods on DP mixtures and HDP topic models, comparing them to truncation-based stochastic mean-field variational inference. [sent-251, score-0.639]
74 For HDP topic models, we set topic Dirichlet parameter η = 0. [sent-270, score-0.25]
75 For stochastic mean-field variational inference, we set the truncation level at 300 for both DP and HDP. [sent-274, score-0.544]
76 We remove components with fewer than 1 document for DP, and topics with fewer than 1 word for HDP topic models, each time we process 20K documents. [sent-287, score-0.434]
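A sketch of that periodic pruning step (illustrative bookkeeping only; `usage` is an assumed running count of the documents, for DP, or words, for HDP, currently attributed to each component or topic):

```python
import numpy as np

def prune(lam, u, v, usage, min_usage=1.0):
    """Drop components (DP) or topics (HDP) whose usage has fallen below min_usage;
    in the experiments this check runs each time 20K documents have been processed."""
    keep = usage >= min_usage
    return lam[keep], u[keep], v[keep], usage[keep]
```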
79 Figure 3: Results on DP mixtures—(a) held-out likelihood on both corpora (Nature and New York Times); (b) number of mixtures inferred on New York Times for different batch sizes. [sent-336, score-0.691]
80 The left of figure (b) shows the number of mixture components inferred after 10 hours; our method tends to give more mixtures. [sent-341, score-0.233]
81 Small batch sizes for the stochastic mean-field approach do not really work, resulting in a very small number of mixtures. [sent-342, score-0.24]
82 Results. [sent-345, score-0.515]
83 Small batch sizes of the stochastic mean-field approach do not work well. [sent-349, score-0.24]
84 Our method tends to give more mixtures than the stochastic mean-field approach. [sent-352, score-0.242]
85 The stochastic mean-field approach shrinks the preset truncation; our approach does not need a truncation and grows the number of mixtures when the data require it. [sent-353, score-0.405]
86 And small batch sizes of the stochastic mean-field approach do not work well. [sent-357, score-0.24]
87 Our method tends to give more topics than the stochastic mean-field approach. [sent-361, score-0.256]
88 The stochastic mean-field approach shrinks the preset truncation, while our approach grows the number of topics when the data require it. [sent-362, score-0.419]
89 This also explains why smaller batch sizes in the stochastic mean-field approach tend to work much worse—the first few samples can dominate the effect of the random initialization, leaving no room for later samples. [sent-366, score-0.24]
90 Our approach is more robust to batch sizes and gives better predictive performance most of the time. [sent-373, score-0.264]
91 Small batch sizes for the stochastic mean-field approach do not really work, resulting in a very small number of topics. [sent-377, score-0.24]
92 Figure 4: Results on HDP topic models—(a) held-out likelihood over time (hours) on both corpora (Nature and New York Times); (b) number of topics inferred on New York Times for different batch sizes (Batchsize=10, Batchsize=100). [sent-380, score-1.052]
95 [42] shows that importance sampling usually gives the correct ranking of different topic models but significantly underestimates the probability. [sent-411, score-0.222]
96 4 Conclusion and future work In this paper, we have developed truncation-free stochastic variational inference algorithms for Bayesian nonparametric models (BNP models) and applied them to two large datasets. [sent-412, score-0.644]
97 The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. [sent-568, score-0.375]
98 LDA: A flexible large scale topic modeling package using variational inference in MapReduce. [sent-582, score-0.578]
99 Practical collapsed variational Bayes inference for hierarchical Dirichlet process. [sent-613, score-0.861]
100 A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. [sent-625, score-0.689]
wordName wordTfidf (topN-words)
[('batchsize', 0.45), ('zi', 0.349), ('bnp', 0.331), ('variational', 0.33), ('collapsed', 0.199), ('dp', 0.182), ('dirichlet', 0.174), ('hdp', 0.163), ('eld', 0.134), ('topic', 0.125), ('inference', 0.123), ('stochastic', 0.115), ('corpora', 0.104), ('kw', 0.104), ('mixture', 0.101), ('truncation', 0.099), ('eq', 0.098), ('hours', 0.089), ('topics', 0.083), ('locally', 0.077), ('batch', 0.077), ('york', 0.076), ('vk', 0.076), ('document', 0.072), ('mixtures', 0.069), ('cvi', 0.066), ('uk', 0.064), ('gibbs', 0.063), ('xi', 0.062), ('blei', 0.061), ('hidden', 0.056), ('kurihara', 0.055), ('xij', 0.054), ('massive', 0.054), ('unbounded', 0.053), ('likelihood', 0.052), ('word', 0.052), ('global', 0.051), ('preset', 0.05), ('nature', 0.049), ('sizes', 0.048), ('documents', 0.048), ('sato', 0.046), ('vocabulary', 0.045), ('bayesian', 0.045), ('mcmc', 0.044), ('shrinks', 0.042), ('nonparametric', 0.04), ('components', 0.039), ('zt', 0.039), ('proportions', 0.038), ('bothon', 0.038), ('timesyork', 0.038), ('posterior', 0.037), ('latent', 0.037), ('models', 0.036), ('times', 0.036), ('inferred', 0.035), ('hierarchical', 0.035), ('streaming', 0.035), ('welling', 0.034), ('buffet', 0.034), ('local', 0.034), ('conjugacy', 0.033), ('beta', 0.033), ('underestimates', 0.033), ('sticks', 0.033), ('mitigates', 0.033), ('dtest', 0.033), ('restaurant', 0.033), ('component', 0.032), ('tends', 0.031), ('hoffman', 0.031), ('indian', 0.031), ('teh', 0.031), ('update', 0.031), ('truncations', 0.031), ('chong', 0.031), ('adaptations', 0.031), ('end', 0.031), ('predictive', 0.03), ('grows', 0.03), ('chinese', 0.03), ('heldout', 0.029), ('gives', 0.028), ('sampler', 0.028), ('nested', 0.027), ('variables', 0.027), ('method', 0.027), ('process', 0.027), ('iter', 0.026), ('nt', 0.026), ('wang', 0.025), ('ths', 0.025), ('mult', 0.025), ('smyth', 0.024), ('initialization', 0.024), ('log', 0.024), ('lets', 0.024), ('gure', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models
Author: Chong Wang, David M. Blei
Abstract: We present a truncation-free stochastic variational inference algorithm for Bayesian nonparametric models. While traditional variational inference algorithms require truncations for the model or the variational distribution, our method adapts model complexity on the fly. We studied our method with Dirichlet process mixture models and hierarchical Dirichlet process topic models on two large data sets. Our method performs better than previous stochastic variational inference algorithms. 1
2 0.35180396 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes
Author: Michael Bryant, Erik B. Sudderth
Abstract: Variational methods provide a computationally scalable alternative to Monte Carlo methods for large-scale, Bayesian nonparametric learning. In practice, however, conventional batch and online variational methods quickly become trapped in local optima. In this paper, we consider a nonparametric topic model based on the hierarchical Dirichlet process (HDP), and develop a novel online variational inference algorithm based on split-merge topic updates. We derive a simpler and faster variational approximation of the HDP, and show that by intelligently splitting and merging components of the variational posterior, we can achieve substantially better predictions of test data than conventional online and batch variational algorithms. For streaming analysis of large datasets where batch analysis is infeasible, we show that our split-merge updates better capture the nonparametric properties of the underlying model, allowing continual learning of new topics.
3 0.25134066 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
Author: Ke Jiang, Brian Kulis, Michael I. Jordan
Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the kmeans and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that features the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis. 1
4 0.20204386 129 nips-2012-Fast Variational Inference in the Conjugate Exponential Family
Author: James Hensman, Magnus Rattray, Neil D. Lawrence
Abstract: We present a general method for deriving collapsed variational inference algorithms for probabilistic models in the conjugate exponential family. Our method unifies many existing approaches to collapsed variational inference. Our collapsed variational inference leads to a new lower bound on the marginal likelihood. We exploit the information geometry of the bound to derive much faster optimization methods based on conjugate gradients for these models. Our approach is very general and is easily applied to any model where the mean field update equations have been derived. Empirically we show significant speed-ups for probabilistic inference using our bound. 1
5 0.15611005 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes
Author: Dahua Lin, John W. Fisher
Abstract: Mixture distributions are often used to model complex data. In this paper, we develop a new method that jointly estimates mixture models over multiple data sets by exploiting the statistical dependencies between them. Specifically, we introduce a set of latent Dirichlet processes as sources of component models (atoms), and for each data set, we construct a nonparametric mixture model by combining sub-sampled versions of the latent DPs. Each mixture model may acquire atoms from different latent DPs, while each atom may be shared by multiple mixtures. This multi-to-multi association distinguishes the proposed method from previous ones that require the model structure to be a tree or a chain, allowing more flexible designs. We also derive a sampling algorithm that jointly infers the model parameters and present experiments on both document analysis and image modeling. 1
6 0.15300331 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models
7 0.14766328 126 nips-2012-FastEx: Hash Clustering with Exponential Families
8 0.12505659 220 nips-2012-Monte Carlo Methods for Maximum Margin Supervised Topic Models
9 0.12076014 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation
10 0.11877798 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression
11 0.11707328 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models
12 0.11043096 298 nips-2012-Scalable Inference of Overlapping Communities
13 0.11008543 47 nips-2012-Augment-and-Conquer Negative Binomial Processes
14 0.10362002 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features
15 0.10161258 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models
16 0.10127519 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification
17 0.097032771 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing
18 0.093545564 57 nips-2012-Bayesian estimation of discrete entropy with mixtures of stick-breaking priors
19 0.087943226 12 nips-2012-A Neural Autoregressive Topic Model
20 0.087902449 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems
topicId topicWeight
[(0, 0.211), (1, 0.083), (2, -0.008), (3, 0.035), (4, -0.34), (5, -0.08), (6, -0.009), (7, -0.035), (8, 0.164), (9, -0.129), (10, 0.193), (11, 0.152), (12, 0.023), (13, -0.001), (14, 0.02), (15, 0.012), (16, -0.035), (17, -0.049), (18, -0.015), (19, 0.08), (20, -0.026), (21, -0.017), (22, 0.014), (23, 0.041), (24, -0.056), (25, -0.04), (26, 0.008), (27, 0.041), (28, -0.049), (29, 0.067), (30, -0.093), (31, -0.083), (32, -0.066), (33, -0.073), (34, 0.021), (35, 0.019), (36, 0.133), (37, -0.071), (38, -0.074), (39, 0.049), (40, 0.072), (41, -0.061), (42, -0.042), (43, 0.015), (44, 0.094), (45, -0.065), (46, 0.029), (47, 0.027), (48, -0.059), (49, 0.048)]
simIndex simValue paperId paperTitle
same-paper 1 0.96398968 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models
Author: Chong Wang, David M. Blei
Abstract: We present a truncation-free stochastic variational inference algorithm for Bayesian nonparametric models. While traditional variational inference algorithms require truncations for the model or the variational distribution, our method adapts model complexity on the fly. We studied our method with Dirichlet process mixture models and hierarchical Dirichlet process topic models on two large data sets. Our method performs better than previous stochastic variational inference algorithms. 1
2 0.93910134 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes
Author: Michael Bryant, Erik B. Sudderth
Abstract: Variational methods provide a computationally scalable alternative to Monte Carlo methods for large-scale, Bayesian nonparametric learning. In practice, however, conventional batch and online variational methods quickly become trapped in local optima. In this paper, we consider a nonparametric topic model based on the hierarchical Dirichlet process (HDP), and develop a novel online variational inference algorithm based on split-merge topic updates. We derive a simpler and faster variational approximation of the HDP, and show that by intelligently splitting and merging components of the variational posterior, we can achieve substantially better predictions of test data than conventional online and batch variational algorithms. For streaming analysis of large datasets where batch analysis is infeasible, we show that our split-merge updates better capture the nonparametric properties of the underlying model, allowing continual learning of new topics.
3 0.83641952 129 nips-2012-Fast Variational Inference in the Conjugate Exponential Family
Author: James Hensman, Magnus Rattray, Neil D. Lawrence
Abstract: We present a general method for deriving collapsed variational inference algorithms for probabilistic models in the conjugate exponential family. Our method unifies many existing approaches to collapsed variational inference. Our collapsed variational inference leads to a new lower bound on the marginal likelihood. We exploit the information geometry of the bound to derive much faster optimization methods based on conjugate gradients for these models. Our approach is very general and is easily applied to any model where the mean field update equations have been derived. Empirically we show significant speed-ups for probabilistic inference using our bound. 1
4 0.73494244 220 nips-2012-Monte Carlo Methods for Maximum Margin Supervised Topic Models
Author: Qixia Jiang, Jun Zhu, Maosong Sun, Eric P. Xing
Abstract: An effective strategy to exploit the supervising side information for discovering predictive topic representations is to impose discriminative constraints induced by such information on the posterior distributions under a topic model. This strategy has been adopted by a number of supervised topic models, such as MedLDA, which employs max-margin posterior constraints. However, unlike the likelihoodbased supervised topic models, of which posterior inference can be carried out using the Bayes’ rule, the max-margin posterior constraints have made Monte Carlo methods infeasible or at least not directly applicable, thereby limited the choice of inference algorithms to be based on variational approximation with strict mean field assumptions. In this paper, we develop two efficient Monte Carlo methods under much weaker assumptions for max-margin supervised topic models based on an importance sampler and a collapsed Gibbs sampler, respectively, in a convex dual formulation. We report thorough experimental results that compare our approach favorably against existing alternatives in both accuracy and efficiency.
5 0.63536549 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
Author: Ke Jiang, Brian Kulis, Michael I. Jordan
Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the kmeans and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that features the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis. 1
6 0.60474008 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models
7 0.59711105 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes
8 0.5901379 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression
9 0.58019722 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models
10 0.55157053 359 nips-2012-Variational Inference for Crowdsourcing
11 0.54948705 47 nips-2012-Augment-and-Conquer Negative Binomial Processes
12 0.54869074 154 nips-2012-How They Vote: Issue-Adjusted Models of Legislative Behavior
13 0.53175652 126 nips-2012-FastEx: Hash Clustering with Exponential Families
14 0.52808994 37 nips-2012-Affine Independent Variational Inference
15 0.52650321 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification
16 0.51874471 298 nips-2012-Scalable Inference of Overlapping Communities
17 0.51075345 345 nips-2012-Topic-Partitioned Multinetwork Embeddings
18 0.50911945 12 nips-2012-A Neural Autoregressive Topic Model
19 0.50477898 26 nips-2012-A nonparametric variable clustering model
20 0.49986792 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models
topicId topicWeight
[(0, 0.092), (21, 0.033), (24, 0.158), (38, 0.108), (39, 0.028), (42, 0.029), (53, 0.012), (54, 0.02), (55, 0.03), (63, 0.025), (74, 0.084), (76, 0.111), (80, 0.139), (92, 0.041)]
simIndex simValue paperId paperTitle
1 0.86229134 310 nips-2012-Semiparametric Principal Component Analysis
Author: Fang Han, Han Liu
Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The according methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman’s rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on the synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012). 1
same-paper 2 0.83674216 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models
Author: Chong Wang, David M. Blei
Abstract: We present a truncation-free stochastic variational inference algorithm for Bayesian nonparametric models. While traditional variational inference algorithms require truncations for the model or the variational distribution, our method adapts model complexity on the fly. We studied our method with Dirichlet process mixture models and hierarchical Dirichlet process topic models on two large data sets. Our method performs better than previous stochastic variational inference algorithms. 1
3 0.80749214 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation
Author: Tuo Zhao, Kathryn Roeder, Han Liu
Abstract: We introduce a new learning algorithm, named smooth-projected neighborhood pursuit, for estimating high dimensional undirected graphs. In particularly, we focus on the nonparanormal graphical model and provide theoretical guarantees for graph estimation consistency. In addition to new computational and theoretical analysis, we also provide an alternative view to analyze the tradeoff between computational efficiency and statistical error under a smoothing optimization framework. Numerical results on both synthetic and real datasets are provided to support our theory. 1
4 0.79654902 274 nips-2012-Priors for Diversity in Generative Latent Variable Models
Author: James T. Kwok, Ryan P. Adams
Abstract: Probabilistic latent variable models are one of the cornerstones of machine learning. They offer a convenient and coherent way to specify prior distributions over unobserved structure in data, so that these unknown properties can be inferred via posterior inference. Such models are useful for exploratory analysis and visualization, for building density models of data, and for providing features that can be used for later discriminative tasks. A significant limitation of these models, however, is that draws from the prior are often highly redundant due to i.i.d. assumptions on internal parameters. For example, there is no preference in the prior of a mixture model to make components non-overlapping, or in topic model to ensure that co-occurring words only appear in a small number of topics. In this work, we revisit these independence assumptions for probabilistic latent variable models, replacing the underlying i.i.d. prior with a determinantal point process (DPP). The DPP allows us to specify a preference for diversity in our latent variables using a positive definite kernel function. Using a kernel between probability distributions, we are able to define a DPP on probability measures. We show how to perform MAP inference with DPP priors in latent Dirichlet allocation and in mixture models, leading to better intuition for the latent variable representation and quantitatively improved unsupervised feature extraction, without compromising the generative aspects of the model. 1
5 0.79477823 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs
Author: Anima Anandkumar, Ragupathyraj Valluvan
Abstract: Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay. We propose an efficient method for graph estimation, and establish its structural consistency −δη(η+1)−2 when the number of samples n scales as n = Ω(θmin log p), where θmin is the minimum edge potential, δ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and η is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements. Keywords: Graphical model selection, latent variables, quartet methods, locally tree-like graphs. 1
6 0.78772122 168 nips-2012-Kernel Latent SVM for Visual Recognition
7 0.7823202 197 nips-2012-Learning with Recursive Perceptual Representations
8 0.78014761 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
9 0.77646661 191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables
10 0.77526551 200 nips-2012-Local Supervised Learning through Space Partitioning
11 0.77446663 192 nips-2012-Learning the Dependency Structure of Latent Factors
12 0.77442503 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines
13 0.77438104 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model
14 0.77358311 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models
15 0.77349788 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration
16 0.77113646 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models
17 0.76895207 65 nips-2012-Cardinality Restricted Boltzmann Machines
18 0.76717418 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes
19 0.7671054 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes
20 0.76551819 234 nips-2012-Multiresolution analysis on the symmetric group