jmlr jmlr2013 jmlr2013-121 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chong Wang, David M. Blei
Abstract: Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. In a typical application, mean-field methods approximately compute the posterior with a coordinate-ascent optimization algorithm. When the model is conditionally conjugate, the coordinate updates are easily derived and in closed form. However, many models of interest—like the correlated topic model and Bayesian logistic regression—are nonconjugate. In these models, mean-field methods cannot be directly applied and practitioners have had to develop variational algorithms on a case-by-case basis. In this paper, we develop two generic methods for nonconjugate models, Laplace variational inference and delta method variational inference. Our methods have several advantages: they allow for easily derived variational algorithms with a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world data sets. We studied our methods on the correlated topic model, Bayesian logistic regression, and hierarchical Bayesian logistic regression. Keywords: variational inference, nonconjugate models, Laplace approximations, the multivariate delta method
Reference: text
sentIndex sentText sentNum sentScore
1 Department of Computer Science, Princeton University, Princeton, NJ 08540, USA. Editor: Neil Lawrence. Abstract: Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. [sent-6, score-0.85]
2 In this paper, we develop two generic methods for nonconjugate models, Laplace variational inference and delta method variational inference. [sent-11, score-1.899]
3 Our methods have several advantages: they allow for easily derived variational algorithms with a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world data sets. [sent-12, score-0.976]
4 Keywords: variational inference, nonconjugate models, Laplace approximations, the multivariate delta method. 1 Introduction: Mean-field variational inference lets us efficiently approximate posterior distributions in complex probabilistic models (Jordan et al. [sent-14, score-2.051]
5 For such models, which are called conditionally conjugate models, it is easy to derive a coordinate ascent algorithm that optimizes the parameters of the variational distribution (Beal, 2003; Bishop, 2006). [sent-29, score-0.803]
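For reference, the coordinate updates that conditional conjugacy makes tractable are the standard mean-field updates (a generic sketch in LaTeX notation, not tied to any particular model):

q^*(\theta) \propto \exp\{\mathbb{E}_{q(z)}[\log p(\theta, z, x)]\}, \qquad q^*(z) \propto \exp\{\mathbb{E}_{q(\theta)}[\log p(\theta, z, x)]\}.

When the complete conditional p(\theta \mid z, x) lies in an exponential family whose natural parameter is easy to average under q(z), the first update is available in closed form; nonconjugacy breaks exactly that step.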
6 , 2010), which allow practitioners to define models of their data and immediately approximate the corresponding posterior with variational inference. [sent-33, score-0.671]
7 Such nonconjugate models include Bayesian logistic regression (Jaakkola and Jordan, 1997), Bayesian generalized linear models (Wells, 2001), discrete choice models (Braun and McAuliffe, 2010), Bayesian item response models (Clinton et al. [sent-35, score-0.769]
8 , 2004; Fox, 2010), and nonconjugate topic models (Blei and Lafferty, 2006, 2007). [sent-36, score-0.677]
9 Using variational inference in these settings requires algorithms tailored to the specific model at hand. [sent-37, score-0.769]
10 In this paper we develop two approaches to mean-field variational inference for a large class of nonconjugate models. [sent-40, score-1.205]
11 Viewed another way, it is equivalent to using a multivariate delta approximation (Bickel and Doksum, 2007) of the variational objective. [sent-48, score-0.715]
12 Our methods significantly expand the class of models for which mean-field variational inference can be easily applied. [sent-52, score-0.798]
13 We studied our algorithms with three nonconjugate models: Bayesian logistic regression (Jaakkola and Jordan, 1997), hierarchical logistic regression (Gelman and Hill, 2007), and the correlated topic model (Blei and Lafferty, 2007). [sent-53, score-1.073]
14 Further, we found that Laplace variational inference usually outperforms delta method variational inference, both in terms of computation time and the fidelity of the approximate posterior. [sent-55, score-1.475]
15 There have been other efforts to examine generic variational inference in nonconjugate models. [sent-58, score-1.205]
16 (2012a) proposed a variational inference approach using stochastic search for nonconjugate models, approximating the intractable integrals with Monte Carlo methods. [sent-60, score-1.205]
17 (2012) proposed a nonparametric variational inference algorithm, which can be applied to nonconjugate models. [sent-62, score-1.205]
18 Laplace approximations have been used in approximate inference in more complex models, though not in the context of mean-field variational inference. [sent-71, score-0.8]
19 Here we want to use them for variational inference, in a method that can be applied to a wider range of nonconjugate models. [sent-76, score-0.976]
20 Finally, we note that the delta method was first used in variational inference by Braun and McAuliffe (2010) in the context of the discrete choice model. [sent-77, score-0.923]
21 In Section 2 we review mean-field variational inference and define the class of nonconjugate models to which our algorithms apply. [sent-80, score-1.255]
22 In Section 3, we derive Laplace and delta-method variational inference and present our full algorithm for nonconjugate inference. [sent-81, score-1.205]
23 In variational inference, we approximate the posterior by positing a simple family of distributions over the latent variables q(θ, z) and then finding the member of that family which minimizes the Kullback-Leibler (KL) divergence to the true posterior (Jordan et al. [sent-88, score-0.804]
24 In this section we review variational inference and discuss mean-field variational inference for the class of conditionally conjugate models. [sent-90, score-1.65]
25 We then define a wider class of nonconjugate models for which mean-field variational inference is not as easily applied. [sent-91, score-1.255]
26 In the next section, we derive algorithms for performing mean-field variational inference in this larger class of models. [sent-92, score-0.748]
27 1 Mean-field Variational Inference: Mean-field variational inference is the simplest and most widely used variational inference method. [sent-94, score-1.496]
28 In mean-field variational inference we posit a fully factorized variational family, q(θ, z) = q(θ)q(z). [sent-95, score-1.267]
29 In this paper, we focus on mean-field variational inference where we minimize the KL divergence to the posterior. [sent-97, score-0.748]
30 We note that there are other kinds of variational inference, with more structured variational distributions or with alternative objective functions (Wainwright and Jordan, 2008; Barber, 2012). [sent-98, score-1.066]
31 In this paper, we use “variational inference” to indicate mean-field variational inference that minimizes the KL divergence. [sent-99, score-0.748]
32 Under the standard variational theory, minimizing the KL divergence between q(θ, z) and the posterior p(θ, z|x) is equivalent to maximizing a lower bound of the log marginal likelihood of the observed data x. [sent-103, score-0.689]
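Concretely, the bound in question (the evidence lower bound) can be written as

\mathcal{L}(q) \;=\; \mathbb{E}_{q(\theta,z)}[\log p(x, z, \theta)] \;-\; \mathbb{E}_{q(\theta,z)}[\log q(\theta, z)] \;\le\; \log p(x),

and maximizing \mathcal{L}(q) over the variational family is equivalent to minimizing \mathrm{KL}(q(\theta,z)\,\|\,p(\theta,z \mid x)), since the two differ only by the constant \log p(x).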
33 These conditions lead to the traditional coordinate ascent algorithm for variational inference. [sent-109, score-0.612]
34 Many applications of variational inference have been developed for this type of model (Bishop, 1999; Attias, 2000; Beal, 2003). [sent-122, score-0.769]
35 That setting arises in many practical models and does not permit closed-form updates or easy calculation of the variational objective. [sent-124, score-0.603]
36 We will develop generic variational inference algorithms for a wide class of nonconjugate models. [sent-125, score-1.205]
37 Traditional variational or Gibbs sampling methods cannot be easily used because the normal prior on the parameters θ is not conjugate to the Dirichlet(exp{θ}) likelihood. [sent-165, score-0.678]
38 We will develop two variational inference algorithms for this class: Laplace variational inference and delta method variational inference. [sent-176, score-2.19]
39 Both use coordinate ascent to optimize the variational parameters, iterating between updating q(θ) and q(z). [sent-177, score-0.657]
40 They differ in how they update the variational distribution of the nonconjugate variable q(θ). [sent-178, score-1.08]
41 In delta method variational inference, we apply Taylor approximations to approximate the variational objective in Equation 3 and then derive the corresponding updates. [sent-181, score-1.293]
42 Viewed another way, it is equivalent to using a multivariate delta approximation (Bickel and Doksum, 2007) of the variational objective function. [sent-184, score-0.743]
43 The variational distribution of the nonconjugate variable q(θ) is a Gaussian; the variational distribution of the conjugate variable q(z) is in the same family as p(z | η(θ)). [sent-186, score-1.736]
44 Our algorithms are coordinate ascent algorithms, where we iterate between updating the nonconjugate variational distribution q(θ) and updating the conjugate variational distribution q(z). [sent-194, score-1.794]
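A minimal sketch of this coordinate-ascent loop is below (Python; the callables update_q_theta, update_q_z, and elbo are illustrative placeholders for the model-specific updates derived in this section, not code from the paper):

def nonconjugate_vi(update_q_theta, update_q_z, elbo, q_theta, q_z,
                    max_iter=100, tol=1e-5):
    """Generic coordinate ascent: alternate updates of q(theta) and q(z)."""
    old = -float("inf")
    for _ in range(max_iter):
        q_theta = update_q_theta(q_z)   # Laplace or delta method update of q(theta)
        q_z = update_q_z(q_theta)       # conjugate update of q(z)
        new = elbo(q_theta, q_z)        # (approximate) variational objective
        if abs(new - old) < tol * max(1.0, abs(old)):
            break
        old = new
    return q_theta, q_z

The same outer loop serves both algorithms; only update_q_theta changes.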
45 Now we describe how we use Laplace approximations as part of a variational inference algorithm for more complex models. [sent-220, score-0.767]
46 2 Laplace Updates in Variational Inference: We adapt the idea behind Laplace approximations to update the variational distribution q(θ). [sent-223, score-0.623]
47 The update in Equation 13 can be used in a coordinate ascent algorithm for a nonconjugate model. [sent-239, score-0.616]
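A sketch of that Laplace-style coordinate update, assuming callables for f and its first two derivatives are available (function and variable names here are ours, not the paper's):

import numpy as np
from scipy import optimize

def laplace_update(f, grad_f, hess_f, theta_init):
    """Fit q(theta) = N(theta_hat, Sigma) by a Laplace approximation of exp{f(theta)}."""
    # Maximize f (i.e., minimize -f) to find the mode theta_hat.
    res = optimize.minimize(lambda th: -f(th), theta_init,
                            jac=lambda th: -grad_f(th), method="BFGS")
    theta_hat = res.x
    # The usual Laplace recipe: covariance is the inverse negative Hessian at the mode.
    Sigma = np.linalg.inv(-hess_f(theta_hat))
    return theta_hat, Sigma

This is the sense in which the update depends only on θ̂ and the curvature of f at that point.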
48 2 Delta Method Variational Inference: In Laplace variational inference, the variational distribution q(θ) in Equation 13 is solely a function of θ̂, the maximum of f(θ) in Equation 11. [sent-253, score-1.057]
49 We approximate the variational objective L in Equation 3 and then optimize that approximation. [sent-256, score-0.6]
50 We set the variational distribution q(θ) to be a Gaussian N(µ, Σ), where µ and Σ are free variational parameters fit to optimize the variational objective. [sent-258, score-1.596]
51 In the coordinate update of q(θ), this is the function we optimize with respect to its variational parameters {µ, Σ}. [sent-266, score-0.641]
52 Delta method variational inference optimizes this objective in the coordinate update of q(θ). [sent-286, score-0.896]
53 Note this is more expensive than Laplace variational inference because optimizing Equation 16 requires the third derivative ∇³f(θ). [sent-288, score-0.748]
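For completeness, the approximate objective optimized in this coordinate update has the second-order (delta method) form, with q(\theta) = \mathcal{N}(\mu, \Sigma) and constants dropped (a sketch consistent with the description above, not a verbatim copy of Equation 16):

\mathbb{E}_{q(\theta)}[f(\theta)] + \mathbb{H}[q(\theta)] \;\approx\; f(\mu) + \tfrac{1}{2}\,\mathrm{Tr}\!\big(\nabla^2 f(\mu)\,\Sigma\big) + \tfrac{1}{2}\log|\Sigma|,

which is maximized over the free parameters \mu and \Sigma; differentiating the trace term with respect to \mu is what brings in the third derivative \nabla^3 f.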
54 Braun and McAuliffe (2010) were the first to use the delta method in a variational inference algorithm, developing this technique for the discrete choice model. [sent-289, score-0.923]
55 While Laplace inference requires the digamma and log Γ functions, delta method inference further requires the trigamma function. [sent-294, score-0.704]
56 We now turn to the update for the variational distribution of the conjugate variable q(z). [sent-297, score-0.741]
57 Recall that η(θ) maps the nonconjugate variable θ to the natural parameter of the conjugate variable z. [sent-307, score-0.613]
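A sketch of the resulting structure: because p(z \mid \theta) lies in an exponential family with natural parameter \eta(\theta) and sufficient statistic t(z), the mean-field update for q(z) takes the form

q^*(z) \;\propto\; p(x \mid z)\,\exp\{\mathbb{E}_{q(\theta)}[\eta(\theta)]^{\top} t(z)\},

so q(z) stays in the family of p(z \mid \eta(\theta)), with the intractable natural parameter replaced by its expectation under the current Gaussian q(\theta).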
58 When delta method variational inference is used to update q(θ), the update for q(z) is identical to that in Laplace variational inference. [sent-314, score-1.574]
59 (Figure 1: Nonconjugate variational inference.) Setting the partial gradient ∂L(q(z))/∂q(z) = 0 gives the same optimal q(z) as in Equation 5. [sent-326, score-0.748]
60 Computing this update reduces to the approach for Laplace variational inference in Equation 17. [sent-327, score-0.814]
61 To implement nonconjugate inference we need this update for q(z) and the definition of f (·) in Equation 14. [sent-333, score-0.752]
62 4 Nonconjugate Variational Inference: We now present the full algorithm for nonconjugate variational inference. [sent-335, score-0.976]
63 Recall that the variational distribution of the nonconjugate variable is a Gaussian q(θ | µ, Σ); the variational distribution of the conjugate variable is q(z | φ), which is in the same family as p(z | η(θ)) and has natural parameter φ. [sent-337, score-1.736]
64 In either Laplace or delta method inference, we have reduced deriving variational updates for complicated nonconjugate models to mechanical work: calculating derivatives and calling a numerical optimization library. [sent-345, score-1.235]
65 We note that Laplace inference is simpler to derive because it only requires second derivatives of the function in Equation 11. (Figure 2: The approximate variational objective from Equation 19 goes up as a function of the iteration.) [sent-346, score-0.809]
66 4 Example Models: We have described a generic algorithm for approximate posterior inference in nonconjugate models. [sent-362, score-0.788]
67 In this section we derive this algorithm for several nonconjugate models from the research literature: the correlated topic model (Blei and Lafferty, 2007), Bayesian logistic regression (Jaakkola and Jordan, 1997), and hierarchical Bayesian logistic regression (Gelman and Hill, 2007). [sent-363, score-1.123]
68 The nonconjugate variable is θ; the conjugate variable is the collection z = z1:N ; the observation is the collection of words x = x1:N . [sent-369, score-0.663]
69 For each model, we identify the variables—the nonconjugate variable θ, the conjugate variable z, and the observations x—and we calculate f(θ) from Equation 11. [sent-370, score-0.613]
70 The nonconjugate variable is the vector of coefficients θm; the conjugate variable is the collection of observed classes for each data point, zm = zm,1:N. [sent-405, score-0.638]
71 This calculation is important in two contexts: it is used when forming predictions about new data; and it is used as a subroutine in the variational expectation maximization algorithm for fitting the topics and logistic normal parameters (mean µ0 and covariance Σ0 ) with maximum likelihood. [sent-412, score-0.719]
72 In terms of the earlier notation, the nonconjugate variable is the topic proportions θ, the conjugate variable is the collection of topic assignments z = z1:N , and the observation is the collection of words x = x1:N . [sent-417, score-1.047]
73 The variational distribution for the topic proportions θ is Gaussian, q(θ) = N (µ, Σ); the variational distribution for the topic assignments is discrete, q(z) = ∏n q(zn | φn ) where each φn is a distribution over K elements. [sent-418, score-1.479]
74 In delta method inference, as in Braun and McAuliffe (2010), we restrict the variational covariance Σ to be diagonal to simplify the derivative of Equation 16. [sent-419, score-0.694]
75 Besides the CTM, this approach can be adapted to a variety of nonconjugate topic models, including the topic evolution model (Xing, 2005), Dirichlet-multinomial regression (Mimno and McCallum, 2008), dynamic topic models (Blei and Lafferty, 2006; Wang et al. [sent-422, score-1.071]
76 Using Laplace variational inference, our approach recovers the standard Laplace approximation for Bayesian logistic regression (Bishop, 2006). [sent-442, score-0.702]
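To make that connection concrete, here is a minimal sketch of the textbook Laplace approximation for Bayesian logistic regression (zero-mean isotropic Gaussian prior for simplicity; variable names are ours and this is not the paper's code):

import numpy as np
from scipy import optimize

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_logistic(X, y, prior_var=1.0):
    """Gaussian approximation N(mu, Sigma) to the posterior over coefficients.
    X: (n, d) covariates; y: (n,) labels in {0, 1}."""
    d = X.shape[1]

    def neg_log_post(theta):
        a = X @ theta
        log_lik = np.sum(y * a - np.logaddexp(0.0, a))     # Bernoulli log likelihood
        log_prior = -0.5 * np.dot(theta, theta) / prior_var
        return -(log_lik + log_prior)

    def grad(theta):
        p = sigmoid(X @ theta)
        return -(X.T @ (y - p) - theta / prior_var)

    res = optimize.minimize(neg_log_post, np.zeros(d), jac=grad, method="BFGS")
    mu = res.x
    p = sigmoid(X @ mu)
    # Negative Hessian of the log posterior at the mode.
    H = X.T @ (X * (p * (1.0 - p))[:, None]) + np.eye(d) / prior_var
    return mu, np.linalg.inv(H)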
77 As for the CTM, we use nonconjugate inference as a subroutine in a variational EM algorithm (where the M step is regularized). [sent-453, score-1.205]
78 5 Empirical Study: We studied nonconjugate variational inference with correlated topic models and Bayesian logistic regression. [sent-459, score-1.604]
79 We found that nonconjugate inference is more accurate than the existing methods tailored to specific models. [sent-460, score-0.686]
80 Between the two nonconjugate inference algorithms, we found that Laplace inference is faster and more accurate than delta method inference. [sent-461, score-1.09]
81 1 The Correlated Topic Model: We studied Laplace inference and delta method inference in the CTM. [sent-463, score-0.633]
82 In the E-step we perform approximate posterior inference for each document, estimating its topic proportions and topic assignments. [sent-474, score-0.715]
83 We fit models with different kinds of E-steps, using both of the nonconjugate inference methods from Section 3 and the original approach of Blei and Lafferty (2007). [sent-476, score-0.736]
84 To initialize nonconjugate inference we set the variational mean parameter µ = 0 for log topic proportions θ and computed the corresponding updates for the topic assignments z. [sent-477, score-1.694]
85 With nonconjugate inference in the E-step, variational EM approximately optimizes a bound on the marginal probability of the observed data. [sent-479, score-1.223]
86 We split each held-out document into two halves (w1, w2) and form the approximate posterior over the log topic proportions, qw1(θ), using one of the approximate inference algorithms and the first half of the document, w1. [sent-486, score-0.775]
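One simple way to turn qw1(θ) into a held-out score is sketched below (Python); here we plug in the mean of qw1(θ) to approximate the expected topic proportions, which is only one of several reasonable estimators and is not necessarily the exact one used in the experiments:

import numpy as np

def heldout_log_likelihood(mu_q, beta, w2_counts):
    """Per-word log likelihood of the held-out half w2.
    mu_q: mean of q_{w1}(theta), shape (K,); beta: (K, V) topic-word probabilities;
    w2_counts: (V,) word counts of w2."""
    pi = np.exp(mu_q - np.max(mu_q))
    pi = pi / pi.sum()                 # plug-in estimate of the topic proportions
    word_probs = pi @ beta             # (V,) mixture distribution over the vocabulary
    return float(w2_counts @ np.log(word_probs)) / w2_counts.sum()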
87 Figure 5 (a) indicates that the approximate bounds from nonconjugate inference generally go up as the number of topics increases. [sent-493, score-0.769]
88 Finally, note that Laplace variational inference was always better than both other algorithms. [sent-501, score-0.748]
89 Given a test-case input t with label z, we compute the log predictive likelihood, log p(z | µ, t) = z1 log σ(µ⊤t) + z2 log σ(−µ⊤t), where µ is the mean of the variational distribution q(θ) = N(µ, Σ). [sent-521, score-0.866]
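This quantity is straightforward to compute; a small numerically stable sketch (assuming the label is one-hot encoded as (z1, z2) as in the formula above; the function name is ours):

import numpy as np

def log_predictive(mu, t, z1, z2):
    """log p(z | mu, t) = z1 * log sigma(mu^T t) + z2 * log sigma(-mu^T t)."""
    a = float(np.dot(mu, t))
    # log sigma(a) = -log(1 + exp(-a)) = -logaddexp(0, -a)
    return z1 * (-np.logaddexp(0.0, -a)) + z2 * (-np.logaddexp(0.0, a))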
90 Figure 5: Laplace variational inference is “Lap-Var”; delta method variational inference is “DeltaVar”; Blei and Lafferty’s method is “BL.” [sent-538, score-1.671]
91 Laplace inference and delta method inference gave slightly better accuracy than Jaakkola and Jordan’s method, and much [...] (Figure 6: In this figure, we set the number of topics as K = 60.) [sent-550, score-0.683]
92 In terms of predictive likelihood, Laplace variational inference in the hierarchical model is significantly better than all other approaches. [sent-608, score-0.864]
93 6 Discussion: We developed Laplace and delta method variational inference, two strategies for variational inference in a large class of nonconjugate models. [sent-609, score-1.899]
94 We compared Laplace inference, delta inference, and Jaakkola and Jordan’s (1996) method in three settings: separate logistic regression models for each school, a pooled logistic regression model for all schools, and the hierarchical logistic regression model in Section 4. [sent-636, score-1.033]
95 Similar to the main paper, we use mean-field variational inference (Jordan et al. [sent-654, score-0.748]
96 1) to approximate it, although delta method variational inference (Section 3. [sent-663, score-0.956]
97 With this notation, f(θ) = η(θ)⊤ Eq(z)[t(z)] − ½ (θ − µ0)⊤ Σ0⁻¹ (θ − µ0), where Eq(z)[t(z)] is the expected word count of each topic under the variational distribution q(z). [sent-682, score-0.739]
98 In delta method variational inference, we also need to compute the gradient of Trace(∇²f(θ)Σ), where Trace(∇²f(θ)Σ) = (−∑k πk Σkk + π⊤Σπ) ∑k Eq(z)[t(z)]k − Trace(Σ0⁻¹Σ). [sent-687, score-0.694]
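Putting these pieces together, a sketch of the per-document objective and its derivatives for the CTM E-step, consistent with the two formulas above (Ez denotes the vector of expected topic counts ∑n φn under q(z); the function name and argument names are ours):

import numpy as np

def ctm_f_and_derivs(theta, Ez, mu0, Sigma0_inv):
    """f(theta), its gradient, and its Hessian for the CTM, where
    eta(theta) = theta - logsumexp(theta)."""
    N = Ez.sum()
    m = np.max(theta)
    lse = m + np.log(np.sum(np.exp(theta - m)))      # logsumexp(theta)
    pi = np.exp(theta - lse)                         # softmax(theta)
    diff = theta - mu0
    f = Ez @ (theta - lse) - 0.5 * diff @ Sigma0_inv @ diff
    grad = Ez - N * pi - Sigma0_inv @ diff
    hess = -N * (np.diag(pi) - np.outer(pi, pi)) - Sigma0_inv
    return f, grad, hess

With hess in hand, Trace(∇²f(θ)Σ) follows directly, matching the expression above.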
99 For delta method variational inference, we also need the gradient of Trace(∇²f(θ)Σ). [sent-708, score-0.694]
100 A generalized mean field algorithm for variational inference in exponential families. [sent-990, score-0.779]
wordName wordTfidf (topN-words)
[('variational', 0.519), ('nonconjugate', 0.457), ('laplace', 0.278), ('eq', 0.256), ('inference', 0.229), ('blei', 0.176), ('delta', 0.175), ('topic', 0.17), ('logistic', 0.129), ('conjugate', 0.118), ('onconjugate', 0.114), ('ctm', 0.105), ('dirichlet', 0.093), ('zd', 0.082), ('jaakkola', 0.081), ('bayesian', 0.076), ('nference', 0.076), ('jordan', 0.073), ('lei', 0.071), ('lafferty', 0.071), ('log', 0.071), ('posterior', 0.069), ('mcauliffe', 0.067), ('update', 0.066), ('document', 0.063), ('equation', 0.062), ('odels', 0.062), ('braun', 0.061), ('ascent', 0.057), ('hierarchical', 0.051), ('models', 0.05), ('topics', 0.05), ('correlated', 0.05), ('bishop', 0.048), ('family', 0.047), ('tn', 0.046), ('taylor', 0.046), ('eld', 0.044), ('proportions', 0.044), ('predictive', 0.044), ('multinomial', 0.041), ('wang', 0.037), ('scene', 0.036), ('conditionally', 0.036), ('coordinate', 0.036), ('exp', 0.035), ('gelman', 0.034), ('updates', 0.034), ('approximate', 0.033), ('regression', 0.033), ('zn', 0.031), ('exponential', 0.031), ('word', 0.031), ('paisley', 0.03), ('hidden', 0.03), ('likelihood', 0.03), ('held', 0.03), ('holding', 0.028), ('objective', 0.028), ('xd', 0.026), ('clinton', 0.026), ('doksum', 0.026), ('schools', 0.026), ('tmn', 0.026), ('minka', 0.026), ('language', 0.025), ('draw', 0.025), ('updating', 0.025), ('collection', 0.025), ('documents', 0.024), ('hyperparameters', 0.023), ('unigram', 0.022), ('tierney', 0.022), ('hill', 0.022), ('yeast', 0.022), ('corpus', 0.022), ('cients', 0.021), ('model', 0.021), ('normal', 0.021), ('em', 0.021), ('approximation', 0.021), ('latent', 0.02), ('bernardo', 0.02), ('ahmed', 0.02), ('trace', 0.02), ('prior', 0.02), ('optimize', 0.02), ('school', 0.019), ('distribution', 0.019), ('conditional', 0.019), ('xing', 0.019), ('graphical', 0.019), ('variable', 0.019), ('chong', 0.019), ('approximations', 0.019), ('bickel', 0.018), ('optimizes', 0.018), ('covariates', 0.018), ('boutell', 0.018), ('corduneanu', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 121 jmlr-2013-Variational Inference in Nonconjugate Models
Author: Chong Wang, David M. Blei
Abstract: Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. In a typical application, mean-field methods approximately compute the posterior with a coordinate-ascent optimization algorithm. When the model is conditionally conjugate, the coordinate updates are easily derived and in closed form. However, many models of interest—like the correlated topic model and Bayesian logistic regression—are nonconjugate. In these models, mean-field methods cannot be directly applied and practitioners have had to develop variational algorithms on a case-by-case basis. In this paper, we develop two generic methods for nonconjugate models, Laplace variational inference and delta method variational inference. Our methods have several advantages: they allow for easily derived variational algorithms with a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world data sets. We studied our methods on the correlated topic model, Bayesian logistic regression, and hierarchical Bayesian logistic regression. Keywords: variational inference, nonconjugate models, Laplace approximations, the multivariate delta method
2 0.60986596 108 jmlr-2013-Stochastic Variational Inference
Author: Matthew D. Hoffman, David M. Blei, Chong Wang, John Paisley
Abstract: We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets. Keywords: Bayesian inference, variational inference, stochastic optimization, topic models, Bayesian nonparametrics
3 0.22411659 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
Author: Edward Challis, David Barber
Abstract: We investigate Gaussian Kullback-Leibler (G-KL) variational approximate inference techniques for Bayesian generalised linear models and various extensions. In particular we make the following novel contributions: sufficient conditions for which the G-KL objective is differentiable and convex are described; constrained parameterisations of Gaussian covariance that make G-KL methods fast and scalable are provided; the lower bound to the normalisation constant provided by G-KL methods is proven to dominate those provided by local lower bounding methods; complexity and model applicability issues of G-KL versus other Gaussian approximate inference methods are discussed. Numerical results comparing G-KL and other deterministic Gaussian approximate inference methods are presented for: robust Gaussian process regression models with either Student-t or Laplace likelihoods, large scale Bayesian binary logistic regression models, and Bayesian sparse linear models for sequential experimental design. Keywords: generalised linear models, latent linear models, variational approximate inference, large scale inference, sparse learning, experimental design, active learning, Gaussian processes
4 0.10038967 15 jmlr-2013-Bayesian Canonical Correlation Analysis
Author: Arto Klami, Seppo Virtanen, Samuel Kaski
Abstract: Canonical correlation analysis (CCA) is a classical method for seeking correlations between two multivariate data sets. During the last ten years, it has received more and more attention in the machine learning community in the form of novel computational formulations and a plethora of applications. We review recent developments in Bayesian models and inference methods for CCA which are attractive for their potential in hierarchical extensions and for coping with the combination of large dimensionalities and small sample sizes. The existing methods have not been particularly successful in fulfilling the promise yet; we introduce a novel efficient solution that imposes group-wise sparsity to estimate the posterior of an extended model which not only extracts the statistical dependencies (correlations) between data sets but also decomposes the data into shared and data set-specific components. In statistics literature the model is known as inter-battery factor analysis (IBFA), for which we now provide a Bayesian treatment. Keywords: Bayesian modeling, canonical correlation analysis, group-wise sparsity, inter-battery factor analysis, variational Bayesian approximation
5 0.090806037 120 jmlr-2013-Variational Algorithms for Marginal MAP
Author: Qiang Liu, Alexander Ihler
Abstract: The marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of “mixed-product” message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel “argmax-product” message updates. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods. Keywords: graphical models, message passing, belief propagation, variational methods, maximum a posteriori, marginal-MAP, hidden variable models
6 0.082020067 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation
7 0.068840109 45 jmlr-2013-GPstuff: Bayesian Modeling with Gaussian Processes
8 0.068782978 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models
9 0.065796107 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
10 0.058700081 75 jmlr-2013-Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood
11 0.056925718 90 jmlr-2013-Quasi-Newton Method: A New Direction
12 0.054804336 49 jmlr-2013-Global Analytic Solution of Fully-observed Variational Bayesian Matrix Factorization
13 0.05027514 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models
14 0.04129687 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model
15 0.036116801 14 jmlr-2013-Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators
16 0.034907095 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs
17 0.033768814 104 jmlr-2013-Sparse Single-Index Model
18 0.03371039 58 jmlr-2013-Language-Motivated Approaches to Action Recognition
19 0.031598292 76 jmlr-2013-Nonparametric Sparsity and Regularization
20 0.029808341 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
topicId topicWeight
[(0, -0.248), (1, -0.612), (2, 0.077), (3, -0.065), (4, -0.353), (5, 0.12), (6, 0.292), (7, 0.055), (8, 0.168), (9, -0.078), (10, 0.033), (11, -0.02), (12, -0.047), (13, 0.041), (14, -0.041), (15, 0.013), (16, 0.038), (17, 0.046), (18, -0.002), (19, -0.101), (20, -0.018), (21, 0.013), (22, 0.099), (23, 0.048), (24, 0.026), (25, 0.098), (26, 0.015), (27, 0.035), (28, 0.004), (29, 0.005), (30, -0.055), (31, -0.025), (32, 0.03), (33, 0.002), (34, 0.015), (35, -0.03), (36, -0.039), (37, 0.002), (38, -0.051), (39, 0.061), (40, -0.015), (41, -0.024), (42, -0.024), (43, 0.008), (44, -0.012), (45, 0.011), (46, 0.012), (47, 0.012), (48, 0.004), (49, 0.054)]
simIndex simValue paperId paperTitle
same-paper 1 0.98093319 121 jmlr-2013-Variational Inference in Nonconjugate Models
Author: Chong Wang, David M. Blei
Abstract: Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. In a typical application, mean-field methods approximately compute the posterior with a coordinate-ascent optimization algorithm. When the model is conditionally conjugate, the coordinate updates are easily derived and in closed form. However, many models of interest—like the correlated topic model and Bayesian logistic regression—are nonconjugate. In these models, mean-field methods cannot be directly applied and practitioners have had to develop variational algorithms on a case-by-case basis. In this paper, we develop two generic methods for nonconjugate models, Laplace variational inference and delta method variational inference. Our methods have several advantages: they allow for easily derived variational algorithms with a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world data sets. We studied our methods on the correlated topic model, Bayesian logistic regression, and hierarchical Bayesian logistic regression. Keywords: variational inference, nonconjugate models, Laplace approximations, the multivariate delta method
2 0.97255778 108 jmlr-2013-Stochastic Variational Inference
Author: Matthew D. Hoffman, David M. Blei, Chong Wang, John Paisley
Abstract: We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets. Keywords: Bayesian inference, variational inference, stochastic optimization, topic models, Bayesian nonparametrics
3 0.60838884 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
Author: Edward Challis, David Barber
Abstract: We investigate Gaussian Kullback-Leibler (G-KL) variational approximate inference techniques for Bayesian generalised linear models and various extensions. In particular we make the following novel contributions: sufficient conditions for which the G-KL objective is differentiable and convex are described; constrained parameterisations of Gaussian covariance that make G-KL methods fast and scalable are provided; the lower bound to the normalisation constant provided by G-KL methods is proven to dominate those provided by local lower bounding methods; complexity and model applicability issues of G-KL versus other Gaussian approximate inference methods are discussed. Numerical results comparing G-KL and other deterministic Gaussian approximate inference methods are presented for: robust Gaussian process regression models with either Student-t or Laplace likelihoods, large scale Bayesian binary logistic regression models, and Bayesian sparse linear models for sequential experimental design. Keywords: generalised linear models, latent linear models, variational approximate inference, large scale inference, sparse learning, experimental design, active learning, Gaussian processes
4 0.49065632 15 jmlr-2013-Bayesian Canonical Correlation Analysis
Author: Arto Klami, Seppo Virtanen, Samuel Kaski
Abstract: Canonical correlation analysis (CCA) is a classical method for seeking correlations between two multivariate data sets. During the last ten years, it has received more and more attention in the machine learning community in the form of novel computational formulations and a plethora of applications. We review recent developments in Bayesian models and inference methods for CCA which are attractive for their potential in hierarchical extensions and for coping with the combination of large dimensionalities and small sample sizes. The existing methods have not been particularly successful in fulfilling the promise yet; we introduce a novel efficient solution that imposes group-wise sparsity to estimate the posterior of an extended model which not only extracts the statistical dependencies (correlations) between data sets but also decomposes the data into shared and data set-specific components. In statistics literature the model is known as inter-battery factor analysis (IBFA), for which we now provide a Bayesian treatment. Keywords: Bayesian modeling, canonical correlation analysis, group-wise sparsity, inter-battery factor analysis, variational Bayesian approximation
5 0.38367918 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation
Author: Philémon Brakel, Dirk Stroobandt, Benjamin Schrauwen
Abstract: Imputing missing values in high dimensional time-series is a difficult problem. This paper presents a strategy for training energy-based graphical models for imputation directly, bypassing difficulties probabilistic approaches would face. The training strategy is inspired by recent work on optimization-based learning (Domke, 2012) and allows complex neural models with convolutional and recurrent structures to be trained for imputation tasks. In this work, we use this training strategy to derive learning rules for three substantially different neural architectures. Inference in these models is done by either truncated gradient descent or variational mean-field iterations. In our experiments, we found that the training methods outperform the Contrastive Divergence learning algorithm. Moreover, the training methods can easily handle missing values in the training data itself during learning. We demonstrate the performance of this learning scheme and the three models we introduce on one artificial and two real-world data sets. Keywords: neural networks, energy-based models, time-series, missing values, optimization
6 0.27958661 120 jmlr-2013-Variational Algorithms for Marginal MAP
7 0.27692658 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models
8 0.25770155 45 jmlr-2013-GPstuff: Bayesian Modeling with Gaussian Processes
9 0.23564146 49 jmlr-2013-Global Analytic Solution of Fully-observed Variational Bayesian Matrix Factorization
10 0.20862661 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models
11 0.1971224 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
12 0.17835368 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model
13 0.17289372 75 jmlr-2013-Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood
14 0.1646152 104 jmlr-2013-Sparse Single-Index Model
15 0.16371292 9 jmlr-2013-A Widely Applicable Bayesian Information Criterion
16 0.16332643 90 jmlr-2013-Quasi-Newton Method: A New Direction
17 0.14684612 94 jmlr-2013-Ranked Bandits in Metric Spaces: Learning Diverse Rankings over Large Document Collections
18 0.1419199 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization
19 0.13682087 76 jmlr-2013-Nonparametric Sparsity and Regularization
20 0.13578618 58 jmlr-2013-Language-Motivated Approaches to Action Recognition
topicId topicWeight
[(0, 0.031), (5, 0.123), (6, 0.037), (10, 0.063), (20, 0.018), (23, 0.035), (53, 0.099), (68, 0.019), (70, 0.022), (75, 0.055), (85, 0.014), (87, 0.01), (93, 0.365)]
simIndex simValue paperId paperTitle
same-paper 1 0.80317879 121 jmlr-2013-Variational Inference in Nonconjugate Models
Author: Chong Wang, David M. Blei
Abstract: Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. In a typical application, mean-field methods approximately compute the posterior with a coordinate-ascent optimization algorithm. When the model is conditionally conjugate, the coordinate updates are easily derived and in closed form. However, many models of interest—like the correlated topic model and Bayesian logistic regression—are nonconjugate. In these models, mean-field methods cannot be directly applied and practitioners have had to develop variational algorithms on a case-by-case basis. In this paper, we develop two generic methods for nonconjugate models, Laplace variational inference and delta method variational inference. Our methods have several advantages: they allow for easily derived variational algorithms with a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world data sets. We studied our methods on the correlated topic model, Bayesian logistic regression, and hierarchical Bayesian logistic regression. Keywords: variational inference, nonconjugate models, Laplace approximations, the multivariate delta method
2 0.70249188 76 jmlr-2013-Nonparametric Sparsity and Regularization
Author: Lorenzo Rosasco, Silvia Villa, Sofia Mosci, Matteo Santoro, Alessandro Verri
Abstract: In this work we are interested in the problems of supervised learning and variable selection when the input-output dependence is described by a nonlinear function depending on a few variables. Our goal is to consider a sparse nonparametric model, hence avoiding linear or additive models. The key idea is to measure the importance of each variable in the model by making use of partial derivatives. Based on this intuition we propose a new notion of nonparametric sparsity and a corresponding least squares regularization scheme. Using concepts and results from the theory of reproducing kernel Hilbert spaces and proximal methods, we show that the proposed learning algorithm corresponds to a minimization problem which can be provably solved by an iterative procedure. The consistency properties of the obtained estimator are studied both in terms of prediction and selection performance. An extensive empirical analysis shows that the proposed method performs favorably with respect to the state-of-the-art methods. Keywords: sparsity, nonparametric, variable selection, regularization, proximal methods, RKHS
3 0.5622282 108 jmlr-2013-Stochastic Variational Inference
Author: Matthew D. Hoffman, David M. Blei, Chong Wang, John Paisley
Abstract: We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets. Keywords: Bayesian inference, variational inference, stochastic optimization, topic models, Bayesian nonparametrics
4 0.45048594 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses
Author: Partha Niyogi
Abstract: Manifold regularization (Belkin et al., 2006) is a geometrically motivated framework for machine learning within which several semi-supervised algorithms have been constructed. Here we try to provide some theoretical understanding of this approach. Our main result is to expose the natural structure of a class of problems on which manifold regularization methods are helpful. We show that for such problems, no supervised learner can learn effectively. On the other hand, a manifold based learner (that knows the manifold or “learns” it from unlabeled examples) can learn with relatively few labeled examples. Our analysis follows a minimax style with an emphasis on finite sample results (in terms of n: the number of labeled examples). These results allow us to properly interpret manifold regularization and related spectral and geometric algorithms in terms of their potential use in semi-supervised learning. Keywords: semi-supervised learning, manifold regularization, graph Laplacian, minimax rates
5 0.43786418 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
Author: Edward Challis, David Barber
Abstract: We investigate Gaussian Kullback-Leibler (G-KL) variational approximate inference techniques for Bayesian generalised linear models and various extensions. In particular we make the following novel contributions: sufficient conditions for which the G-KL objective is differentiable and convex are described; constrained parameterisations of Gaussian covariance that make G-KL methods fast and scalable are provided; the lower bound to the normalisation constant provided by G-KL methods is proven to dominate those provided by local lower bounding methods; complexity and model applicability issues of G-KL versus other Gaussian approximate inference methods are discussed. Numerical results comparing G-KL and other deterministic Gaussian approximate inference methods are presented for: robust Gaussian process regression models with either Student-t or Laplace likelihoods, large scale Bayesian binary logistic regression models, and Bayesian sparse linear models for sequential experimental design. Keywords: generalised linear models, latent linear models, variational approximate inference, large scale inference, sparse learning, experimental design, active learning, Gaussian processes
6 0.4221662 46 jmlr-2013-GURLS: A Least Squares Library for Supervised Learning
7 0.40669876 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization
8 0.40563303 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
9 0.4011603 32 jmlr-2013-Differential Privacy for Functions and Functional Data
10 0.40070924 75 jmlr-2013-Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood
11 0.39880997 120 jmlr-2013-Variational Algorithms for Marginal MAP
12 0.39418042 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning
13 0.39401147 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models
14 0.39234325 73 jmlr-2013-Multicategory Large-Margin Unified Machines
15 0.38958997 117 jmlr-2013-Universal Consistency of Localized Versions of Regularized Kernel Methods
16 0.38776457 15 jmlr-2013-Bayesian Canonical Correlation Analysis
17 0.38584557 57 jmlr-2013-Kernel Bayes' Rule: Bayesian Inference with Positive Definite Kernels
18 0.38279822 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning
19 0.38044673 14 jmlr-2013-Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators
20 0.37826529 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion