jmlr jmlr2013 jmlr2013-108 knowledge-graph by maker-knowledge-mining

108 jmlr-2013-Stochastic Variational Inference


Source: pdf

Author: Matthew D. Hoffman, David M. Blei, Chong Wang, John Paisley

Abstract: We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets. Keywords: Bayesian inference, variational inference, stochastic optimization, topic models, Bayesian nonparametrics

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Stochastic variational inference lets us apply complex Bayesian models to massive data sets. [sent-16, score-0.848]

2 Keywords: Bayesian inference, variational inference, stochastic optimization, topic models, Bayesian nonparametrics 1. [sent-17, score-0.989]

3 These posteriors were approximated using stochastic variational inference with 1. [sent-57, score-0.899]

4 We will derive stochastic variational inference for a large class of graphical models. [sent-67, score-0.933]

5 In particular, we demonstrate stochastic variational inference on latent Dirichlet allocation (Blei et al.). [sent-69, score-0.934]

6 (This latter application demonstrates how to use stochastic variational inference in a variety of Bayesian nonparametric settings.) [sent-72, score-0.934]

7 Variational inference is amenable to stochastic optimization because the variational objective decomposes into a sum of terms, one for each data point in the analysis. [sent-99, score-0.899]

8 With one more detail—the idea of a natural gradient (Amari, 1998)—stochastic variational inference has an attractive form. [sent-102, score-0.881]

9 These innovations led to automated variational inference, allowing a practitioner to write down a model and immediately use variational inference to estimate its posterior (Bishop et al.). [sent-120, score-1.364]

10 In this paper, we develop scalable methods for generic Bayesian inference by solving the variational inference problem with stochastic optimization (Robbins and Monro, 1951). [sent-124, score-1.114]

11 Finally, we note that stochastic optimization was also used with variational inference in Platt et al. [sent-129, score-0.899]

12 In Section 2, we review variational inference for graphical models and then derive stochastic variational inference. [sent-140, score-1.533]

13 In Section 3, we review probabilistic topic models and Bayesian nonparametric models and then derive the stochastic variational inference algorithms in these settings. [sent-141, score-1.316]

14 In Section 4, we study stochastic variational inference on several large text data sets. [sent-142, score-0.899]

15 We derive stochastic variational inference, a stochastic optimization algorithm for mean-field variational inference. [sent-147, score-1.422]

16 We review mean-field variational inference, an approximate inference strategy that seeks a tractable distribution over the hidden variables which is close to the posterior distribution. [sent-154, score-0.934]

17 We derive the traditional variational inference algorithm for our class of models, which is a coordinate ascent algorithm. [sent-155, score-0.888]

18 We review the natural gradient and derive the natural gradient of the variational objective function. [sent-157, score-0.826]

19 The natural gradient closely relates to coordinate ascent variational inference. [sent-158, score-0.81]

20 We review stochastic optimization, a technique that uses noisy estimates of a gradient to optimize an objective function, and apply it to variational inference. [sent-160, score-0.86]

21 Specifically, we use stochastic optimization with noisy estimates of the natural gradient of the variational objective. [sent-161, score-0.89]

22 We show how the resulting algorithm, stochastic variational inference, easily builds on traditional variational inference algorithms but can handle much larger data sets. [sent-163, score-1.482]

23 This form (Equation 6) will be important when we derive stochastic variational inference in Section 2. [sent-197, score-0.899]

24 In this section we review mean-field variational inference, the form of variational inference that uses a family where each hidden variable is independent. [sent-222, score-1.446]

25 We describe the variational objective function, discuss the mean-field variational family, and derive the traditional coordinate ascent algorithm for fitting the variational parameters. [sent-223, score-1.82]

26 With the assumptions that we have made about the model and variational distribution—that each conditional is in an exponential family and that the corresponding variational distribution is in the same exponential family—we can optimize each coordinate in closed form. [sent-263, score-1.281]

27 However, while the global update in Equation 15 depends on all the local variational parameters—and note there is a set of local parameters for each of the N observations—the local update in Equation 16 only depends on the global parameters and the other parameters associated with the nth context. [sent-288, score-1.045]

28 The updates in Equations 15 and 16 form the algorithm for coordinate ascent variational inference, iterating between updating each local parameter and the global parameters. [sent-291, score-0.82]
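
To make the structure of this batch algorithm concrete, here is a minimal Python sketch (ours, not code from the paper); local_step and global_step are hypothetical placeholders standing in for the closed-form updates of Equations 16 and 15.

```python
# Minimal sketch of batch coordinate-ascent variational inference.
# `local_step` and `global_step` are hypothetical model-specific functions
# implementing the closed-form updates of Equations 16 and 15.
def coordinate_ascent_vi(data, lam, local_step, global_step, n_iters=100):
    for _ in range(n_iters):
        # Local phase: optimize every data point's local variational
        # parameters while holding the global parameters fixed.
        local_params = [local_step(x_n, lam) for x_n in data]
        # Global phase: update the global variational parameters from the
        # expected sufficient statistics aggregated over all data points.
        lam = global_step(data, local_params)
    return lam
```

The global phase is what makes this loop costly at scale: every pass must touch all N data points before the global parameters change.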

29 In variational inference, we take variational expectations of the natural parameters of the same distributions. [sent-297, score-1.202]

30 We now show how stochastic inference arises by applying stochastic optimization to the natural gradients of the variational objective. [sent-319, score-1.132]

31 We now return to variational inference and compute the natural gradient of the ELBO with respect to the variational parameters. [sent-357, score-1.441]

32 Researchers have used the natural gradient in variational inference for nonlinear state space models (Honkela et al. [sent-358, score-0.921]

33 They are easier to compute because premultiplying by the Fisher information matrix—which we must do to compute the classical gradient in Equation 14 but which disappears from the natural gradient in Equation 22—is prohibitively expensive for variational parameters with many components. [sent-376, score-0.825]

34 In the next section we will see that efficiently computing the natural gradient lets us develop scalable variational inference algorithms. [sent-377, score-0.938]

35 4 Stochastic Variational Inference The coordinate ascent algorithm in Figure 3 is inefficient for large data sets because we must optimize the local variational parameters for each data point before re-estimating the global variational parameters. [sent-379, score-1.38]

36 Stochastic variational inference uses stochastic optimization to fit the global variational parameters. [sent-380, score-1.526]

37 We have reviewed mean-field variational inference in models with exponential family conditionals and showed that the natural gradient of the variational objective function is easy to compute. [sent-382, score-1.628]

38 We now discuss stochastic optimization, which uses a series of noisy estimates of the gradient, and use it with noisy natural gradients to derive stochastic variational inference. [sent-383, score-1.036]

39 In statistical estimation problems, including variational inference of the global parameters, the gradient can be written as a sum of terms (one for each data point) and we can compute a fast noisy approximation by subsampling the data. [sent-387, score-0.992]
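
As a small illustration (our notation, not the paper's code), one such noisy estimate samples a few terms of the sum uniformly and rescales by the data set size; grad_term below is a hypothetical per-data-point gradient function.

```python
import numpy as np

def noisy_gradient(data, grad_term, rng, n_samples=1):
    # Unbiased estimate of sum_n grad_term(x_n): average the gradient terms
    # of a few uniformly sampled data points and rescale by N = len(data).
    idx = rng.integers(len(data), size=n_samples)
    return len(data) * np.mean([grad_term(data[i]) for i in idx], axis=0)
```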

40 We use stochastic optimization with noisy natural gradients to optimize the variational objective function. [sent-398, score-0.839]

41 We show that this algorithm is stochastic natural gradient ascent on the global variational parameters. [sent-407, score-0.991]

42 Writing L as a function of the global and local variational parameters, we let the function φ(λ) return a local optimum of the local variational parameters, so that ∇φ L(λ, φ(λ)) = 0. [sent-409, score-1.357]

43 Stochastic variational inference optimizes the maximized ELBO L (λ) by subsampling the data to form noisy estimates of the natural gradient. [sent-413, score-0.852]

44 Therefore, the natural gradient of LI with respect to each global variational parameter λ is a noisy but unbiased estimate of the natural gradient of the variational objective. [sent-424, score-1.499]

45 While the full natural gradient would use the local variational parameters for the whole data set, the noisy natural gradient only considers the local parameters for one randomly sampled data point. [sent-434, score-1.024]

46 At each iteration we use the noisy gradient (with step size ρt ) to update the global variational parameter. [sent-440, score-0.814]
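
Concretely, the update is a convex combination of the current global parameter and an "intermediate" estimate computed from the single sampled point. The sketch below is illustrative only; local_step and intermediate_global are hypothetical placeholders for the model-specific computations.

```python
import numpy as np

def svi_step(lam, x_n, n_total, rho_t, local_step, intermediate_global):
    # Local phase: fit the sampled data point's local variational
    # parameters given the current global parameters.
    phi_n = local_step(x_n, lam)
    # Intermediate global parameter: the update we would make if the data
    # set consisted of n_total replicates of x_n (an unbiased estimate).
    lam_hat = intermediate_global(x_n, phi_n, n_total)
    # Step of size rho_t along the noisy natural gradient: blend the old
    # value with the intermediate estimate.
    return (1.0 - rho_t) * lam + rho_t * np.asarray(lam_hat)
```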

47 So far, we have considered stochastic variational inference algorithms where only one observation xt is sampled at a time. [sent-464, score-0.899]

48 We derived stochastic variational inference, a scalable inference algorithm that can be applied to a large class of hierarchical Bayesian models. [sent-487, score-0.952]

49 In this section we show how to use the general algorithm of Section 2 to derive stochastic variational inference for two probabilistic topic models: latent Dirichlet allocation (LDA) (Blei et al. [sent-488, score-1.236]

50 We will derive the algorithms in several steps: (1) we specify the model assumptions; (2) we derive the complete conditional distributions of the latent variables; (3) we form the mean-field variational family; (4) we derive the corresponding stochastic inference algorithm. [sent-507, score-0.99]

51 In Section 4, we will report our empirical study of stochastic variational inference with these models. [sent-508, score-0.899]

52 Figure 1 illustrates posterior topics found with stochastic variational inference. [sent-571, score-0.92]

53 We compare two inference algorithms for LDA: stochastic inference on the full collection and batch inference on a subset of 100,000 documents. [sent-585, score-0.818]

54 ) We see that stochastic variational inference converges faster and to a better model. [sent-587, score-0.899]

55 (2010a), which is a special case of the stochastic variational inference algorithm we developed in Section 2. [sent-602, score-0.899]

56 We specify the global and local variables of LDA to place it in the stochastic variational inference setting of Section 2. [sent-608, score-1.044]

57 In mean-field variational inference, the variational distribution of each variable is in the same family as its complete conditional. [sent-614, score-1.186]

58 Thus its variational distribution is a multinomial q(zdn) = Multinomial(φdn), where the variational parameter φdn is a point on the (K − 1)-simplex. [sent-617, score-1.166]

59 Per the mean-field approximation, each observed word is endowed with a different variational distribution for its topic assignment, allowing different words to be associated with different topics. [sent-618, score-0.879]

60 With this conditional, the variational distribution of the topic proportions is also Dirichlet q(θd ) = Dirichlet(γd ), where γd is a K-vector Dirichlet parameter. [sent-622, score-0.912]

61 The variational distribution for each topic is a V -dimensional Dirichlet, q(βk ) = Dirichlet(λk ). [sent-631, score-0.838]

62 As we will see in the next section, the traditional variational inference algorithm for LDA is inefficient with large collections of documents. [sent-632, score-0.823]

63 The root of this inefficiency is the update for the topic parameter λk , which (from Equation 30) requires summing over variational parameters for every word in the collection. [sent-633, score-0.946]

64 With the complete conditionals in hand, we now derive the coordinate ascent variational inference algorithm, that is, the batch inference algorithm of Figure 3. [sent-635, score-1.24]

65 The variational parameters are the global per-topic Dirichlet parameters λ1:K , local per-document Dirichlet parameters γ1:D , and local per-word multinomial parameters φ1:D,1:N . [sent-638, score-0.883]

66 Coordinate ascent variational inference iterates between updating all of the local variational parameters (Equation 16) and updating the global variational parameters (Equation 15). [sent-639, score-2.12]

67 We update each document d’s local variational parameters in a local coordinate ascent routine, iterating between updating each word’s topic assignment and the per-document topic proportions: φdn,k ∝ exp{Ψ(γdk) + Ψ(λk,wdn) − Ψ(∑v λkv)} and γd = α + ∑n φdn. [sent-640, score-1.591]

68 After finding variational parameters for each document, we update the variational Dirichlet for each topic: λk = η + ∑d ∑n φdn,k wdn. [sent-648, score-1.446]

69 Stochastic variational inference provides a scalable method for approximate posterior inference in LDA. [sent-676, score-1.019]

70 The global variational parameters are the topic Dirichlet parameters λk ; the local variational parameters are the per-document topic proportion Dirichlet parameters γd and the per-word topic assignment multinomial parameters φdn . [sent-677, score-2.283]

71 In the global phase we use these fitted local variational parameters to form intermediate topics, λ̂k = η + D ∑n φdn,k wdn. [sent-683, score-0.827]
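
Putting the local and global phases together, a rough NumPy sketch of one such iteration for LDA is given below. This is an illustrative re-derivation, not the authors' released code; the initialization, convergence check, and hyperparameter values are simplified placeholders.

```python
import numpy as np
from scipy.special import digamma

def svi_lda_step(lam, doc, D, alpha, eta, rho_t, n_local_iters=20):
    """One SVI iteration for LDA on a single uniformly sampled document.
    lam: K x V topic Dirichlet parameters; doc: array of word ids;
    D: number of documents in the corpus."""
    K, V = lam.shape
    Elog_beta = digamma(lam) - digamma(lam.sum(axis=1, keepdims=True))
    gamma = np.ones(K)                   # per-document Dirichlet parameters
    for _ in range(n_local_iters):
        # Word-topic assignments phi (K x N_d); the digamma of gamma's sum
        # is constant per word and cancels in the normalization below.
        log_phi = digamma(gamma)[:, None] + Elog_beta[:, doc]
        phi = np.exp(log_phi - log_phi.max(axis=0))
        phi /= phi.sum(axis=0)
        gamma = alpha + phi.sum(axis=1)  # per-document topic proportions
    # Intermediate topics: pretend the corpus is D replicates of this document.
    lam_hat = np.full((K, V), float(eta))
    for n, w in enumerate(doc):
        lam_hat[:, w] += D * phi[:, n]
    # Global phase: blend with step size rho_t (noisy natural-gradient step).
    return (1.0 - rho_t) * lam + rho_t * lam_hat
```

A full run would initialize lam randomly, then repeatedly sample a document at random and apply this step with a decreasing step size rho_t.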

72 Stochastic variational inference on the full data converges faster and to a better place than batch variational inference on a reasonably sized subset. [sent-692, score-1.576]

73 Figure 6 gives the algorithm for stochastic variational inference for LDA. [sent-694, score-0.899]

74 We derive stochastic variational inference for the Bayesian nonparametric variant of LDA, the hierarchical Dirichlet process (HDP) topic model. [sent-702, score-1.238]

75 More broadly, stochastic variational inference for the HDP topic model demonstrates the possibilities of stochastic inference in the context of Bayesian nonparametric statistics. [sent-707, score-1.551]

76 We then show how to use this construction to form the HDP topic model and how to use stochastic variational inference to approximate the posterior. [sent-722, score-1.177]

77 It is the gateway to variational inference in Bayesian nonparametric models (Blei and Jordan, 2006). [sent-739, score-0.823]

78 At the document level, breaking proportions πd create a set of probabilities (Step 3b) and topic indices cd , drawn from σ(v), attach each document-level stick length to a topic (Step 3a). [sent-825, score-0.841]
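
For intuition, here is a small sketch (ours, not the paper's) of the stick-breaking map σ(v) that this construction relies on; truncating at a fixed level and folding the leftover mass into the last stick is the convention used for the truncated variational approximation discussed below. The Beta(1, 3) breaks in the example are an arbitrary illustrative choice.

```python
import numpy as np

def stick_breaking(v):
    """Map breaking proportions v_1..v_K to probabilities
    sigma_i(v) = v_i * prod_{j<i}(1 - v_j), with the leftover
    mass folded into the last atom (the truncation convention)."""
    v = np.asarray(v, dtype=float)
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    pi = v * leftover
    pi[-1] = 1.0 - pi[:-1].sum()
    return pi

# Illustrative draw of truncated proportions from Beta(1, 3) breaks.
rng = np.random.default_rng(0)
print(stick_breaking(rng.beta(1.0, 3.0, size=8)))
```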

79 Following the same procedure as for LDA, we now derive stochastic variational inference for the HDP topic model. [sent-834, score-1.177]

80 These cannot be completely represented in the variational distribution as this would require optimizing an infinite number of variational parameters. [sent-848, score-1.12]

81 With truncation levels set high enough, the variational posterior will use as many topics as the posterior needs, but will not necessarily use all K topics to explain the observations. [sent-853, score-1.003]

82 From the complete conditionals, batch variational inference proceeds by updating each variational parameter using the expectation of its conditional distribution’s natural parameter. [sent-858, score-1.474]

83 We then update the global variational parameters by taking a step in the direction of the stochastic natural gradient: λk(t+1) = (1 − ρt) λk(t) + ρt λ̂k. [sent-861, score-1.105]

84 The other global variables in the HDP are the corpus-level breaking proportions vk, each of which is associated with a set of beta parameters ak = (ak(1), ak(2)) for its variational distribution. [sent-863, score-0.914]

85 Figure 9 gives the stochastic variational inference algorithm for the HDP topic model. [sent-868, score-1.177]

86 (The relevant expectation for the per-word assignments is E[zdn,i] = φdn,i.) Figure 8: A graphical model for the HDP topic model, and a summary of its variational inference algorithm. [sent-870, score-1.142]

87 (Fragment of the table of complete conditionals for the HDP topic model: the topic Dirichlet parameter is η + ∑d ∑i cdi,k ∑n zdn,i wdn; the document-level stick proportion πdi has a Beta conditional with parameters (1 + ∑n zdn,i, α + ∑n ∑j>i zdn,j); and zdn is multinomial, with the log probability of level i proportional to log σi(πd) + ∑k cdi,k log βk,wdn.) The accompanying algorithm begins by initializing λ(0) randomly. [sent-871, score-0.825]

88 Figure 9: Stochastic variational inference for the HDP topic model. [sent-909, score-1.026]

89 As for LDA, stochastic variational inference on the full data converges faster and to a better place than batch variational inference on a reasonably sized subset. [sent-917, score-1.727]

90 As with LDA, stochastic variational inference for the HDP converges faster and to a better model. [sent-921, score-0.899]

91 In this section we study the empirical performance and effectiveness of stochastic variational inference for latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP) topic model. [sent-923, score-1.238]

92 Finally, we compare stochastic variational inference to the traditional batch variational inference algorithm. [sent-926, score-1.75]

93 We then use these parameters with the observed test words wobs to compute the variational distribution of the topic proportions. [sent-963, score-0.909]

94 The differences are that the topic proportions are computed via the two-level variational stick-breaking distribution and K is the truncation level of the approximate posterior. [sent-968, score-0.937]

95 Although the stochastic variational inference algorithm converges to a stationary point for any valid κ, τ, and S, the quality of this stationary point and the speed of convergence may depend on how these parameters are set. [sent-973, score-0.928]
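
For reference (our notation, not an excerpt from the paper), the step-size schedule these parameters describe typically takes the form ρt = (t + τ)^(−κ) with κ in (0.5, 1], and a minibatch of size S simply averages S intermediate estimates before the blend:

```python
def step_size(t, tau=1.0, kappa=0.7):
    # Robbins-Monro style schedule: kappa in (0.5, 1] keeps the steps
    # square-summable but not summable; tau >= 0 down-weights early steps.
    return (t + tau) ** (-kappa)

def minibatch_blend(lam, lam_hats, rho_t):
    # Average the intermediate estimates from S sampled data points,
    # then take a single step of size rho_t on the global parameter.
    lam_hat = sum(lam_hats) / len(lam_hats)
    return (1.0 - rho_t) * lam + rho_t * lam_hat
```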

96 Traditional variational inference (on subsets of each corpus) did not perform as well as stochastic inference. [sent-1005, score-0.899]

97 We developed stochastic variational inference, a scalable variational inference algorithm that lets us analyze massive data sets with complex probabilistic models. [sent-1086, score-1.570]

98 The main idea is to use stochastic optimization to optimize the variational objective, following noisy estimates of the natural gradient where the noise arises by repeatedly subsampling the data. [sent-1087, score-0.918]

99 With stochastic variational inference, we can easily apply topic modeling to collections of millions of documents. [sent-1126, score-1.041]

100 , 2012b developed a stochastic variational inference algorithm for a specific nonconjugate Bayesian nonparametric model. [sent-1144, score-0.934]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('variational', 0.56), ('topic', 0.278), ('dirichlet', 0.215), ('eq', 0.209), ('hdp', 0.206), ('inference', 0.188), ('paisley', 0.173), ('topics', 0.153), ('stochastic', 0.151), ('lda', 0.148), ('elbo', 0.143), ('blei', 0.126), ('zdn', 0.125), ('document', 0.12), ('zn', 0.113), ('offman', 0.112), ('gradient', 0.103), ('tochastic', 0.101), ('hidden', 0.099), ('wdn', 0.095), ('nference', 0.087), ('forgetting', 0.086), ('dn', 0.082), ('lei', 0.082), ('ascent', 0.08), ('conditionals', 0.08), ('batch', 0.08), ('proportions', 0.074), ('documents', 0.072), ('ag', 0.068), ('global', 0.067), ('bayesian', 0.06), ('posterior', 0.056), ('log', 0.053), ('collections', 0.052), ('gradients', 0.052), ('jordan', 0.051), ('breaking', 0.048), ('cdi', 0.048), ('nth', 0.047), ('local', 0.047), ('multinomial', 0.046), ('noisy', 0.046), ('eld', 0.046), ('wang', 0.046), ('kv', 0.046), ('beta', 0.045), ('corpus', 0.045), ('di', 0.045), ('teh', 0.044), ('stick', 0.043), ('bnp', 0.042), ('wobs', 0.042), ('word', 0.041), ('models', 0.04), ('family', 0.039), ('update', 0.038), ('coordinate', 0.037), ('predictive', 0.036), ('wiki', 0.036), ('latent', 0.035), ('zi', 0.035), ('nonparametric', 0.035), ('graphical', 0.034), ('atoms', 0.034), ('equation', 0.033), ('variables', 0.031), ('nyt', 0.03), ('ak', 0.03), ('massive', 0.03), ('lets', 0.03), ('vocabulary', 0.03), ('natural', 0.03), ('parameters', 0.029), ('conditional', 0.029), ('intermediate', 0.029), ('dp', 0.029), ('updates', 0.029), ('exponential', 0.028), ('draw', 0.028), ('subsampling', 0.028), ('articles', 0.028), ('complete', 0.027), ('sports', 0.027), ('fox', 0.027), ('scalable', 0.027), ('hierarchical', 0.026), ('hoffman', 0.026), ('amari', 0.025), ('ck', 0.025), ('truncation', 0.025), ('xn', 0.025), ('assignment', 0.024), ('sym', 0.024), ('wikipedia', 0.024), ('probabilistic', 0.024), ('expectations', 0.023), ('collection', 0.023), ('traditional', 0.023), ('schedule', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 108 jmlr-2013-Stochastic Variational Inference

Author: Matthew D. Hoffman, David M. Blei, Chong Wang, John Paisley

Abstract: We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets. Keywords: Bayesian inference, variational inference, stochastic optimization, topic models, Bayesian nonparametrics

2 0.60986596 121 jmlr-2013-Variational Inference in Nonconjugate Models

Author: Chong Wang, David M. Blei

Abstract: Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. In a typical application, mean-field methods approximately compute the posterior with a coordinate-ascent optimization algorithm. When the model is conditionally conjugate, the coordinate updates are easily derived and in closed form. However, many models of interest—like the correlated topic model and Bayesian logistic regression—are nonconjugate. In these models, mean-field methods cannot be directly applied and practitioners have had to develop variational algorithms on a case-by-case basis. In this paper, we develop two generic methods for nonconjugate models, Laplace variational inference and delta method variational inference. Our methods have several advantages: they allow for easily derived variational algorithms with a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world data sets. We studied our methods on the correlated topic model, Bayesian logistic regression, and hierarchical Bayesian logistic regression. Keywords: variational inference, nonconjugate models, Laplace approximations, the multivariate delta method

3 0.16681068 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference

Author: Edward Challis, David Barber

Abstract: We investigate Gaussian Kullback-Leibler (G-KL) variational approximate inference techniques for Bayesian generalised linear models and various extensions. In particular we make the following novel contributions: sufficient conditions for which the G-KL objective is differentiable and convex are described; constrained parameterisations of Gaussian covariance that make G-KL methods fast and scalable are provided; the lower bound to the normalisation constant provided by G-KL methods is proven to dominate those provided by local lower bounding methods; complexity and model applicability issues of G-KL versus other Gaussian approximate inference methods are discussed. Numerical results comparing G-KL and other deterministic Gaussian approximate inference methods are presented for: robust Gaussian process regression models with either Student-t or Laplace likelihoods, large scale Bayesian binary logistic regression models, and Bayesian sparse linear models for sequential experimental design. Keywords: generalised linear models, latent linear models, variational approximate inference, large scale inference, sparse learning, experimental design, active learning, Gaussian processes

4 0.10293677 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation

Author: Philémon Brakel, Dirk Stroobandt, Benjamin Schrauwen

Abstract: Imputing missing values in high dimensional time-series is a difficult problem. This paper presents a strategy for training energy-based graphical models for imputation directly, bypassing difficulties probabilistic approaches would face. The training strategy is inspired by recent work on optimization-based learning (Domke, 2012) and allows complex neural models with convolutional and recurrent structures to be trained for imputation tasks. In this work, we use this training strategy to derive learning rules for three substantially different neural architectures. Inference in these models is done by either truncated gradient descent or variational mean-field iterations. In our experiments, we found that the training methods outperform the Contrastive Divergence learning algorithm. Moreover, the training methods can easily handle missing values in the training data itself during learning. We demonstrate the performance of this learning scheme and the three models we introduce on one artificial and two real-world data sets. Keywords: neural networks, energy-based models, time-series, missing values, optimization

5 0.10151307 15 jmlr-2013-Bayesian Canonical Correlation Analysis

Author: Arto Klami, Seppo Virtanen, Samuel Kaski

Abstract: Canonical correlation analysis (CCA) is a classical method for seeking correlations between two multivariate data sets. During the last ten years, it has received more and more attention in the machine learning community in the form of novel computational formulations and a plethora of applications. We review recent developments in Bayesian models and inference methods for CCA which are attractive for their potential in hierarchical extensions and for coping with the combination of large dimensionalities and small sample sizes. The existing methods have not been particularly successful in fulfilling the promise yet; we introduce a novel efficient solution that imposes group-wise sparsity to estimate the posterior of an extended model which not only extracts the statistical dependencies (correlations) between data sets but also decomposes the data into shared and data set-specific components. In statistics literature the model is known as inter-battery factor analysis (IBFA), for which we now provide a Bayesian treatment. Keywords: Bayesian modeling, canonical correlation analysis, group-wise sparsity, inter-battery factor analysis, variational Bayesian approximation

6 0.10081006 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models

7 0.096031733 120 jmlr-2013-Variational Algorithms for Marginal MAP

8 0.057653204 49 jmlr-2013-Global Analytic Solution of Fully-observed Variational Bayesian Matrix Factorization

9 0.056237455 90 jmlr-2013-Quasi-Newton Method: A New Direction

10 0.05454221 94 jmlr-2013-Ranked Bandits in Metric Spaces: Learning Diverse Rankings over Large Document Collections

11 0.053398523 58 jmlr-2013-Language-Motivated Approaches to Action Recognition

12 0.04857482 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation

13 0.047769319 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models

14 0.047159623 75 jmlr-2013-Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood

15 0.046679679 45 jmlr-2013-GPstuff: Bayesian Modeling with Gaussian Processes

16 0.044228137 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees

17 0.037920285 104 jmlr-2013-Sparse Single-Index Model

18 0.037832417 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model

19 0.036544312 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions

20 0.035801154 86 jmlr-2013-Parallel Vector Field Embedding


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.264), (1, -0.58), (2, 0.053), (3, -0.06), (4, -0.372), (5, 0.137), (6, 0.306), (7, 0.077), (8, 0.179), (9, -0.069), (10, 0.025), (11, -0.03), (12, -0.035), (13, 0.049), (14, -0.02), (15, 0.007), (16, 0.041), (17, 0.05), (18, 0.017), (19, -0.085), (20, -0.023), (21, 0.047), (22, 0.063), (23, 0.048), (24, 0.034), (25, 0.115), (26, 0.012), (27, 0.064), (28, -0.007), (29, 0.011), (30, -0.053), (31, -0.028), (32, 0.021), (33, 0.001), (34, 0.039), (35, -0.036), (36, -0.031), (37, 0.01), (38, -0.041), (39, 0.026), (40, -0.004), (41, -0.011), (42, -0.026), (43, -0.015), (44, 0.004), (45, 0.001), (46, 0.003), (47, 0.016), (48, 0.012), (49, 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98378724 108 jmlr-2013-Stochastic Variational Inference

Author: Matthew D. Hoffman, David M. Blei, Chong Wang, John Paisley

Abstract: We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets. Keywords: Bayesian inference, variational inference, stochastic optimization, topic models, Bayesian nonparametrics

2 0.97581106 121 jmlr-2013-Variational Inference in Nonconjugate Models

Author: Chong Wang, David M. Blei

Abstract: Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. In a typical application, mean-field methods approximately compute the posterior with a coordinate-ascent optimization algorithm. When the model is conditionally conjugate, the coordinate updates are easily derived and in closed form. However, many models of interest—like the correlated topic model and Bayesian logistic regression—are nonconjugate. In these models, mean-field methods cannot be directly applied and practitioners have had to develop variational algorithms on a case-by-case basis. In this paper, we develop two generic methods for nonconjugate models, Laplace variational inference and delta method variational inference. Our methods have several advantages: they allow for easily derived variational algorithms with a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world data sets. We studied our methods on the correlated topic model, Bayesian logistic regression, and hierarchical Bayesian logistic regression. Keywords: variational inference, nonconjugate models, Laplace approximations, the multivariate delta method

3 0.56718665 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference

Author: Edward Challis, David Barber

Abstract: We investigate Gaussian Kullback-Leibler (G-KL) variational approximate inference techniques for Bayesian generalised linear models and various extensions. In particular we make the following novel contributions: sufficient conditions for which the G-KL objective is differentiable and convex are described; constrained parameterisations of Gaussian covariance that make G-KL methods fast and scalable are provided; the lower bound to the normalisation constant provided by G-KL methods is proven to dominate those provided by local lower bounding methods; complexity and model applicability issues of G-KL versus other Gaussian approximate inference methods are discussed. Numerical results comparing G-KL and other deterministic Gaussian approximate inference methods are presented for: robust Gaussian process regression models with either Student-t or Laplace likelihoods, large scale Bayesian binary logistic regression models, and Bayesian sparse linear models for sequential experimental design. Keywords: generalised linear models, latent linear models, variational approximate inference, large scale inference, sparse learning, experimental design, active learning, Gaussian processes

4 0.50630677 15 jmlr-2013-Bayesian Canonical Correlation Analysis

Author: Arto Klami, Seppo Virtanen, Samuel Kaski

Abstract: Canonical correlation analysis (CCA) is a classical method for seeking correlations between two multivariate data sets. During the last ten years, it has received more and more attention in the machine learning community in the form of novel computational formulations and a plethora of applications. We review recent developments in Bayesian models and inference methods for CCA which are attractive for their potential in hierarchical extensions and for coping with the combination of large dimensionalities and small sample sizes. The existing methods have not been particularly successful in fulfilling the promise yet; we introduce a novel efficient solution that imposes group-wise sparsity to estimate the posterior of an extended model which not only extracts the statistical dependencies (correlations) between data sets but also decomposes the data into shared and data set-specific components. In statistics literature the model is known as inter-battery factor analysis (IBFA), for which we now provide a Bayesian treatment. Keywords: Bayesian modeling, canonical correlation analysis, group-wise sparsity, inter-battery factor analysis, variational Bayesian approximation

5 0.42684612 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation

Author: Philémon Brakel, Dirk Stroobandt, Benjamin Schrauwen

Abstract: Imputing missing values in high dimensional time-series is a difficult problem. This paper presents a strategy for training energy-based graphical models for imputation directly, bypassing difficulties probabilistic approaches would face. The training strategy is inspired by recent work on optimization-based learning (Domke, 2012) and allows complex neural models with convolutional and recurrent structures to be trained for imputation tasks. In this work, we use this training strategy to derive learning rules for three substantially different neural architectures. Inference in these models is done by either truncated gradient descent or variational mean-field iterations. In our experiments, we found that the training methods outperform the Contrastive Divergence learning algorithm. Moreover, the training methods can easily handle missing values in the training data itself during learning. We demonstrate the performance of this learning scheme and the three models we introduce on one artificial and two real-world data sets. Keywords: neural networks, energy-based models, time-series, missing values, optimization

6 0.32397756 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models

7 0.3024793 120 jmlr-2013-Variational Algorithms for Marginal MAP

8 0.21557833 45 jmlr-2013-GPstuff: Bayesian Modeling with Gaussian Processes

9 0.19924921 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models

10 0.19613998 49 jmlr-2013-Global Analytic Solution of Fully-observed Variational Bayesian Matrix Factorization

11 0.18781671 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation

12 0.18482754 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model

13 0.18179329 104 jmlr-2013-Sparse Single-Index Model

14 0.18056455 94 jmlr-2013-Ranked Bandits in Metric Spaces: Learning Diverse Rankings over Large Document Collections

15 0.17695355 9 jmlr-2013-A Widely Applicable Bayesian Information Criterion

16 0.17005098 58 jmlr-2013-Language-Motivated Approaches to Action Recognition

17 0.16790317 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization

18 0.15672985 90 jmlr-2013-Quasi-Newton Method: A New Direction

19 0.14565872 76 jmlr-2013-Nonparametric Sparsity and Regularization

20 0.1427636 30 jmlr-2013-Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.036), (5, 0.121), (6, 0.046), (10, 0.089), (20, 0.017), (23, 0.029), (44, 0.011), (53, 0.382), (68, 0.019), (70, 0.025), (75, 0.072), (87, 0.011), (93, 0.052)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79156488 108 jmlr-2013-Stochastic Variational Inference

Author: Matthew D. Hoffman, David M. Blei, Chong Wang, John Paisley

Abstract: We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets. Keywords: Bayesian inference, variational inference, stochastic optimization, topic models, Bayesian nonparametrics

2 0.72913146 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses

Author: Partha Niyogi

Abstract: Manifold regularization (Belkin et al., 2006) is a geometrically motivated framework for machine learning within which several semi-supervised algorithms have been constructed. Here we try to provide some theoretical understanding of this approach. Our main result is to expose the natural structure of a class of problems on which manifold regularization methods are helpful. We show that for such problems, no supervised learner can learn effectively. On the other hand, a manifold based learner (that knows the manifold or “learns” it from unlabeled examples) can learn with relatively few labeled examples. Our analysis follows a minimax style with an emphasis on finite sample results (in terms of n: the number of labeled examples). These results allow us to properly interpret manifold regularization and related spectral and geometric algorithms in terms of their potential use in semi-supervised learning. Keywords: semi-supervised learning, manifold regularization, graph Laplacian, minimax rates

3 0.53650153 121 jmlr-2013-Variational Inference in Nonconjugate Models

Author: Chong Wang, David M. Blei

Abstract: Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. In a typical application, mean-field methods approximately compute the posterior with a coordinate-ascent optimization algorithm. When the model is conditionally conjugate, the coordinate updates are easily derived and in closed form. However, many models of interest—like the correlated topic model and Bayesian logistic regression—are nonconjugate. In these models, mean-field methods cannot be directly applied and practitioners have had to develop variational algorithms on a case-by-case basis. In this paper, we develop two generic methods for nonconjugate models, Laplace variational inference and delta method variational inference. Our methods have several advantages: they allow for easily derived variational algorithms with a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world data sets. We studied our methods on the correlated topic model, Bayesian logistic regression, and hierarchical Bayesian logistic regression. Keywords: variational inference, nonconjugate models, Laplace approximations, the multivariate delta method

4 0.42995045 86 jmlr-2013-Parallel Vector Field Embedding

Author: Binbin Lin, Xiaofei He, Chiyuan Zhang, Ming Ji

Abstract: We propose a novel local isometry based dimensionality reduction method from the perspective of vector fields, which is called parallel vector field embedding (PFE). We first give a discussion on local isometry and global isometry to show the intrinsic connection between parallel vector fields and isometry. The problem of finding an isometry turns out to be equivalent to finding orthonormal parallel vector fields on the data manifold. Therefore, we first find orthonormal parallel vector fields by solving a variational problem on the manifold. Then each embedding function can be obtained by requiring its gradient field to be as close to the corresponding parallel vector field as possible. Theoretical results show that our method can precisely recover the manifold if it is isometric to a connected open subset of Euclidean space. Both synthetic and real data examples demonstrate the effectiveness of our method even if there is heavy noise and high curvature. Keywords: manifold learning, isometry, vector field, covariant derivative, out-of-sample extension

5 0.42637601 117 jmlr-2013-Universal Consistency of Localized Versions of Regularized Kernel Methods

Author: Robert Hable

Abstract: In supervised learning problems, global and local learning algorithms are used. In contrast to global learning algorithms, the prediction of a local learning algorithm in a testing point is only based on training data which are close to the testing point. Every global algorithm such as support vector machines (SVM) can be localized in the following way: in every testing point, the (global) learning algorithm is not applied to the whole training data but only to the k nearest neighbors (kNN) of the testing point. In case of support vector machines, the success of such mixtures of SVM and kNN (called SVM-KNN) has been shown in extensive simulation studies and also for real data sets but only little has been known on theoretical properties so far. In the present article, it is shown how a large class of regularized kernel methods (including SVM) can be localized in order to get a universally consistent learning algorithm. Keywords: machine learning, regularized kernel methods, localization, SVM, k-nearest neighbors, SVM-KNN

6 0.42198819 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning

7 0.41698179 76 jmlr-2013-Nonparametric Sparsity and Regularization

8 0.41172025 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs

9 0.40810072 32 jmlr-2013-Differential Privacy for Functions and Functional Data

10 0.40238208 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding

11 0.40204027 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion

12 0.40144899 33 jmlr-2013-Dimension Independent Similarity Computation

13 0.40100914 75 jmlr-2013-Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood

14 0.40053946 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference

15 0.39923507 3 jmlr-2013-A Framework for Evaluating Approximation Methods for Gaussian Process Regression

16 0.3976213 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation

17 0.39667532 52 jmlr-2013-How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis

18 0.3935757 120 jmlr-2013-Variational Algorithms for Marginal MAP

19 0.39320344 94 jmlr-2013-Ranked Bandits in Metric Spaces: Learning Diverse Rankings over Large Document Collections

20 0.39289662 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization