nips nips2013 nips2013-187 knowledge-graph by maker-knowledge-mining

187 nips-2013-Memoized Online Variational Inference for Dirichlet Process Mixture Models


Source: pdf

Author: Michael Hughes, Erik Sudderth

Abstract: Variational inference algorithms provide the most effective framework for large-scale training of Bayesian nonparametric models. Stochastic online approaches are promising, but are sensitive to the chosen learning rate and often converge to poor local optima. We present a new algorithm, memoized online variational inference, which scales to very large (yet finite) datasets while avoiding the complexities of stochastic gradient. Our algorithm maintains finite-dimensional sufficient statistics from batches of the full dataset, requiring some additional memory but still scaling to millions of examples. Exploiting nested families of variational bounds for infinite nonparametric models, we develop principled birth and merge moves allowing non-local optimization. Births adaptively add components to the model to escape local optima, while merges remove redundancy and improve speed. Using Dirichlet process mixture models for image clustering and denoising, we demonstrate major improvements in robustness and accuracy.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We present a new algorithm, memoized online variational inference, which scales to very large (yet finite) datasets while avoiding the complexities of stochastic gradient. [sent-9, score-0.806]

2 Exploiting nested families of variational bounds for infinite nonparametric models, we develop principled birth and merge moves allowing non-local optimization. [sent-11, score-0.672]

3 Births adaptively add components to the model to escape local optima, while merges remove redundancy and improve speed. [sent-12, score-0.359]

4 Unfortunately, contemporary inference algorithms do not live up to this promise, scaling poorly and yielding solutions that represent poor local optima of the true posterior. [sent-16, score-0.235]

5 Stochastic online variational inference is a promising general-purpose approach to Bayesian nonparametric learning from streaming data [1]. [sent-19, score-0.37]

6 While individual steps of stochastic optimization algorithms are by design scalable, they are extremely vulnerable to local optima for non-convex unsupervised learning problems, frequently yielding poor solutions (see Fig. [sent-20, score-0.2]

7 Recent work has proposed methods for automatically adapting learning rates [2], but these algorithms’ progress on the overall variational objective remains local and non-monotonic. [sent-24, score-0.225]

8 In this paper, we present an alternative algorithm, memoized online variational inference, which avoids noisy gradient steps and learning rates altogether. [sent-25, score-0.759]

9 The algorithm visits each batch in turn and updates a cached set of sufficient statistics which accurately reflect the entire dataset. [sent-27, score-0.264]

10 Our memoized approach is generally applicable in any case where batch or stochastic online methods are useful, including topic models [1] and relational models [3], though we do not explore these here. [sent-29, score-0.792]

11 1 We further develop a principled framework for escaping local optima in the online setting, by integrating birth and merge moves within our algorithm’s coordinate ascent steps. [sent-30, score-0.693]

12 Our birth and merge moves, together with a nested variational approximation to the posterior, enable adaptive creation and pruning of clusters on-the-fly. [sent-32, score-0.605]

13 Because these moves are validated by an exactly tracked global variational objective, we avoid potential instabilities of stochastic online split-merge proposals [4]. [sent-33, score-0.516]

14 The structure of our moves is very different from split-merge MCMC methods [5, 6]; applications of these algorithms have been limited to hundreds of data points, while our experiments show scaling of memoized split-merge proposals to millions of examples. [sent-34, score-0.669]

15 We review the Dirichlet process mixture model and variational inference in Sec. [sent-35, score-0.33]

16 2 Variational inference for Dirichlet process mixture models. The Dirichlet process (DP) provides a nonparametric prior for partitioning exchangeable datasets into discrete clusters [7]. [sent-39, score-0.228]

17 Component k has mixture weight wk sampled as follows: G ∼ DP(α0 H), G = ∑_{k=1}^∞ wk δ_{φk}, vk ∼ Beta(1, α0), wk = vk ∏_{ℓ=1}^{k−1} (1 − vℓ). [sent-41, score-0.392]

18 Each data item n chooses an assignment zn ∼ Cat(w), and then draws observations xn ∼ F(φ_{zn}). [sent-43, score-0.178]
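
As an illustration of these two sentences, here is a minimal Python sketch of sampling from a truncated DP mixture via stick-breaking; the unit-variance Gaussian likelihood F, the Gaussian base measure H, and the truncation level are illustrative assumptions rather than choices stated above:

import numpy as np

def sample_dp_gmm(N, alpha0=1.0, K_trunc=50, D=2, seed=0):
    """Draw data from a truncated DP mixture of unit-variance Gaussians."""
    rng = np.random.default_rng(seed)
    # Stick-breaking: v_k ~ Beta(1, alpha0), w_k = v_k * prod_{l<k} (1 - v_l)
    v = rng.beta(1.0, alpha0, size=K_trunc)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    w = w / w.sum()                              # renormalize the truncated weights
    # Component parameters phi_k ~ H (here: a broad Gaussian base measure)
    phi = rng.normal(0.0, 5.0, size=(K_trunc, D))
    # Assignments z_n ~ Cat(w), observations x_n ~ F(phi_{z_n}) = N(phi_{z_n}, I)
    z = rng.choice(K_trunc, size=N, p=w)
    x = phi[z] + rng.normal(size=(N, D))
    return x, z, w, phi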

19 The goal of inference is to recover stick-breaking proportions vk and data-generating parameters φk for each global mixture component k, as well as discrete cluster assignments z = {zn}_{n=1}^{N} for each observation. [sent-47, score-0.426]

20 To ease notation, we mark variables with hats to distinguish parameters θ̂ of variational factors q from parameters θ of the generative model; in this way, θk and θ̂k always have equal dimension. [sent-58, score-0.197]

21 Crucially, our truncation is nested: any learned q with truncation K can be represented exactly under truncation K + 1 by setting the final component to have zero mass. [sent-60, score-0.255]
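
A small sketch of what this nesting means in practice: a truncation-K set of variational parameters can be padded to truncation K + 1 with a zero-mass final component without changing the objective; the parameter layout (two Beta parameters per stick plus natural parameters lam) is an illustrative assumption about how q is stored:

import numpy as np

def expand_truncation(alpha_k1, alpha_k0, lam, lam0, alpha0, alpha1=1.0):
    """Represent a truncation-K model exactly under truncation K + 1.

    The appended component reverts to its prior and carries zero expected mass,
    so the variational bound is unchanged.
    """
    alpha_k1 = np.append(alpha_k1, alpha1)   # prior Beta parameters, no data mass
    alpha_k0 = np.append(alpha_k0, alpha0)
    lam = np.vstack([lam, lam0[None, :]])    # prior natural parameters for q(phi)
    return alpha_k1, alpha_k0, lam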

22 For each component k, we store its expected mass N̂k and expected sufficient statistic sk(x). [sent-65, score-0.198]

23 We next review variational algorithms for optimizing q via coordinate ascent, iteratively updating individual factors of q. [sent-68, score-0.197]

24 We describe algorithms in terms of two updates [1]: global parameters (stick-breaking proportions vk and data-generating parameters φk), and local parameters (assignments of data to components zn). [sent-69, score-0.472]

25 2 Full-dataset variational inference. Standard full-dataset variational inference [7] updates local factors q(zn | r̂n) for all observations n = 1, …, N [sent-71, score-0.628]

26 by visiting each item n and computing the fraction r̂nk explained by component k: r̃nk = exp{ Eq[log wk(v)] + Eq[log p(xn | φk)] }, r̂nk = r̃nk / ∑ℓ r̃nℓ. [sent-74, score-1.173]
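
A sketch of this local step; the inputs Elog_w (expected log stick-breaking weights under q(v)) and Elog_lik (expected data log-likelihoods under q(φ)) are assumed to be precomputed by routines not shown here:

import numpy as np

def local_step(Elog_w, Elog_lik):
    """Compute responsibilities r_hat[n, k] from E[log w_k(v)] and E[log p(x_n | phi_k)].

    Elog_w   : (K,)   expected log stick-breaking weights
    Elog_lik : (N, K) expected data log-likelihoods
    """
    log_r = Elog_w[None, :] + Elog_lik           # unnormalized log r_tilde
    log_r -= log_r.max(axis=1, keepdims=True)    # stabilize before exponentiating
    r_hat = np.exp(log_r)
    r_hat /= r_hat.sum(axis=1, keepdims=True)    # normalize over components
    return r_hat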

27 After computing summary statistics N̂k, sk(x) given the new r̂nk via Eq. [sent-76, score-0.41]

28 (7), the update equations become α̂k1 = α1 + N̂k, α̂k0 = α0 + ∑_{ℓ=k+1}^{K} N̂ℓ, λ̂k = λ0 + sk(x). [sent-77, score-0.186]
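
A sketch of these global updates given the cached summaries; the hyperparameter names (alpha1, alpha0, lam0) and the (K, D)-shaped sufficient statistics are illustrative, and a conjugate exponential-family likelihood is assumed so that λ̂k is a simple natural-parameter sum:

import numpy as np

def global_step(N_hat, s_x, alpha0, lam0, alpha1=1.0):
    """Update stick-breaking Beta parameters and natural parameters lambda_k.

    N_hat : (K,)   expected mass of each component, sum_n r_hat[n, k]
    s_x   : (K, D) expected sufficient statistics, sum_n r_hat[n, k] * t(x_n)
    """
    # N_gt[k] = total expected mass of all later components (l > k)
    N_gt = np.cumsum(N_hat[::-1])[::-1] - N_hat
    alpha_k1 = alpha1 + N_hat                # Beta(alpha_k1, alpha_k0) for each v_k
    alpha_k0 = alpha0 + N_gt
    lam = lam0 + s_x                         # conjugate update for q(phi_k)
    return alpha_k1, alpha_k0, lam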

29 3 Stochastic online variational inference. Stochastic online (SO) variational inference scales to huge datasets [1]. [sent-81, score-0.654]

30 For example, to update the variational parameter λ̂k from (5) at iteration t, we first compute the global update given only data in the current batch, amplified to be at full-dataset scale: λ̂*k = λ0 + (N / |Bt|) sk(Bt). [sent-89, score-0.393]
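
For contrast with the memoized updates developed next, a sketch of one such stochastic online step, including the learning-rate blend the paper argues against; the power-law decay schedule is an illustrative choice, not one prescribed above:

def stochastic_global_update(lam, lam0, s_batch, N_total, batch_size, t,
                             delay=1.0, kappa=0.6):
    """One noisy SO update of the natural parameters from a single batch B_t."""
    rho_t = (t + delay) ** (-kappa)                      # learning rate in (0, 1]
    lam_star = lam0 + (N_total / batch_size) * s_batch   # amplified batch estimate
    return (1.0 - rho_t) * lam + rho_t * lam_star        # interpolate toward it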

31 This online approach has clear computational advantages and can sometimes yield higher quality solutions than the full-data algorithm, since it conveys information between local and global parameters more frequently. [sent-92, score-0.225]

32 3 Memoized online variational inference. Generalizing previous incremental variants of the expectation maximization (EM) algorithm [10], we now develop our memoized online variational inference algorithm. [sent-94, score-1.168]

33 For each batch b, we maintain memoized sufficient statistics Sk^b = [N̂k(Bb), sk(Bb)]. [sent-96, score-0.82]

34 We also track the full-dataset statistics Sk^0 = [N̂k, sk(x)]; these summary statistics allow guarantees of correct full-dataset analysis while processing only one small batch at a time. [sent-98, score-0.19]

35 Our approach hinges on the fact that these sufficient statistics are additive: summaries of an entire dataset can be written exactly as the addition of summaries of distinct batches. [sent-99, score-0.25]

36 Note that our memoization of deterministic analyses of batches of data is distinct from the stochastic memoization, or “lazy” instantiation, of random variables in some Monte Carlo methods [11, 12]. [sent-100, score-0.216]

37 Memoized inference proceeds by visiting (in random order) each distinct batch once in a full pass through the data, incrementally updating the local and global parameters related to that batch b. [sent-101, score-0.684]

38 First, we update local parameters for the current batch (q(zn | rn ) for n ∈ Bb ) via Eq. [sent-102, score-0.211]

39 Next, we update cached global sufficient statistics for each component: we subtract the old (cached) summary of batch b, compute a new batch-level summary, and add the result to the full-dataset summary: Sk^0 ← Sk^0 − Sk^b, Sk^b ← [ ∑_{n∈Bb} r̂nk, ∑_{n∈Bb} r̂nk t(xn) ], Sk^0 ← Sk^0 + Sk^b. [sent-104, score-0.889]
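
A sketch of one memoized pass implementing this subtract, recompute, and add pattern; the model object (with local_step, t, and global_step methods) is a hypothetical interface standing in for the updates described above:

import numpy as np

def memoized_pass(batches, S_batch, N_hat_0, s_0, model, rng):
    """Visit each batch once, swapping its cached summary into the full-dataset totals.

    batches : list of (N_b, D) data arrays
    S_batch : list of cached (N_hat_b, s_b) batch summaries, or None before first visit
    N_hat_0 : (K,)   full-dataset expected counts
    s_0     : (K, D) full-dataset expected sufficient statistics
    """
    for b in rng.permutation(len(batches)):
        x_b = batches[b]
        r_hat = model.local_step(x_b)           # update q(z_n) for n in batch b
        N_hat_b = r_hat.sum(axis=0)             # fresh batch-level summary
        s_b = r_hat.T @ model.t(x_b)
        if S_batch[b] is not None:              # subtract the old cached summary
            old_N, old_s = S_batch[b]
            N_hat_0 = N_hat_0 - old_N
            s_0 = s_0 - old_s
        S_batch[b] = (N_hat_b, s_b)             # memoize, then add the new one
        N_hat_0 = N_hat_0 + N_hat_b
        s_0 = s_0 + s_b
        model.global_step(N_hat_0, s_0)         # exact full-dataset global update
    return S_batch, N_hat_0, s_0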

40 Unlike stochastic online algorithms, memoized inference is guaranteed to improve the full-dataset ELBO at every step. [sent-107, score-0.717]

41 In the limit where B = 1, memoized inference reduces to standard full-dataset updates. [sent-110, score-0.596]

42 However, given many batches it is far more scalable, while maintaining all guarantees of batch inference. [sent-111, score-0.292]

43 Provided we can store memoized sufficient statistics for each batch (not each observation), memoized inference has the same computational complexity as stochastic methods while avoiding noise and sensitivity to learning rates. [sent-113, score-1.346]

44 Recent analysis of convex optimization algorithms [13] demonstrated theoretical and practical advantages for methods that use cached fulldataset summaries to update parameters, as we do, instead of stochastic current-batch-only updates. [sent-114, score-0.265]

45 This memoized algorithm can compute the full-dataset objective L(q) exactly at any point (after visiting all items once). [sent-115, score-0.571]

46 To do so efficiently, we need to compute and store the assignment entropy Hk^b = − ∑_{n∈Bb} r̂nk log r̂nk after visiting each batch b. [sent-116, score-0.762]
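
A sketch of that per-batch entropy bookkeeping, assuming the same responsibility matrix r_hat as in the local step above; summing the cached terms over batches recovers the full-dataset assignment entropy needed for the exact ELBO:

import numpy as np

def batch_assignment_entropy(r_hat, eps=1e-100):
    """H_k^b = -sum_{n in B_b} r_hat[n, k] * log r_hat[n, k], one value per component."""
    return -(r_hat * np.log(r_hat + eps)).sum(axis=0)

# Cached after each batch visit; the full-dataset entropy is H_k = sum_b H_k^b,
# so L(q) can be evaluated exactly once every batch has been visited.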

47 1 Birth moves to escape local optima. We now propose additional birth moves that, when interleaved with conventional coordinate ascent parameter updates, can add useful new components to the model and escape local optima. [sent-121, score-0.833]

48 Previous methods [14, 9, 4] for changing variational truncations create just one extra component via a “split” move that is highly specialized to particular likelihoods. [sent-122, score-0.216]

49 In contrast, our births add many components at once and apply to any exponential family mixture model. [sent-124, score-0.331]

50 [Figure 1 diagram residue: 1) create new components by learning a fresh DP-GMM via VB on a subsample of data explained by a targeted component; 2) adopt them in one pass thru the data; the memoized summaries N̂k^b record the expected count of each component in each batch.] [sent-126, score-0.456]

51 Left: Scatter plot of 2D observed data, and a subsample targeted via the first mixture component. [sent-128, score-0.178]

52 Right: Bar plots of memoized counts N̂k^b for each batch. [sent-130, score-0.514]

53 Creating new components in the online setting is challenging. [sent-132, score-0.183]

54 Each small batch may not have enough examples of a missing component to inspire a good proposal, even if that component is well-supported by the full dataset. [sent-133, score-0.278]

55 We thus advocate birth moves that happen in three phases (collection, creation, and adoption) over two passes of the data. [sent-134, score-0.338]

56 This procedure, illustrated in Fig. 1, creates new components and then updates every batch with the expanded model. [sent-137, score-0.334]

57 Successive births are interleaved; at each pass there are proposals both active and in preparation. [sent-138, score-0.242]

58 Collection: During pass 1, we collect a targeted subsample x′ of the data, of size at most N′ = 10^4. [sent-141, score-0.178]

59 Creation: Before pass 2, we create new components by fitting a DP mixture model with K′ components (we take K′ = 10) to x′, running variational inference for a limited budget of iterations. [sent-147, score-0.625]

60 Adoption: During pass 2, we visit each batch and perform local and global parameter updates for the expanded (K + K′)-component mixture. [sent-154, score-0.425]

61 These updates use expanded global summaries S^0 that include summaries S′ from the targeted analysis of x′. [sent-155, score-0.433]
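
A high-level sketch of the collection and creation phases of this birth move; the collection threshold and the helper fit_dp_mixture are hypothetical stand-ins for details not given in the sentences above, model.local_step is the same assumed interface as earlier, and adoption reuses the memoized pass already sketched with the expanded component set:

import numpy as np

def birth_proposal(batches, model, target_k, max_subsample=10000, K_new=10, rng=None):
    """Collection (pass 1) and creation (before pass 2) of new components."""
    rng = rng or np.random.default_rng()
    # Collection: keep points that the targeted component explains well
    kept = []
    for x_b in batches:
        r_hat = model.local_step(x_b)
        kept.append(x_b[r_hat[:, target_k] > 0.1])   # illustrative threshold
    x_prime = np.vstack(kept)
    if len(x_prime) > max_subsample:
        idx = rng.choice(len(x_prime), max_subsample, replace=False)
        x_prime = x_prime[idx]
    # Creation: fit a fresh K'-component DP mixture to the subsample via VB for a
    # limited budget of iterations (fit_dp_mixture is a hypothetical helper)
    fresh_components = fit_dp_mixture(x_prime, K=K_new, max_iters=50)
    # Adoption then happens during pass 2: the (K + K')-component model is updated
    # batch by batch, with the subsample summaries S' folded into S^0.
    return fresh_components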

62 By adding many components at once, our birth move allows rapid escape from poor local optima. [sent-160, score-0.436]

63 However, in practice merge moves reliably reject poor births and restore original configurations. [sent-162, score-0.444]

64 In Sec. 4, births are so effective that runs started at K = 1 recover necessary components on-the-fly. [sent-164, score-0.265]

65 2 Merge moves that optimize the full-data objective. The computational cost of inference grows with the number of components K. [sent-166, score-0.332]

66 To keep K small, we develop merge moves that replace two components with a single merged one. [sent-167, score-0.459]

67 Merge moves were first explored for batch variational methods [16, 14]. [sent-168, score-0.438]

68 [Figure 2 plot residue: x-axis “passes thru data (N=100000)”; data: 5x5 patches; worst MO-BM trace.] [sent-175, score-0.291]

69 [Figure 2 plot residue: x-axis “passes thru data (N=100000)”; traces for worst MO, worst Full, best SOb.] [sent-181, score-0.228]

70 Figure 2: Comparison of full-data, stochastic (SO), and memoized (MO) on toy data with K = 8 true components (Sec. [sent-213, score-0.67]

71 Variational inference methods have been augmented to evaluate merge proposals based on noisy, single-batch estimates of the ELBO [4]. [sent-221, score-0.479]

72 In contrast, our algorithm accurately computes the full ELBO for each merge proposal, ensuring only useful merges are accepted. [sent-225, score-0.312]

73 Given two components ka, kb to merge, we form a candidate q′ with K − 1 components, where the merged component km takes over all assignments to ka, kb: r̂nkm = r̂nka + r̂nkb. [sent-226, score-0.515]

74 Our merge move has three steps: select components, form the candidate configuration q ′ , and accept q ′ if the ELBO improves. [sent-230, score-0.208]

75 Selecting ka , kb to merge at random is unlikely to yield an improved configuration. [sent-231, score-0.29]

76 After choosing ka at random, we select kb using a ratio of marginal likelihoods M which compares the merged and separated configurations, easily computed with cached summaries: p(kb | ka) ∝ M(Ska + Skb) / (M(Ska) M(Skb)), where M(Sk) = exp a0(λ0 + sk(x)) (Eq. 12). [sent-232, score-0.386]

77 Our memoized approach allows exact evaluation of the full-data ELBO to compare the existing q to merge candidate q′. [sent-233, score-0.695]

78 Via Eq. (8), evaluating L(q′) is a linear function of merged sufficient statistics, except for the assignment entropy term: Ĥab = − ∑_{n=1}^{N} (r̂nka + r̂nkb) log(r̂nka + r̂nkb). [sent-235, score-0.195]

79 We compute this term in advance for all possible merge pairs. [sent-236, score-0.181]
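
A sketch of the merge proposal and its exact acceptance test; marginal_lik_M, candidate_elbo, and current_elbo are hypothetical stand-ins for the paper's M(·) and L(q) computations, and the precomputed pair entropies H_pair are assumed cached as just described:

import numpy as np

def propose_merge(S, H_pair, marginal_lik_M, candidate_elbo, current_elbo, rng):
    """Select (ka, kb), form the merged candidate q', accept only if the ELBO improves.

    S              : list of per-component full-dataset summaries S_k (addable objects)
    H_pair         : dict {(ka, kb): H_ab} of precomputed merged assignment entropies
    marginal_lik_M : callable S_k -> marginal likelihood M(S_k)        (hypothetical)
    candidate_elbo : callable (summaries, H_ab) -> full-data ELBO of q' (hypothetical)
    current_elbo   : exact full-data ELBO of the current q
    """
    K = len(S)
    ka = int(rng.integers(K))
    # Score partners by the ratio M(S_ka + S_kb) / (M(S_ka) M(S_kb))
    scores = np.array([0.0 if k == ka else
                       marginal_lik_M(S[ka] + S[k]) /
                       (marginal_lik_M(S[ka]) * marginal_lik_M(S[k]))
                       for k in range(K)])
    kb = int(rng.choice(K, p=scores / scores.sum()))
    # Candidate q': the merged component absorbs both sets of responsibilities
    S_cand = [s for k, s in enumerate(S) if k not in (ka, kb)] + [S[ka] + S[kb]]
    H_ab = H_pair[(min(ka, kb), max(ka, kb))]
    accepted = candidate_elbo(S_cand, H_ab) > current_elbo
    return (S_cand, True) if accepted else (S, False)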

80 This modest precomputation allows rapid and exact merge moves, which improve model quality and speed-up post-merge iterations. [sent-238, score-0.213]

81 In one pass of the data, our algorithm performs a birth, memoized ascent steps for all batches, and several merges after the final batch. [sent-239, score-0.772]

82 4 Experimental results. We now compare algorithms for learning DP-Gaussian mixture models (DP-GMM), using our own implementations of full-dataset, stochastic online (SO), and memoized online (MO) inference, as well as our new birth-merge memoized algorithm (MO-BM). [sent-241, score-1.3]

83 [Plot residue: legend entries SOa, SOb, SOc, Full, 20 batches, 100 batches, MO, MO−BM, Kuri; axis ticks near −3.] [sent-259, score-0.135]

84 [Plot residue: axis “components K”; legend entries Full, 20 batches, 100 batches, MO, MO−BM, Kuri; x-axis ticks 0 to 200.] [sent-270, score-0.379]

85 The ELBO trace plots show this method escaping local optima, with slight drops indicating addition of new components followed by rapid increases as these are adopted. [sent-281, score-0.24]

86 They further suggest that our fixed-truncation memoized method competes favorably with full-data inference, often converging to similar or better solutions after fewer passes through the data. [sent-282, score-0.605]

87 Fig. 2 shows trace plots of GreedyMerge, a memoized online variant that instead uses only the current-batch ELBO to assess a proposed merge, as done in [4]. [sent-285, score-0.588]

88 All 5 runs of this GreedyMerge algorithm ruinously accept merges that decrease the full objective, consistently collapsing down to just one component. [sent-288, score-0.194]

89 Our memoized approach ensures merges are always globally beneficial. [sent-289, score-0.614]

90 Here, we also compare to Kurihara’s public implementation of variational inference with split moves [9]. [sent-293, score-0.363]

91 In contrast, our memoized approach consistently delivers solutions on par with full inference, with no apparent sensitivity to the number of batches. [sent-301, score-0.577]

92 With births and merges enabled, MO-BM expands from K = 1 to over 80 components, finding better solutions than every smart K = 100 initialization. [sent-302, score-0.266]

93 [Plot residue: x-axis “num. passes thru data (N=1880200)”; traces MO−BM K=1, MO K=100, SOa K=100; y-axis ticks 0 to 300.] [sent-326, score-0.228]

94 [Plot residue: x-axis “passes thru data (N=1880200)”; y-axis “log evidence ×10^9”.] [sent-327, score-0.263]

95 Figure 5: 8 × 8 image patches (x-axis: passes thru data, N=8640000). [sent-339, score-0.256]

96 Fig. 5 shows that our birth-merge memoized algorithm started at K = 1 can consistently add useful components and reach better solutions than alternatives. [sent-362, score-0.648]

97 5 Conclusions. Our novel memoized online variational algorithm avoids the noisiness and sensitivity inherent in stochastic methods. [sent-371, score-0.838]

98 Our birth and merge moves successfully escape local optima. [sent-372, score-0.553]

99 Truly nonparametric online variational inference for hierarchical Dirichlet processes. [sent-408, score-0.37]

100 Bayesian model search for mixture models based on optimizing variational bounds. [sent-472, score-0.248]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('memoized', 0.514), ('elbo', 0.275), ('mo', 0.262), ('rnk', 0.257), ('merge', 0.181), ('variational', 0.171), ('batch', 0.157), ('sk', 0.153), ('thru', 0.137), ('birth', 0.137), ('batches', 0.135), ('summaries', 0.125), ('births', 0.12), ('soa', 0.12), ('moves', 0.11), ('components', 0.109), ('zn', 0.105), ('merges', 0.1), ('bm', 0.095), ('vk', 0.093), ('passes', 0.091), ('sob', 0.086), ('inference', 0.082), ('eq', 0.078), ('mixture', 0.077), ('pass', 0.077), ('online', 0.074), ('escape', 0.071), ('dp', 0.071), ('truncation', 0.07), ('global', 0.069), ('dirichlet', 0.067), ('optima', 0.066), ('cached', 0.065), ('kurihara', 0.065), ('bb', 0.063), ('patches', 0.063), ('nk', 0.062), ('creation', 0.06), ('merged', 0.059), ('visiting', 0.057), ('subsample', 0.055), ('ka', 0.055), ('local', 0.054), ('kb', 0.054), ('greedymerge', 0.051), ('rnkb', 0.051), ('ska', 0.051), ('skb', 0.051), ('stochastic', 0.047), ('targeted', 0.046), ('smart', 0.046), ('escaping', 0.045), ('proposals', 0.045), ('component', 0.045), ('nonparametric', 0.043), ('wk', 0.043), ('updates', 0.042), ('initialization', 0.042), ('xn', 0.039), ('adoption', 0.037), ('runs', 0.036), ('evidence', 0.035), ('assignment', 0.034), ('kuri', 0.034), ('memoization', 0.034), ('nka', 0.034), ('summary', 0.033), ('poor', 0.033), ('images', 0.033), ('assignments', 0.033), ('hk', 0.032), ('soc', 0.032), ('rapid', 0.032), ('sensitivity', 0.032), ('full', 0.031), ('alignment', 0.031), ('nested', 0.03), ('beta', 0.029), ('blei', 0.029), ('hughes', 0.028), ('zoran', 0.028), ('image', 0.028), ('aligned', 0.028), ('advantages', 0.028), ('bt', 0.027), ('cluster', 0.027), ('tiny', 0.027), ('accept', 0.027), ('clusters', 0.026), ('znk', 0.026), ('subtract', 0.026), ('ascent', 0.026), ('factors', 0.026), ('expanded', 0.026), ('cat', 0.026), ('categories', 0.025), ('patch', 0.025), ('add', 0.025), ('psnr', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 187 nips-2013-Memoized Online Variational Inference for Dirichlet Process Mixture Models

Author: Michael Hughes, Erik Sudderth

Abstract: Variational inference algorithms provide the most effective framework for large-scale training of Bayesian nonparametric models. Stochastic online approaches are promising, but are sensitive to the chosen learning rate and often converge to poor local optima. We present a new algorithm, memoized online variational inference, which scales to very large (yet finite) datasets while avoiding the complexities of stochastic gradient. Our algorithm maintains finite-dimensional sufficient statistics from batches of the full dataset, requiring some additional memory but still scaling to millions of examples. Exploiting nested families of variational bounds for infinite nonparametric models, we develop principled birth and merge moves allowing non-local optimization. Births adaptively add components to the model to escape local optima, while merges remove redundancy and improve speed. Using Dirichlet process mixture models for image clustering and denoising, we demonstrate major improvements in robustness and accuracy.

2 0.19369496 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

Author: Dahua Lin

Abstract: Reliance on computationally expensive algorithms for inference has been limiting the use of Bayesian nonparametric models in large scale applications. To tackle this problem, we propose a Bayesian learning algorithm for DP mixture models. Instead of following the conventional paradigm – random initialization plus iterative update, we take an progressive approach. Starting with a given prior, our method recursively transforms it into an approximate posterior through sequential variational approximation. In this process, new components will be incorporated on the fly when needed. The algorithm can reliably estimate a DP mixture model in one pass, making it particularly suited for applications with massive data. Experiments on both synthetic data and real datasets demonstrate remarkable improvement on efficiency – orders of magnitude speed-up compared to the state-of-the-art. 1

3 0.13927177 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits

Author: Jason Chang, John W. Fisher III

Abstract: We present an MCMC sampler for Dirichlet process mixture models that can be parallelized to achieve significant computational gains. We combine a nonergodic, restricted Gibbs iteration with split/merge proposals in a manner that produces an ergodic Markov chain. Each cluster is augmented with two subclusters to construct likely split moves. Unlike some previous parallel samplers, the proposed sampler enforces the correct stationary distribution of the Markov chain without the need for finite approximations. Empirical results illustrate that the new sampler exhibits better convergence properties than current methods. 1

4 0.12225272 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models

Author: Dae Il Kim, Prem Gopalan, David Blei, Erik Sudderth

Abstract: Stochastic block models characterize observed network relationships via latent community memberships. In large social networks, we expect entities to participate in multiple communities, and the number of communities to grow with the network size. We introduce a new model for these phenomena, the hierarchical Dirichlet process relational model, which allows nodes to have mixed membership in an unbounded set of communities. To allow scalable learning, we derive an online stochastic variational inference algorithm. Focusing on assortative models of undirected networks, we also propose an efficient structured mean field variational bound, and online methods for automatically pruning unused communities. Compared to state-of-the-art online learning methods for parametric relational models, we show significantly improved perplexity and link prediction accuracy for sparse networks with tens of thousands of nodes. We also showcase an analysis of LittleSis, a large network of who-knows-who at the heights of business and government. 1

5 0.099060684 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

Author: Ben Shababo, Brooks Paige, Ari Pakman, Liam Paninski

Abstract: With the advent of modern stimulation techniques in neuroscience, the opportunity arises to map neuron to neuron connectivity. In this work, we develop a method for efficiently inferring posterior distributions over synaptic strengths in neural microcircuits. The input to our algorithm is data from experiments in which action potentials from putative presynaptic neurons can be evoked while a subthreshold recording is made from a single postsynaptic neuron. We present a realistic statistical model which accounts for the main sources of variability in this experiment and allows for significant prior information about the connectivity and neuronal cell types to be incorporated if available. Due to the technical challenges and sparsity of these systems, it is important to focus experimental time stimulating the neurons whose synaptic strength is most ambiguous, therefore we also develop an online optimal design algorithm for choosing which neurons to stimulate at each trial. 1

6 0.094718978 196 nips-2013-Modeling Overlapping Communities with Node Popularities

7 0.091531008 143 nips-2013-Integrated Non-Factorized Variational Inference

8 0.084725797 188 nips-2013-Memory Limited, Streaming PCA

9 0.081556901 13 nips-2013-A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks

10 0.071130201 312 nips-2013-Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex

11 0.069622457 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

12 0.06949351 18 nips-2013-A simple example of Dirichlet process mixture inconsistency for the number of components

13 0.06746687 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

14 0.067387074 317 nips-2013-Streaming Variational Bayes

15 0.067270331 301 nips-2013-Sparse Additive Text Models with Low Rank Background

16 0.066814676 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach

17 0.065678954 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking

18 0.064359523 345 nips-2013-Variance Reduction for Stochastic Gradient Optimization

19 0.063983656 232 nips-2013-Online PCA for Contaminated Data

20 0.062424652 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.166), (1, 0.054), (2, -0.044), (3, -0.016), (4, 0.018), (5, 0.107), (6, 0.08), (7, 0.047), (8, 0.128), (9, -0.034), (10, -0.045), (11, 0.144), (12, 0.027), (13, 0.081), (14, -0.025), (15, -0.065), (16, 0.017), (17, -0.094), (18, 0.006), (19, -0.072), (20, 0.042), (21, -0.017), (22, -0.062), (23, -0.036), (24, -0.065), (25, 0.016), (26, 0.001), (27, -0.033), (28, 0.12), (29, -0.029), (30, 0.028), (31, 0.09), (32, 0.037), (33, -0.03), (34, -0.107), (35, 0.001), (36, -0.021), (37, 0.08), (38, -0.024), (39, 0.08), (40, -0.064), (41, -0.105), (42, 0.065), (43, 0.086), (44, 0.068), (45, -0.026), (46, -0.049), (47, 0.044), (48, 0.035), (49, -0.042)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94888276 187 nips-2013-Memoized Online Variational Inference for Dirichlet Process Mixture Models

Author: Michael Hughes, Erik Sudderth

Abstract: Variational inference algorithms provide the most effective framework for largescale training of Bayesian nonparametric models. Stochastic online approaches are promising, but are sensitive to the chosen learning rate and often converge to poor local optima. We present a new algorithm, memoized online variational inference, which scales to very large (yet finite) datasets while avoiding the complexities of stochastic gradient. Our algorithm maintains finite-dimensional sufficient statistics from batches of the full dataset, requiring some additional memory but still scaling to millions of examples. Exploiting nested families of variational bounds for infinite nonparametric models, we develop principled birth and merge moves allowing non-local optimization. Births adaptively add components to the model to escape local optima, while merges remove redundancy and improve speed. Using Dirichlet process mixture models for image clustering and denoising, we demonstrate major improvements in robustness and accuracy.

2 0.81763351 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

Author: Dahua Lin

Abstract: Reliance on computationally expensive algorithms for inference has been limiting the use of Bayesian nonparametric models in large scale applications. To tackle this problem, we propose a Bayesian learning algorithm for DP mixture models. Instead of following the conventional paradigm – random initialization plus iterative update, we take an progressive approach. Starting with a given prior, our method recursively transforms it into an approximate posterior through sequential variational approximation. In this process, new components will be incorporated on the fly when needed. The algorithm can reliably estimate a DP mixture model in one pass, making it particularly suited for applications with massive data. Experiments on both synthetic data and real datasets demonstrate remarkable improvement on efficiency – orders of magnitude speed-up compared to the state-of-the-art. 1

3 0.69714326 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models

Author: Dae Il Kim, Prem Gopalan, David Blei, Erik Sudderth

Abstract: Stochastic block models characterize observed network relationships via latent community memberships. In large social networks, we expect entities to participate in multiple communities, and the number of communities to grow with the network size. We introduce a new model for these phenomena, the hierarchical Dirichlet process relational model, which allows nodes to have mixed membership in an unbounded set of communities. To allow scalable learning, we derive an online stochastic variational inference algorithm. Focusing on assortative models of undirected networks, we also propose an efficient structured mean field variational bound, and online methods for automatically pruning unused communities. Compared to state-of-the-art online learning methods for parametric relational models, we show significantly improved perplexity and link prediction accuracy for sparse networks with tens of thousands of nodes. We also showcase an analysis of LittleSis, a large network of who-knows-who at the heights of business and government. 1

4 0.64497042 317 nips-2013-Streaming Variational Bayes

Author: Tamara Broderick, Nicholas Boyd, Andre Wibisono, Ashia C. Wilson, Michael Jordan

Abstract: We present SDA-Bayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior. The framework makes streaming updates to the estimated posterior according to a user-specified approximation batch primitive. We demonstrate the usefulness of our framework, with variational Bayes (VB) as the primitive, by fitting the latent Dirichlet allocation model to two large-scale document collections. We demonstrate the advantages of our algorithm over stochastic variational inference (SVI) by comparing the two after a single pass through a known amount of data—a case where SVI may be applied—and in the streaming setting, where SVI does not apply. 1

5 0.61371726 13 nips-2013-A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks

Author: Junming Yin, Qirong Ho, Eric Xing

Abstract: We propose a scalable approach for making inference about latent spaces of large networks. With a succinct representation of networks as a bag of triangular motifs, a parsimonious statistical model, and an efficient stochastic variational inference algorithm, we are able to analyze real networks with over a million vertices and hundreds of latent roles on a single machine in a matter of hours, a setting that is out of reach for many existing methods. When compared to the state-of-the-art probabilistic approaches, our method is several orders of magnitude faster, with competitive or improved accuracy for latent space recovery and link prediction. 1

6 0.58991635 312 nips-2013-Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex

7 0.58407211 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits

8 0.55489099 345 nips-2013-Variance Reduction for Stochastic Gradient Optimization

9 0.53295201 196 nips-2013-Modeling Overlapping Communities with Node Popularities

10 0.52964234 234 nips-2013-Online Variational Approximations to non-Exponential Family Change Point Models: With Application to Radar Tracking

11 0.52500749 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning

12 0.52015638 18 nips-2013-A simple example of Dirichlet process mixture inconsistency for the number of components

13 0.51478559 86 nips-2013-Demixing odors - fast inference in olfaction

14 0.51275688 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

15 0.50427508 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

16 0.50005722 320 nips-2013-Summary Statistics for Partitionings and Feature Allocations

17 0.49797794 167 nips-2013-Learning the Local Statistics of Optical Flow

18 0.49304739 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models

19 0.49000514 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach

20 0.48590448 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.094), (33, 0.126), (34, 0.13), (36, 0.03), (41, 0.038), (49, 0.025), (50, 0.247), (56, 0.08), (70, 0.049), (85, 0.033), (89, 0.031), (93, 0.022), (95, 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80478835 187 nips-2013-Memoized Online Variational Inference for Dirichlet Process Mixture Models

Author: Michael Hughes, Erik Sudderth

Abstract: Variational inference algorithms provide the most effective framework for largescale training of Bayesian nonparametric models. Stochastic online approaches are promising, but are sensitive to the chosen learning rate and often converge to poor local optima. We present a new algorithm, memoized online variational inference, which scales to very large (yet finite) datasets while avoiding the complexities of stochastic gradient. Our algorithm maintains finite-dimensional sufficient statistics from batches of the full dataset, requiring some additional memory but still scaling to millions of examples. Exploiting nested families of variational bounds for infinite nonparametric models, we develop principled birth and merge moves allowing non-local optimization. Births adaptively add components to the model to escape local optima, while merges remove redundancy and improve speed. Using Dirichlet process mixture models for image clustering and denoising, we demonstrate major improvements in robustness and accuracy.

2 0.75457752 209 nips-2013-New Subsampling Algorithms for Fast Least Squares Regression

Author: Paramveer Dhillon, Yichao Lu, Dean P. Foster, Lyle Ungar

Abstract: We address the problem of fast estimation of ordinary least squares (OLS) from large amounts of data (n p). We propose three methods which solve the big data problem by subsampling the covariance matrix using either a single or two stage estimation. All three run in the order of size of input i.e. O(np) and our best method, Uluru, gives an error bound of O( p/n) which is independent of the amount of subsampling as long as it is above a threshold. We provide theoretical bounds for our algorithms in the fixed design (with Randomized Hadamard preconditioning) as well as sub-Gaussian random design setting. We also compare the performance of our methods on synthetic and real-world datasets and show that if observations are i.i.d., sub-Gaussian then one can directly subsample without the expensive Randomized Hadamard preconditioning without loss of accuracy. 1

3 0.74379909 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models

Author: Khaled Refaat, Arthur Choi, Adnan Darwiche

Abstract: EDML is a recently proposed algorithm for learning parameters in Bayesian networks. It was originally derived in terms of approximate inference on a metanetwork, which underlies the Bayesian approach to parameter estimation. While this initial derivation helped discover EDML in the first place and provided a concrete context for identifying some of its properties (e.g., in contrast to EM), the formal setting was somewhat tedious in the number of concepts it drew on. In this paper, we propose a greatly simplified perspective on EDML, which casts it as a general approach to continuous optimization. The new perspective has several advantages. First, it makes immediate some results that were non-trivial to prove initially. Second, it facilitates the design of EDML algorithms for new graphical models, leading to a new algorithm for learning parameters in Markov networks. We derive this algorithm in this paper, and show, empirically, that it can sometimes learn estimates more efficiently from complete data, compared to commonly used optimization methods, such as conjugate gradient and L-BFGS. 1

4 0.66085702 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions

Author: Francesca Petralia, Joshua T. Vogelstein, David Dunson

Abstract: Nonparametric estimation of the conditional distribution of a response given highdimensional features is a challenging problem. It is important to allow not only the mean but also the variance and shape of the response density to change flexibly with features, which are massive-dimensional. We propose a multiscale dictionary learning model, which expresses the conditional response density as a convex combination of dictionary densities, with the densities used and their weights dependent on the path through a tree decomposition of the feature space. A fast graph partitioning algorithm is applied to obtain the tree decomposition, with Bayesian methods then used to adaptively prune and average over different sub-trees in a soft probabilistic manner. The algorithm scales efficiently to approximately one million features. State of the art predictive performance is demonstrated for toy examples and two neuroscience applications including up to a million features. 1

5 0.65540636 99 nips-2013-Dropout Training as Adaptive Regularization

Author: Stefan Wager, Sida Wang, Percy Liang

Abstract: Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset. 1

6 0.65092421 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval

7 0.6464963 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

8 0.64603031 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning

9 0.64346761 34 nips-2013-Analyzing Hogwild Parallel Gaussian Gibbs Sampling

10 0.64341366 86 nips-2013-Demixing odors - fast inference in olfaction

11 0.64219153 58 nips-2013-Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent

12 0.64186436 350 nips-2013-Wavelets on Graphs via Deep Learning

13 0.63726234 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

14 0.63699174 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

15 0.63635337 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data

16 0.63582867 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking

17 0.63579357 201 nips-2013-Multi-Task Bayesian Optimization

18 0.63523024 317 nips-2013-Streaming Variational Bayes

19 0.6347211 173 nips-2013-Least Informative Dimensions

20 0.63424057 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits