nips nips2010 nips2010-194 knowledge-graph by maker-knowledge-mining

194 nips-2010-Online Learning for Latent Dirichlet Allocation


Source: pdf

Author: Matthew Hoffman, Francis R. Bach, David M. Blei

Abstract: We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). [sent-9, score-0.489]

2 Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. [sent-10, score-0.413]

3 It can handily analyze massive document collections, including those arriving in a stream. [sent-11, score-0.215]

4 We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3 million articles from Wikipedia in a single pass. [sent-12, score-0.474]

5 We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. [sent-14, score-0.718]

6 For example, research in probabilistic topic modeling—the application we will focus on in this paper—revolves around fitting complex hierarchical Bayesian models to large collections of documents. [sent-17, score-0.259]

7 In a topic model, the posterior distribution reveals latent semantic structure that can be used for many applications. [sent-18, score-0.343]

8 For topic models and many other Bayesian models of interest, however, the posterior is intractable to compute and researchers must appeal to approximate posterior inference. [sent-19, score-0.372]

9 Optimization approaches are usually based on variational inference, which is called variational Bayes (VB) when used in a Bayesian hierarchical model. [sent-22, score-0.41]

10 The per-iteration cost of batch algorithms can quickly become impractical for very large datasets. [sent-27, score-0.244]

11 [Figure 1 residue: the axis label "Documents seen (log scale)" and lists of top words ("systems", "service", "health", "communication", "company", "billion", ...) from the evolving business topic shown in the bottom panel of figure 1.] [sent-31, score-0.611]

12 Online VB run on 3.3 million unique Wikipedia articles is compared with online VB run on 98,000 Wikipedia articles and with the batch algorithm run on the same 98,000 articles. [sent-37, score-0.797]

13 The online algorithms converge much faster than the batch algorithm does. [sent-38, score-0.528]

14 Bottom: Evolution of a topic about business as online LDA sees more and more documents. [sent-39, score-0.539]

15 Topic models can be used to summarize the latent structure of massive document collections that cannot be annotated by hand. [sent-40, score-0.235]

16 A central research problem for topic modeling is to efficiently fit models to larger corpora [4, 5]. [sent-41, score-0.279]

17 To this end, we develop an online variational Bayes algorithm for latent Dirichlet allocation (LDA), one of the simplest topic models and one on which many others are based. [sent-42, score-0.801]

18 Our algorithm is based on online stochastic optimization, which has been shown to produce good parameter estimates dramatically faster than batch algorithms on large datasets [6]. [sent-43, score-0.575]

19 Online LDA handily analyzes massive collections of documents and, moreover, online LDA need not locally store or collect the documents— each can arrive in a stream and be discarded after one look. [sent-44, score-0.576]

20 In the subsequent sections, we derive online LDA and show that it converges to a stationary point of the variational objective function. [sent-45, score-0.576]

21 We study the performance of online LDA in several ways, including by fitting a topic model to 3.3 million articles from Wikipedia. [sent-46, score-0.474]

22 We show that online LDA finds topic models as good as or better than those found with batch VB, and in a fraction of the time (see figure 1). [sent-48, score-0.718]

23 Online variational Bayes is a practical new method for estimating the posterior of complex hierarchical Bayesian models. [sent-49, score-0.284]

24 2 Online variational Bayes for latent Dirichlet allocation Latent Dirichlet Allocation (LDA) [7] is a Bayesian probabilistic model of text documents. [sent-50, score-0.303]

25 Each topic defines a multinomial distribution over the vocabulary and is assumed to have been drawn from a Dirichlet, βk ∼ Dirichlet(η). [sent-52, score-0.281]

26 Then, for each word i in the document, draw a topic index zdi ∈ {1, . . . , K}. [sent-55, score-0.377]

27 The topic index is drawn from the topic weights, zdi ∼ θd, and the observed word wdi is drawn from the selected topic, wdi ∼ βzdi. [sent-58, score-0.475]

28 Note that if we sum over the topic assignments z, then we get p(wdi | θd, β) = Σk θdk βk,wdi. [sent-60, score-0.264]
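The generative process in the preceding sentences can be written as a short simulation that also builds the word-count matrix n used below. This is a minimal sketch for illustration only; the dimensions and hyperparameter values are placeholders, not settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, W, D, N = 10, 1000, 5, 50      # topics, vocabulary size, documents, words per document
alpha, eta = 0.1, 0.01            # illustrative Dirichlet hyperparameters

# Topics: K multinomial distributions over the vocabulary, beta_k ~ Dirichlet(eta)
beta = rng.dirichlet(eta * np.ones(W), size=K)

n = np.zeros((D, W), dtype=int)   # word-count matrix, n_dw = count of word w in document d
for d in range(D):
    theta = rng.dirichlet(alpha * np.ones(K))    # per-document topic weights theta_d ~ Dirichlet(alpha)
    for _ in range(N):
        z = rng.choice(K, p=theta)               # topic assignment z_di ~ theta_d
        w = rng.choice(W, p=beta[z])             # observed word w_di ~ beta_{z_di}
        n[d, w] += 1
```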

29 This leads to the “multinomial PCA” interpretation of LDA; we can think of LDA as a probabilistic factorization of the matrix of word counts n (where ndw is the number of times word w appears in document d) into a matrix of topic weights θ and a dictionary of topics β [9]. [sent-61, score-0.667]

30 Our work can thus be seen as an extension of online matrix factorization techniques that optimize squared error [10] to more general probabilistic formulations. [sent-62, score-0.285]

31 We can analyze a corpus of documents with LDA by examining the posterior distribution of the topics β, topic proportions θ, and topic assignments z conditioned on the documents. [sent-63, score-0.92]

32 This posterior cannot be computed directly [7], and is usually approximated using Markov Chain Monte Carlo (MCMC) methods or variational inference. [sent-65, score-0.284]

33 Developing scalable approximate inference methods for topic models is an active area of research [3, 4, 5, 11]. [sent-67, score-0.256]

34 To this end, we develop online variational inference for LDA, an approximate posterior inference algorithm that can analyze massive collections of documents. [sent-68, score-0.78]

35 We first review the traditional variational Bayes algorithm for LDA and its objective function, then present our online method, and show that it converges to a stationary point of the same objective function. [sent-69, score-0.621]

36 1 Batch variational Bayes for LDA In Variational Bayesian inference (VB) the true posterior is approximated by a simpler distribution q(z, θ, β), which is indexed by a set of free parameters [12, 13]. [sent-71, score-0.326]

37 This step will help us to derive an online inference algorithm. [sent-78, score-0.302]

38 This reveals that the variational objective relies only on ndw , the number of times word w appears in document d. [sent-80, score-0.489]

39 L can be optimized using coordinate ascent over the variational parameters φ, γ, λ [7]: φdwk ∝ exp{Eq[log θdk] + Eq[log βkw]}; γdk = α + Σw ndw φdwk; λkw = η + Σd ndw φdwk. [sent-83, score-0.435]
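A minimal numpy sketch of these coordinate-ascent updates follows; it is an illustration rather than the authors' released implementation, the expectations Eq[log θ] and Eq[log β] are computed with the digamma function ψ, and the fixed iteration count and helper names are placeholders.

```python
import numpy as np
from scipy.special import psi  # digamma function

def e_step(n_d, gamma_d, Elogbeta, alpha, n_iter=50):
    """Fit the per-document parameters phi and gamma for one count vector n_d.

    Elogbeta is E_q[log beta], shape (K, W); n_d has shape (W,).
    A sketch of the coordinate-ascent E step; convergence checks are omitted.
    """
    for _ in range(n_iter):
        Elogtheta = psi(gamma_d) - psi(gamma_d.sum())
        # log phi_{dwk} up to an additive constant, shape (W, K)
        log_phi = Elogtheta[np.newaxis, :] + Elogbeta.T
        log_phi -= log_phi.max(axis=1, keepdims=True)
        phi = np.exp(log_phi)
        phi /= phi.sum(axis=1, keepdims=True)        # normalize over topics
        gamma_d = alpha + n_d @ phi                  # gamma_dk = alpha + sum_w n_dw phi_dwk
    return gamma_d, phi

def batch_m_step(n, phis, eta):
    """lambda_kw = eta + sum_d n_dw phi_dwk, given every document's phi matrix."""
    K = phis[0].shape[1]
    lam = np.full((K, n.shape[1]), eta)
    for d in range(n.shape[0]):
        lam += (n[d][:, np.newaxis] * phis[d]).T
    return lam
```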

40 2 Online variational inference for LDA Algorithm 1 has constant memory requirements and empirically converges faster than batch collapsed Gibbs sampling [3]. [sent-95, score-0.585]

41 We propose an online variational inference algorithm for fitting λ, the parameters to the variational posterior over the topic distributions β. [sent-98, score-1.029]

42 Our algorithm is nearly as simple as the batch VB algorithm, but converges much faster for large datasets. [sent-99, score-0.312]

43 A good setting of the topics λ is one for which the ELBO L is as high as possible after fitting the per-document variational parameters γ and φ with the E step defined in algorithm 1. [sent-100, score-0.302]

44 Our goal is to set λ to maximize L(n, λ) ≜ Σd ℓ(nd, γ(nd, λ), φ(nd, λ), λ), (7) where ℓ(nd, γd, φd, λ) is the dth document's contribution to the variational bound in equation 4. [sent-102, score-0.228]

45 We then compute λ̃, the setting of λ that would be optimal (given φt) if our entire corpus consisted of the single document nt repeated D times. [sent-106, score-0.292]

46 (In the true online case D → ∞, corresponding to empirical Bayes estimation of β.) [sent-110, score-0.26]

47 We show in section 2.3 that online LDA corresponds to a stochastic natural gradient algorithm on the variational objective L [15, 16]. [sent-117, score-0.598]

48 This algorithm closely resembles one proposed in [16] for online VB on models with hidden data— the most important difference is that we use an approximate E step to optimize γt and φt , since we cannot compute the conditional distribution p(zt , θt |β, nt , α) exactly. [sent-118, score-0.394]

49 In online LDA, this means computing λ̃ using S > 1 observations: λ̃kw = η + (D/S) Σs ntsw φtswk, (8) where nts is the sth document in mini-batch t. [sent-121, score-0.343]

50 The variational parameters φts and γts for this document are fit with a normal E step. [sent-122, score-0.288]

51 Note that we recover batch VB when S = D and κ = 0. [sent-123, score-0.244]
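Equation 8 combined with a step size ρt = (τ0 + t)^(−κ) gives the online update λ ← (1 − ρt)λ + ρt λ̃. The sketch below is a hedged illustration of that update; the default values of τ0 and κ are placeholders, not the paper's tuned settings.

```python
import numpy as np

def online_update(lam, minibatch_counts, minibatch_phis, eta, D, t, tau0=1.0, kappa=0.5):
    """One online-LDA update of the topic parameters lambda.

    minibatch_counts: list of S word-count vectors n_ts (each of shape (W,)).
    minibatch_phis:   list of S matrices phi_ts from the E step (each of shape (W, K)).
    """
    S = len(minibatch_counts)
    lam_tilde = np.full_like(lam, eta)
    for n_ts, phi_ts in zip(minibatch_counts, minibatch_phis):
        # lambda-tilde_kw = eta + (D / S) * sum_s n_tsw phi_tswk   (equation 8)
        lam_tilde += (D / S) * (n_ts[:, np.newaxis] * phi_ts).T
    rho_t = (tau0 + t) ** (-kappa)                     # step size rho_t = (tau0 + t)^(-kappa)
    return (1.0 - rho_t) * lam + rho_t * lam_tilde     # lambda <- (1 - rho_t) lambda + rho_t lambda-tilde
```

With S = D and κ = 0 the step size is always 1 and the update returns λ̃ computed from the whole corpus, which is the batch VB M step.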

52 In batch variational LDA, point estimates of the hyperparameters α and η can be fit given γ and λ using a linear-time Newton-Raphson method [7]. [sent-125, score-0.449]

53 We incorporate updates for α and η into online LDA: α ← α − ρt α̃(γt); η ← η − ρt η̃(λ), (9) where α̃(γt) is the inverse of the Hessian times the gradient ∇α ℓ(nt, γt, φt, λ), η̃(λ) is the inverse of the Hessian times the gradient ∇η L, and ρt ≜ (τ0 + t)^(−κ) as elsewhere. [sent-131, score-0.342]
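Read literally, equation 9 subtracts ρt times an inverse-Hessian-scaled gradient from each hyperparameter. A minimal sketch of that step for scalar α and η follows; computing the gradient and Hessian of the variational bound with respect to the Dirichlet hyperparameters (which the paper obtains via the Newton-Raphson method of [7]) is not shown here.

```python
def online_hyperparam_step(value, grad, hess, rho_t):
    """One online Newton-style step of equation 9 for a scalar hyperparameter
    (alpha or eta): value <- value - rho_t * (Hessian^{-1} * gradient).
    grad and hess must be supplied by the caller; this only sketches the update rule.
    """
    return value - rho_t * grad / hess
```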

54 Since variational inference replaces sampling with optimization, we can use results from stochastic optimization to analyze online LDA. [sent-134, score-0.605]

55 Thus, since Σ∞t=0 ρt = ∞ and Σ∞t=0 ρt² < ∞, the analysis in [19] shows both that λ converges and that the gradient ∇λ Σd ℓ(nd, γd, φd, λ) converges to 0, and thus that λ converges to a stationary point. [sent-144, score-0.219]
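For the step-size schedule ρt = (τ0 + t)^(−κ), these two conditions follow from standard p-series comparisons whenever κ ∈ (0.5, 1]:

```latex
\sum_{t=0}^{\infty} (\tau_0 + t)^{-\kappa} = \infty \quad \text{for } \kappa \le 1,
\qquad
\sum_{t=0}^{\infty} (\tau_0 + t)^{-2\kappa} < \infty \quad \text{for } \kappa > \tfrac{1}{2}.
```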

56 In variational inference, an alternative is to use the Fisher information matrix of the variational distribution q (i.e., the Hessian of the log of the variational probability density function). [sent-148, score-0.41]

57 This corresponds to taking a natural gradient step. (Footnote: although we use a deterministic procedure to compute γ and φ given n and λ, this analysis can also be applied if γ and φ are optimized using a randomized algorithm.) [sent-150, score-0.259]

58 The problem is addressed by online coordinate ascent algorithms such as those described in [20, 21, 16, 17, 10]. [sent-160, score-0.26]

59 Collapsed Gibbs Sampling (CGS) is a popular MCMC approach that samples from the posterior over topic assignments z by repeatedly sampling the topic assignment zdi conditioned on the data and all other topic assignments [22]. [sent-165, score-0.944]

60 One online MCMC approach adapts CGS by sampling topic assignments zdi based on the topic assignments and data for all previously analyzed words, instead of all other words in the corpus [23]. [sent-166, score-1.05]

61 Two alternative online MCMC approaches were considered in [24]. [sent-168, score-0.26]

62 The first, called incremental LDA, periodically resamples the topic assignments for previously analyzed words. [sent-169, score-0.284]

63 In a study in [24], none of these three online MCMC algorithms performed as well as batch CGS. [sent-171, score-0.504]

64 Instead of online methods, the authors of [4] used parallel computing to apply LDA to large corpora. [sent-172, score-0.282]

65 They developed two approximate parallel CGS schemes for LDA that gave similar predictive performance on held-out documents to batch CGS. [sent-173, score-0.431]

66 By contrast, batch VB can be implemented using constant memory, and parallelizes easily. [sent-176, score-0.244]

67 As we will show in the next section, its online counterpart is even faster. [sent-177, score-0.26]

68 4 Experiments We ran several experiments to evaluate online LDA’s efficiency and effectiveness. [sent-178, score-0.283]

69 The second set of experiments evaluates online VB in the setting where new documents are constantly being observed. [sent-180, score-0.454]

70 Open-source Python implementations of batch and online LDA can be found at http://www.

71 Figure 2: Held-out perplexity obtained on the Nature (left) and Wikipedia (right) corpora as a function of CPU time. [sent-204, score-0.278]

72 For moderately large mini-batch sizes, online LDA finds solutions as good as those found by batch LDA, but with much less computation. [sent-205, score-0.504]

73 When fit to a 10,000-document subset of the training corpus, batch LDA's speed improves, but its performance suffers. [sent-206, score-0.343]

74 Perplexity is defined as the geometric mean of the inverse marginal probability of each word in the held-out set of documents: perplexity(ntest, λ, α) ≜ exp{−(Σi log p(ntest_i | α, β)) / (Σi,w ntest_iw)}, (15) where ntest_i denotes the vector of word counts for the ith document. [sent-208, score-0.338]

75 Since we cannot directly compute log p(ntest_i | α, β), we use a lower bound on perplexity as a proxy: perplexity(ntest, λ, α) ≤ exp{−(Σi (Eq[log p(ntest_i, θi, zi | α, β)] − Eq[log q(θi, zi)])) / (Σi,w ntest_iw)}. (16) [sent-209, score-0.368]
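Once the per-document variational bounds have been computed, evaluating the bound in equation 16 is a one-liner. The sketch below assumes those ELBO values are already given (for example from the E step sketched earlier) and is illustrative only.

```python
import numpy as np

def perplexity_bound(elbo_per_doc, test_counts):
    """Upper bound on held-out perplexity from equation 16.

    elbo_per_doc: per-document variational bounds
        Eq[log p(n_i, theta_i, z_i | alpha, beta)] - Eq[log q(theta_i, z_i)],
        computed from the fitted gamma_i and phi_i (computation not shown here).
    test_counts: held-out word-count matrix, shape (num_docs, W).
    """
    return float(np.exp(-np.sum(elbo_per_doc) / test_counts.sum()))
```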

76 The per-document parameters γi and φi for the variational distributions q(θi) and q(zi) are fit using the E step in algorithm 2. [sent-210, score-0.255]

77 The topics λ are fit to a training set of documents and then held fixed. [sent-211, score-0.238]

78 There is some question as to the meaningfulness of perplexity as a metric for comparing different topic models [25]. [sent-214, score-0.427]

79 Although online LDA converges to a stationary point for any valid κ, τ0 , and S, the quality of this stationary point and the speed of convergence may depend on how the learning parameters are set. [sent-219, score-0.396]

80 We evaluated a range of settings of the learning parameters κ, τ0, and S on two corpora: 352,549 documents from the journal Nature and 100,000 documents downloaded from the English version of Wikipedia. (For the Nature articles, we removed all words not in a pruned vocabulary of 4,253 words.) [sent-220, score-0.43]

81 We then ran online LDA for five hours on the remaining documents from each corpus for κ ∈ {0. [sent-223, score-0.568]

82 After five hours of CPU time, we computed perplexity on the test sets for the topics λ obtained at the end of each fit. [sent-230, score-0.307]

83 For mini-batch sizes from 256 to 16384 there is little difference in perplexity scores. [sent-236, score-0.236]

84 To compare batch LDA to online LDA, we evaluated held-out perplexity as a function of time on the Nature and Wikipedia corpora above. [sent-241, score-0.782]

85 We also evaluated batch LDA fit to a 10,000document subset of the training corpus. [sent-243, score-0.244]

86 On the larger Nature corpus, online LDA finds a solution as good as the batch algorithm’s with much less computation. [sent-247, score-0.504]

87 On the smaller Wikipedia corpus, the online algorithm finds a better solution than the batch algorithm does. [sent-248, score-0.552]

88 The batch algorithm converges quickly on the 10,000-document corpora, but makes less accurate predictions on held-out documents. [sent-249, score-0.312]

89 To demonstrate the ability of online VB to perform in a true online setting, we wrote a Python script to continually download and analyze mini-batches of articles chosen at random from a list of approximately 3.3 million Wikipedia articles. [sent-251, score-0.733]

90 The amount of time needed to download an article and convert it to a vector of word counts is comparable to the amount of time that the online LDA algorithm takes to analyze it. [sent-256, score-0.436]
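A hedged sketch of such a continual-analysis loop follows, reusing the earlier sketches; fetch_random_article and counts_from_text are hypothetical placeholders for the download and parsing steps, corpus_size stands in for the (unknown) true corpus size D, and the parameter defaults are illustrative rather than the values used in the paper.

```python
import numpy as np
from scipy.special import psi  # digamma function

def stream_wikipedia(lam, alpha, eta, corpus_size, S=256, tau0=1024, kappa=0.7):
    """Continually fetch mini-batches of articles and update lambda online.

    Relies on e_step() and online_update() from the sketches above;
    fetch_random_article() and counts_from_text() are hypothetical helpers.
    """
    t = 0
    while True:                                        # runs indefinitely over the article stream
        batch = []
        for _ in range(S):
            text = fetch_random_article()              # hypothetical: download one random article
            batch.append(counts_from_text(text))       # hypothetical: word-count vector over the vocabulary
        Elogbeta = psi(lam) - psi(lam.sum(axis=1, keepdims=True))  # Eq[log beta] under q(beta)
        phis = []
        for n_d in batch:
            _, phi = e_step(n_d, np.ones(lam.shape[0]), Elogbeta, alpha)
            phis.append(phi)
        lam = online_update(lam, batch, phis, eta, corpus_size, t, tau0, kappa)
        t += 1                                         # each article is seen once and then discarded
```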

91 Figure 1 shows the evolution of the perplexity obtained on the held-out validation set of 1,000 Wikipedia articles by the online algorithm as a function of the number of articles seen. [sent-259, score-0.737]

92 Shown for comparison is the perplexity obtained by the online algorithm (with the same parameters) fit to only 98,000 Wikipedia articles, and that obtained by the batch algorithm fit to the same 98,000 articles. [sent-260, score-0.765]

93 The online algorithm outperforms the batch algorithm regardless of which training dataset is used, but it does best with access to a constant stream of novel documents. [sent-261, score-0.552]

94 The batch algorithm’s failure to outperform the online algorithm on limited data may be due to stochastic gradient’s robustness to local optima [19]. [sent-262, score-0.575]

95 The online algorithm converged after analyzing about half of the 3.3 million articles. [sent-263, score-0.284]

96 Even one iteration of the batch algorithm over that many articles would have taken days. [sent-265, score-0.388]

97 5 Discussion We have developed online variational Bayes (VB) for LDA. [sent-266, score-0.465]

98 This algorithm requires only a few more lines of code than the traditional batch VB of [7], and is handily applied to massive and streaming document collections. [sent-267, score-0.48]

99 The approach we used to derive an online version of batch VB for LDA is general (and simple) enough to apply to a wide variety of hierarchical Bayesian models. [sent-269, score-0.504]

100 Efficient methods for topic model inference on streaming document collections. [sent-344, score-0.362]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('lda', 0.41), ('vb', 0.309), ('kw', 0.264), ('online', 0.26), ('batch', 0.244), ('eq', 0.214), ('topic', 0.214), ('perplexity', 0.213), ('variational', 0.205), ('wikipedia', 0.186), ('documents', 0.165), ('dwk', 0.131), ('articles', 0.12), ('ndw', 0.115), ('nt', 0.11), ('dk', 0.106), ('ntest', 0.101), ('corpus', 0.099), ('zdi', 0.098), ('dirichlet', 0.094), ('mcmc', 0.083), ('document', 0.083), ('service', 0.083), ('elbo', 0.082), ('posterior', 0.079), ('companies', 0.074), ('topics', 0.073), ('industry', 0.066), ('twk', 0.066), ('word', 0.065), ('business', 0.065), ('corpora', 0.065), ('company', 0.06), ('kv', 0.059), ('massive', 0.057), ('bayes', 0.055), ('log', 0.054), ('billion', 0.053), ('latent', 0.05), ('assignments', 0.05), ('cgs', 0.049), ('handily', 0.049), ('wdi', 0.049), ('allocation', 0.048), ('stochastic', 0.047), ('vocabulary', 0.046), ('stationary', 0.046), ('collections', 0.045), ('converges', 0.044), ('blei', 0.043), ('inference', 0.042), ('gradient', 0.041), ('tk', 0.038), ('services', 0.037), ('zd', 0.037), ('em', 0.034), ('settings', 0.034), ('python', 0.034), ('download', 0.034), ('ntv', 0.033), ('ntw', 0.033), ('script', 0.033), ('tvk', 0.033), ('nds', 0.032), ('princeton', 0.031), ('health', 0.03), ('million', 0.029), ('constantly', 0.029), ('bayesian', 0.028), ('hessian', 0.028), ('int', 0.028), ('multiplying', 0.028), ('counts', 0.027), ('iw', 0.026), ('cpu', 0.026), ('analyze', 0.026), ('factorization', 0.025), ('nature', 0.025), ('mimno', 0.025), ('forgotten', 0.025), ('memory', 0.025), ('sampling', 0.025), ('algorithm', 0.024), ('streaming', 0.023), ('equation', 0.023), ('sizes', 0.023), ('ran', 0.023), ('management', 0.022), ('parallel', 0.022), ('tting', 0.022), ('objective', 0.021), ('multinomial', 0.021), ('road', 0.021), ('asuncion', 0.021), ('hours', 0.021), ('chain', 0.021), ('analyzed', 0.02), ('smyth', 0.02), ('market', 0.02), ('words', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 194 nips-2010-Online Learning for Latent Dirichlet Allocation

Author: Matthew Hoffman, Francis R. Bach, David M. Blei

Abstract: We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. 1

2 0.4072302 60 nips-2010-Deterministic Single-Pass Algorithm for LDA

Author: Issei Sato, Kenichi Kurihara, Hiroshi Nakagawa

Abstract: We develop a deterministic single-pass algorithm for latent Dirichlet allocation (LDA) in order to process received documents one at a time and then discard them in an excess text stream. Our algorithm does not need to store old statistics for all data. The proposed algorithm is much faster than a batch algorithm and is comparable to the batch algorithm in terms of perplexity in experiments.

3 0.28145298 286 nips-2010-Word Features for Latent Dirichlet Allocation

Author: James Petterson, Wray Buntine, Shravan M. Narayanamurthy, Tibério S. Caetano, Alex J. Smola

Abstract: We extend Latent Dirichlet Allocation (LDA) by explicitly allowing for the encoding of side information in the distribution over words. This results in a variety of new capabilities, such as improved estimates for infrequently occurring words, as well as the ability to leverage thesauri and dictionaries in order to boost topic cohesion within and across languages. We present experiments on multi-language topic synchronisation where dictionary information is used to bias corresponding words towards similar topics. Results indicate that our model substantially improves topic cohesion when compared to the standard LDA model. 1

4 0.22214349 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data

Author: Zoubin Ghahramani, Michael I. Jordan, Ryan P. Adams

Abstract: Many data are naturally modeled by an unobserved hierarchical structure. In this paper we propose a flexible nonparametric prior over unknown data hierarchies. The approach uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable. One can view our model as providing infinite mixtures where the components have a dependency structure corresponding to an evolutionary diffusion down a tree. By using a stick-breaking approach, we can apply Markov chain Monte Carlo methods based on slice sampling to perform Bayesian inference and simulate from the posterior distribution on trees. We apply our method to hierarchical clustering of images and topic modeling of text data. 1

5 0.19844104 150 nips-2010-Learning concept graphs from text with stick-breaking priors

Author: America Chambers, Padhraic Smyth, Mark Steyvers

Abstract: We present a generative probabilistic model for learning general graph structures, which we term concept graphs, from text. Concept graphs provide a visual summary of the thematic content of a collection of documents—a task that is difficult to accomplish using only keyword search. The proposed model can learn different types of concept graph structures and is capable of utilizing partial prior knowledge about graph structure as well as labeled documents. We describe a generative model that is based on a stick-breaking process for graphs, and a Markov Chain Monte Carlo inference procedure. Experiments on simulated data show that the model can recover known graph structure when learning in both unsupervised and semi-supervised modes. We also show that the proposed model is competitive in terms of empirical log likelihood with existing structure-based topic models (hPAM and hLDA) on real-world text data sets. Finally, we illustrate the application of the model to the problem of updating Wikipedia category graphs. 1

6 0.19785902 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization

7 0.18208697 124 nips-2010-Inductive Regularized Learning of Kernel Functions

8 0.16068709 38 nips-2010-Batch Bayesian Optimization via Simulation Matching

9 0.1603854 131 nips-2010-Joint Analysis of Time-Evolving Binary Matrices and Associated Documents

10 0.15415436 287 nips-2010-Worst-Case Linear Discriminant Analysis

11 0.10654242 268 nips-2010-The Neural Costs of Optimal Control

12 0.094358593 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average

13 0.09273874 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

14 0.084301755 213 nips-2010-Predictive Subspace Learning for Multi-view Data: a Large Margin Approach

15 0.083774671 284 nips-2010-Variational bounds for mixed-data factor analysis

16 0.083245717 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models

17 0.082510673 283 nips-2010-Variational Inference over Combinatorial Spaces

18 0.079608671 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes

19 0.078904547 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

20 0.073825419 264 nips-2010-Synergies in learning words and their referents


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.182), (1, 0.028), (2, 0.035), (3, -0.003), (4, -0.395), (5, 0.214), (6, 0.305), (7, 0.023), (8, -0.115), (9, 0.052), (10, 0.228), (11, 0.074), (12, 0.026), (13, 0.106), (14, -0.092), (15, 0.069), (16, -0.046), (17, 0.037), (18, -0.053), (19, 0.031), (20, 0.126), (21, -0.126), (22, -0.084), (23, 0.048), (24, -0.073), (25, -0.078), (26, 0.023), (27, 0.002), (28, -0.045), (29, -0.062), (30, 0.022), (31, -0.137), (32, 0.045), (33, -0.049), (34, 0.015), (35, 0.076), (36, -0.132), (37, 0.053), (38, 0.089), (39, -0.018), (40, 0.018), (41, -0.084), (42, 0.09), (43, 0.049), (44, 0.025), (45, -0.089), (46, -0.029), (47, 0.014), (48, -0.058), (49, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96663642 194 nips-2010-Online Learning for Latent Dirichlet Allocation

Author: Matthew Hoffman, Francis R. Bach, David M. Blei

Abstract: We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. 1

2 0.9447037 60 nips-2010-Deterministic Single-Pass Algorithm for LDA

Author: Issei Sato, Kenichi Kurihara, Hiroshi Nakagawa

Abstract: We develop a deterministic single-pass algorithm for latent Dirichlet allocation (LDA) in order to process received documents one at a time and then discard them in an excess text stream. Our algorithm does not need to store old statistics for all data. The proposed algorithm is much faster than a batch algorithm and is comparable to the batch algorithm in terms of perplexity in experiments.

3 0.74898344 286 nips-2010-Word Features for Latent Dirichlet Allocation

Author: James Petterson, Wray Buntine, Shravan M. Narayanamurthy, Tibério S. Caetano, Alex J. Smola

Abstract: We extend Latent Dirichlet Allocation (LDA) by explicitly allowing for the encoding of side information in the distribution over words. This results in a variety of new capabilities, such as improved estimates for infrequently occurring words, as well as the ability to leverage thesauri and dictionaries in order to boost topic cohesion within and across languages. We present experiments on multi-language topic synchronisation where dictionary information is used to bias corresponding words towards similar topics. Results indicate that our model substantially improves topic cohesion when compared to the standard LDA model. 1

4 0.64370549 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization

Author: Shinichi Nakajima, Masashi Sugiyama, Ryota Tomioka

Abstract: Bayesian methods of matrix factorization (MF) have been actively explored recently as promising alternatives to classical singular value decomposition. In this paper, we show that, despite the fact that the optimization problem is non-convex, the global optimal solution of variational Bayesian (VB) MF can be computed analytically by solving a quartic equation. This is highly advantageous over a popular VBMF algorithm based on iterated conditional modes since it can only find a local optimal solution after iterations. We further show that the global optimal solution of empirical VBMF (hyperparameters are also learned from data) can also be analytically computed. We illustrate the usefulness of our results through experiments.

5 0.57173282 131 nips-2010-Joint Analysis of Time-Evolving Binary Matrices and Associated Documents

Author: Eric Wang, Dehong Liu, Jorge Silva, Lawrence Carin, David B. Dunson

Abstract: We consider problems for which one has incomplete binary matrices that evolve with time (e.g., the votes of legislators on particular legislation, with each year characterized by a different such matrix). An objective of such analysis is to infer structure and inter-relationships underlying the matrices, here defined by latent features associated with each axis of the matrix. In addition, it is assumed that documents are available for the entities associated with at least one of the matrix axes. By jointly analyzing the matrices and documents, one may be used to inform the other within the analysis, and the model offers the opportunity to predict matrix values (e.g., votes) based only on an associated document (e.g., legislation). The research presented here merges two areas of machine-learning that have previously been investigated separately: incomplete-matrix analysis and topic modeling. The analysis is performed from a Bayesian perspective, with efficient inference constituted via Gibbs sampling. The framework is demonstrated by considering all voting data and available documents (legislation) during the 220-year lifetime of the United States Senate and House of Representatives. 1

6 0.48071083 150 nips-2010-Learning concept graphs from text with stick-breaking priors

7 0.47019455 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data

8 0.45054236 287 nips-2010-Worst-Case Linear Discriminant Analysis

9 0.41012669 38 nips-2010-Batch Bayesian Optimization via Simulation Matching

10 0.35256037 237 nips-2010-Shadow Dirichlet for Restricted Probability Modeling

11 0.3512328 198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem

12 0.34933004 213 nips-2010-Predictive Subspace Learning for Multi-view Data: a Large Margin Approach

13 0.34638497 283 nips-2010-Variational Inference over Combinatorial Spaces

14 0.33011958 90 nips-2010-Fast Large-scale Mixture Modeling with Component-specific Data Partitions

15 0.30138937 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average

16 0.2871249 202 nips-2010-Parallelized Stochastic Gradient Descent

17 0.2857534 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits

18 0.28326398 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

19 0.28077546 215 nips-2010-Probabilistic Deterministic Infinite Automata

20 0.26821563 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.061), (17, 0.016), (27, 0.221), (30, 0.072), (35, 0.013), (45, 0.156), (50, 0.069), (52, 0.028), (60, 0.037), (73, 0.189), (77, 0.021), (78, 0.017), (90, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86121213 123 nips-2010-Individualized ROI Optimization via Maximization of Group-wise Consistency of Structural and Functional Profiles

Author: Kaiming Li, Lei Guo, Carlos Faraco, Dajiang Zhu, Fan Deng, Tuo Zhang, Xi Jiang, Degang Zhang, Hanbo Chen, Xintao Hu, Steve Miller, Tianming Liu

Abstract: Functional segregation and integration are fundamental characteristics of the human brain. Studying the connectivity among segregated regions and the dynamics of integrated brain networks has drawn increasing interest. A very controversial, yet fundamental issue in these studies is how to determine the best functional brain regions or ROIs (regions of interests) for individuals. Essentially, the computed connectivity patterns and dynamics of brain networks are very sensitive to the locations, sizes, and shapes of the ROIs. This paper presents a novel methodology to optimize the locations of an individual's ROIs in the working memory system. Our strategy is to formulate the individual ROI optimization as a group variance minimization problem, in which group-wise functional and structural connectivity patterns, and anatomic profiles are defined as optimization constraints. The optimization problem is solved via the simulated annealing approach. Our experimental results show that the optimized ROIs have significantly improved consistency in structural and functional profiles across subjects, and have more reasonable localizations and more consistent morphological and anatomic profiles.

same-paper 2 0.82271558 194 nips-2010-Online Learning for Latent Dirichlet Allocation

Author: Matthew Hoffman, Francis R. Bach, David M. Blei

Abstract: We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. 1

3 0.82012999 128 nips-2010-Infinite Relational Modeling of Functional Connectivity in Resting State fMRI

Author: Morten Mørup, Kristoffer Madsen, Anne-marie Dogonowski, Hartwig Siebner, Lars K. Hansen

Abstract: Functional magnetic resonance imaging (fMRI) can be applied to study the functional connectivity of the neural elements which form complex network at a whole brain level. Most analyses of functional resting state networks (RSN) have been based on the analysis of correlation between the temporal dynamics of various regions of the brain. While these models can identify coherently behaving groups in terms of correlation they give little insight into how these groups interact. In this paper we take a different view on the analysis of functional resting state networks. Starting from the definition of resting state as functional coherent groups we search for functional units of the brain that communicate with other parts of the brain in a coherent manner as measured by mutual information. We use the infinite relational model (IRM) to quantify functional coherent groups of resting state networks and demonstrate how the extracted component interactions can be used to discriminate between functional resting state activity in multiple sclerosis and normal subjects. 1

4 0.81847203 60 nips-2010-Deterministic Single-Pass Algorithm for LDA

Author: Issei Sato, Kenichi Kurihara, Hiroshi Nakagawa

Abstract: We develop a deterministic single-pass algorithm for latent Dirichlet allocation (LDA) in order to process received documents one at a time and then discard them in an excess text stream. Our algorithm does not need to store old statistics for all data. The proposed algorithm is much faster than a batch algorithm and is comparable to the batch algorithm in terms of perplexity in experiments.

5 0.81519705 39 nips-2010-Bayesian Action-Graph Games

Author: Albert X. Jiang, Kevin Leyton-brown

Abstract: Games of incomplete information, or Bayesian games, are an important gametheoretic model and have many applications in economics. We propose Bayesian action-graph games (BAGGs), a novel graphical representation for Bayesian games. BAGGs can represent arbitrary Bayesian games, and furthermore can compactly express Bayesian games exhibiting commonly encountered types of structure including symmetry, action- and type-specific utility independence, and probabilistic independence of type distributions. We provide an algorithm for computing expected utility in BAGGs, and discuss conditions under which the algorithm runs in polynomial time. Bayes-Nash equilibria of BAGGs can be computed by adapting existing algorithms for complete-information normal form games and leveraging our expected utility algorithm. We show both theoretically and empirically that our approaches improve significantly on the state of the art. 1

6 0.81458777 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies

7 0.80440217 266 nips-2010-The Maximal Causes of Natural Scenes are Edge Filters

8 0.79921275 119 nips-2010-Implicit encoding of prior probabilities in optimal neural populations

9 0.79151809 161 nips-2010-Linear readout from a neural population with partial correlation data

10 0.79006553 81 nips-2010-Evaluating neuronal codes for inference using Fisher information

11 0.76419014 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence

12 0.75971365 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models

13 0.75104302 98 nips-2010-Functional form of motion priors in human motion perception

14 0.74511999 268 nips-2010-The Neural Costs of Optimal Control

15 0.73294371 44 nips-2010-Brain covariance selection: better individual functional connectivity models using population prior

16 0.72721928 97 nips-2010-Functional Geometry Alignment and Localization of Brain Areas

17 0.72706836 19 nips-2010-A rational decision making framework for inhibitory control

18 0.72679806 127 nips-2010-Inferring Stimulus Selectivity from the Spatial Structure of Neural Network Dynamics

19 0.72199827 17 nips-2010-A biologically plausible network for the computation of orientation dominance

20 0.72048432 56 nips-2010-Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication