nips nips2010 nips2010-60 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Issei Sato, Kenichi Kurihara, Hiroshi Nakagawa
Abstract: We develop a deterministic single-pass algorithm for latent Dirichlet allocation (LDA) in order to process received documents one at a time and then discard them in an excess text stream. Our algorithm does not need to store old statistics for all data. The proposed algorithm is much faster than a batch algorithm and is comparable to the batch algorithm in terms of perplexity in experiments.
Reference: text
sentIndex sentText sentNum sentScore
1 jp Abstract We develop a deterministic single-pass algorithm for latent Dirichlet allocation (LDA) in order to process received documents one at a time and then discard them in an excess text stream. [sent-12, score-0.333]
2 Our algorithm does not need to store old statistics for all data. [sent-13, score-0.36]
3 The proposed algorithm is much faster than a batch algorithm and is comparable to the batch algorithm in terms of perplexity in experiments. [sent-14, score-0.539]
4 1 Introduction Huge quantities of text data such as news articles and blog posts arrive in a continuous stream. [sent-15, score-0.139]
5 Online learning has attracted a great deal of attention as a useful method for handling this growing quantity of streaming data because it processes data one at a time; batch algorithms are not feasible in these settings because they need all the data at once. [sent-16, score-0.193]
6 This paper focuses on online learning for Latent Dirichlet allocation (LDA) (Blei et al. [sent-17, score-0.109]
7 Existing studies were based on sampling methods such as the incremental Gibbs sampler and particle filter. [sent-23, score-0.271]
8 Moreover, sampling algorithms often need a resampling step in which a sampling method is applied to old data. [sent-26, score-0.265]
9 Storing old data or old samples adversely affects the good properties of online algorithms. [sent-27, score-0.519]
10 If the number of topics is T, the vocabulary size is V, and the number of stored samples is m, the required memory size is O(m × T × V). [sent-31, score-0.152]
11 We propose two deterministic online algorithms: an incremental algorithm and a single-pass algorithm. [sent-32, score-0.303]
12 Our incremental algorithm is an incremental variant of the reverse EM (REM) algorithm (Minka, 2001). [sent-33, score-0.572]
13 The incremental algorithm updates parameters by replacing old sufficient statistics with new ones for each datum. [sent-34, score-0.648]
14 Our single-pass algorithm is based on an incremental algorithm, but it does not need to store old statistics for all data. [sent-35, score-0.599]
15 In our single-pass algorithm, we propose a sequential update method for the Dirichlet parameters. [sent-36, score-0.115]
16 (2009) indicated the importance of estimating the parameters of the Dirichlet distribution, which is the distribution over the topic distributions of documents. [sent-39, score-0.132]
17 Moreover, we can deal with a growing vocabulary. [sent-40, score-0.099]
18 VB-LDA is the variational inference for LDA, which is a batch inference; CVB-LDA is the collapsed variational inference for LDA (Teh et al. [sent-46, score-0.402]
19 , 2007); iREM-LDA is our incremental algorithm; and sREM-LDA is our single-pass algorithm for LDA. [sent-47, score-0.273]
20 2 Overview of Latent Dirichlet Allocation This section gives an overview of LDA, in which documents are represented as random mixtures over latent topics and each topic is characterized by a distribution over words. [sent-52, score-0.337]
21 θ_j denotes a T-dimensional probability vector that is the parameter of the multinomial distribution and represents the topic distribution of document j. [sent-62, score-0.406]
22 β_t is a multinomial parameter, a V-dimensional probability vector, where β_{t,v} specifies the probability of generating word v given topic t. [sent-63, score-0.247]
23 For each of the T topics t, draw β_t ∼ Dir(β|λ), where Dir(β|λ) ∝ ∏_v β_{t,v}^{λ−1}. [sent-66, score-0.1]
24 For each of the M documents j, draw θ_j ∼ Dir(θ|α), where Dir(θ|α) ∝ ∏_t θ_t^{α_t−1}. [sent-67, score-0.111]
25 For each of the N_j words w_{j,i} in document j, draw topic z_{j,i} ∼ Multi(z|θ_j) and draw word w_{j,i} ∼ p(w|z_{j,i}, β), where p(w = v|z = t, β) = β_{t,v}. [sent-68, score-0.526]
26 That is to say, the complete-data likelihood of a document w_j is given by p(w_j, z_j, θ_j | α, β) = p(θ_j|α) ∏_{i=1}^{N_j} p(w_{j,i}|z_{j,i}, β) p(z_{j,i}|θ_j). [sent-69, score-0.272]
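For illustration, the generative process above can be sketched in a few lines of Python. This is our own toy sketch, not code from the paper; the dimensions T and V, the hyperparameters alpha and lam, and the function name are assumed values chosen only for the example.

import numpy as np

rng = np.random.default_rng(0)
T, V = 5, 100              # number of topics, vocabulary size (toy values)
alpha = np.full(T, 0.1)    # Dirichlet prior over document topic distributions
lam = 0.01                 # symmetric Dirichlet prior over topic-word distributions

# Draw the topic-word distributions beta_t ~ Dir(lambda) for each topic t.
beta = rng.dirichlet(np.full(V, lam), size=T)   # shape (T, V)

def generate_document(n_words):
    # Draw the document's topic distribution theta_j ~ Dir(alpha).
    theta = rng.dirichlet(alpha)
    words = []
    for _ in range(n_words):
        z = rng.choice(T, p=theta)        # draw topic z_{j,i} ~ Multi(theta_j)
        w = rng.choice(V, p=beta[z])      # draw word w_{j,i} with probability beta_{z,w}
        words.append(w)
    return words

print(generate_document(20))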
27 The derivation of the update equation for q(z) in Eq. (6) is slightly complicated and involves approximations to compute intractable summations. [sent-98, score-0.115]
28 An update using only zero-order information is given by ϕ_{j,i,t} ∝ (α_t + n_{j,t}^{-j,i}) (λ + n_{t,w_{j,i}}^{-j,i}) / (Vλ + Σ_v n_{t,v}^{-j,i}), n_{j,t} = Σ_{i=1}^{N_j} ϕ_{j,i,t}, n_{t,v} = Σ_j Σ_i ϕ_{j,i,t} I(w_{j,i} = v), (7) where the superscript "-j,i" denotes subtracting ϕ_{j,i,t} from the corresponding statistic. [sent-102, score-0.136]
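A minimal sketch of this zero-order (CVB0) update for a single token, using our own variable names and array layouts (phi_ji, n_tv, n_jt are assumptions, not the paper's implementation):

import numpy as np

def cvb0_update_token(phi_ji, n_tv, n_jt, w, alpha, lam, V):
    """One zero-order CVB update for token i of document j (Eq. 7 above).

    phi_ji : (T,) current variational responsibilities of this token
    n_tv   : (T, V) expected topic-word counts, n_jt : (T,) expected document-topic counts
    w      : observed word id of this token
    """
    # Remove the token's own contribution (the "-j,i" statistics).
    n_tv = n_tv.copy(); n_jt = n_jt.copy()
    n_tv[:, w] -= phi_ji
    n_jt -= phi_ji
    # Zero-order update, then normalize over topics.
    new_phi = (alpha + n_jt) * (lam + n_tv[:, w]) / (V * lam + n_tv.sum(axis=1))
    new_phi /= new_phi.sum()
    # Add the token back with its new responsibilities.
    n_tv[:, w] += new_phi
    n_jt += new_phi
    return new_phi, n_tv, n_jt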
29 3 Deterministic Online Algorithm for LDA The purpose of this study is to process text data such as news articles and blog posts arriving in a continuous stream by using LDA. [sent-104, score-0.185]
30 For these situations, we want to process each text one at a time and then discard it. [sent-106, score-0.109]
31 We repeat iterations only over the words within a document. [sent-107, score-0.113]
32 That is, we update parameters from an arriving document and discard the document after doing l iterations. [sent-108, score-0.673]
33 First, we derive an incremental algorithm for LDA, and then we extend it to a single-pass algorithm. [sent-110, score-0.546]
34 3.1 Incremental Learning Neal and Hinton (1998) provided a framework of incremental learning for the EM algorithm. [sent-112, score-0.239]
35 In general unsupervised learning, we estimate sufficient statistics s_i for each data point i, compute the whole sufficient statistics σ (= Σ_i s_i) from all data, and update parameters by using σ. [sent-113, score-0.251]
36 In incremental learning, for each data point i, we estimate s_i, compute σ^{(i)} from s_i, and update parameters from σ^{(i)}. [sent-114, score-0.402]
37 It is easy to extend an existing batch algorithm to incremental learning if the whole sufficient statistics or parameter updates are constructed by simply summing the statistics of all data. [sent-115, score-0.567]
38 The incremental algorithm processes data point i by subtracting the old s_i^{old} and adding the new s_i^{new}, i.e., σ^{(i)} = σ − s_i^{old} + s_i^{new}. [sent-116, score-0.615]
39 The incremental algorithm needs to store the old statistics {s_i^{old}} for all data. [sent-119, score-0.599]
40 While batch algorithms update parameters after sweeping through all data, the incremental algorithm updates parameters for each data point one at a time, which results in more parameter updates than in batch algorithms. [sent-120, score-0.858]
41 Therefore, the incremental algorithm sometimes converges faster than batch algorithms. [sent-121, score-0.387]
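A generic sketch of this incremental bookkeeping, written in our own notation with assumed callback functions estimate_stats and update_params (this is not the paper's pseudocode):

import numpy as np

def incremental_em(data, estimate_stats, update_params, n_sweeps=5):
    """Generic incremental EM in the spirit of Neal and Hinton (1998):
    replace each datum's old statistics with new ones and refresh the
    parameters after every datum."""
    s_old = [estimate_stats(x, params=None) for x in data]   # initial per-datum statistics
    sigma = sum(s_old)                                        # whole sufficient statistics
    params = update_params(sigma)
    for _ in range(n_sweeps):
        for i, x in enumerate(data):
            s_new = estimate_stats(x, params)     # E-step for datum i only
            sigma = sigma - s_old[i] + s_new      # swap the old statistics for the new ones
            s_old[i] = s_new                      # old statistics must be stored for every datum
            params = update_params(sigma)         # parameter update after each datum
    return params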
42 3.2 Incremental Learning for LDA Our motivation for devising the incremental algorithm for LDA was to compare CVB-LDA and VB-LDA. [sent-123, score-0.273]
43 Statistics {nt,v } and {nj,t } are updated after each word is updated in CVB-LDA. [sent-124, score-0.165]
44 This update schedule is similar to that of the incremental algorithm. [sent-125, score-0.354]
45 This incremental property seems to be the reason CVB-LDA converges faster than VB-LDA. [sent-126, score-0.239]
46 Below, let us consider the incremental algorithm for LDA. [sent-128, score-0.273]
47 The maximum of F̂[q(z)] with respect to q(z_{j,i} = t) = ϕ_{j,i,t} and β is given by ϕ_{j,i,t} ∝ β_{t,w_{j,i}} exp{Ψ(α_t + Σ_i ϕ_{j,i,t})}, (11) β_{t,v} ∝ λ + Σ_j n_{j,t,v}, (12) and the updates of α are the same as Eq. [sent-133, score-0.102]
48 Equation (12) incrementally updates the topic distribution of a document for each word as in CVB-LDA because we do not need γj,i in Eq. [sent-137, score-0.598]
49 That is, when we compare this algorithm with VB-LDA, it looks like a hybrid variant: batch updates for α and β, and incremental updates for γ_j. Here, we consider an incremental update for β analogous to that of CVB-LDA, in which β is updated for each word. [sent-141, score-0.986]
50 Note that in the LDA setup, each independent and identically distributed data point is a document, not a word. [sent-142, score-0.242]
51 Therefore, we incrementally estimate β for each document by swapping the statistics n_{j,t,v} = Σ_{i=1}^{N_j} ϕ_{j,i,t} I(w_{j,i} = v), which is the number of times word v is generated from topic t in document j. [sent-143, score-0.771]
52 This algorithm incrementally optimizes the lower bound in Eq. [sent-145, score-0.096]
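A rough per-document sketch of these iREM updates, under our own naming and with an assumed symmetric prior lam on β; the paper's Algorithm 3 may differ in details such as initialization and how the per-token statistics are maintained:

import numpy as np
from scipy.special import psi   # digamma function Psi

def irem_update_document(words, beta, alpha, n_jtv_old, n_tv, lam=0.01, n_iter=5):
    """Incremental REM update for one document (Eqs. 11-12 above, our variable names).

    words     : list of word ids in document j
    beta      : (T, V) current topic-word parameters
    alpha     : (T,) Dirichlet parameter over topic distributions
    n_jtv_old : (T, V) this document's previously stored statistics n_{j,t,v}
    n_tv      : (T, V) global statistics sum_j n_{j,t,v}
    """
    T, V = beta.shape
    phi = np.full((len(words), T), 1.0 / T)
    for _ in range(n_iter):
        for i, w in enumerate(words):
            # Document-topic statistics, excluding this token's own responsibility.
            n_jt = phi.sum(axis=0) - phi[i]
            p = beta[:, w] * np.exp(psi(alpha + n_jt))   # Eq. (11)
            phi[i] = p / p.sum()
    # New per-document statistics n_{j,t,v} = sum_i phi_{j,i,t} I(w_{j,i} = v).
    n_jtv_new = np.zeros((T, V))
    for i, w in enumerate(words):
        n_jtv_new[:, w] += phi[i]
    # Swap the document's old statistics for the new ones and refresh beta (Eq. 12).
    n_tv = n_tv - n_jtv_old + n_jtv_new
    beta = lam + n_tv
    beta = beta / beta.sum(axis=1, keepdims=True)
    return phi, beta, n_jtv_new, n_tv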
53 Our single-pass algorithm sequentially sets a prior for each arriving document. [sent-173, score-0.126]
54 First, we update parameters from the j-th arriving document given prior parameters {λ_{t,v}^{(j−1)}} for l iterations: ϕ_{j,i,t} ∝ β_{t,w_{j,i}}^{(j)} exp{Ψ(α_t^{(j)} + Σ_i ϕ_{j,i,t})}, β_{t,v}^{(j)} ∝ λ_{t,v}^{(j−1)} + Σ_{i=1}^{N_j} ϕ_{j,i,t} I(w_{j,i} = v), (13) where λ_{t,v}^{(0)} = λ and α_t^{(j)} is explained below. [sent-175, score-0.479]
55 Then, we set the prior parameters for the next document by using statistics from the current document as follows, and finally discard the document. [sent-176, score-0.599]
56 λ_{t,v}^{(j)} = λ_{t,v}^{(j−1)} + Σ_{i=1}^{N_j} ϕ_{j,i,t} I(w_{j,i} = v). (14) Since the updates are repeated within a document, we need to store the statistics {ϕ_{j,i,t}} for each word in the current document, but not for all words in all documents. [sent-178, score-0.294]
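A rough sketch of how one arriving document could be processed in this single-pass scheme (Eqs. 13-14); all variable names are ours, and the sequential treatment of α is deferred to the sketch given later:

import numpy as np
from scipy.special import psi

def srem_process_document(words, lam_tv, alpha, l_iter=5):
    """Single-pass processing of one arriving document (Eqs. 13-14 above, our naming).

    lam_tv : (T, V) current prior parameters lambda_{t,v}; these replace any storage of old data
    alpha  : (T,) current Dirichlet parameter over topic distributions
    """
    T, V = lam_tv.shape
    phi = np.full((len(words), T), 1.0 / T)
    for _ in range(l_iter):
        # Per-document statistics only; nothing is kept for previously seen documents.
        n_jtv = np.zeros((T, V))
        for i, w in enumerate(words):
            n_jtv[:, w] += phi[i]
        beta = lam_tv + n_jtv                              # Eq. (13), unnormalized
        beta = beta / beta.sum(axis=1, keepdims=True)
        for i, w in enumerate(words):
            n_jt = phi.sum(axis=0) - phi[i]
            p = beta[:, w] * np.exp(psi(alpha + n_jt))
            phi[i] = p / p.sum()
    # Set the prior for the next document from this document's statistics (Eq. 14);
    # afterwards the document and its phi can be discarded.
    n_jtv = np.zeros((T, V))
    for i, w in enumerate(words):
        n_jtv[:, w] += phi[i]
    return lam_tv + n_jtv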
57 In the CVB and iREM algorithms, the Dirichlet parameter, α, uses batch updates, i. [sent-179, score-0.114]
58 , α is updated by using the entire document set once in one iteration. [sent-181, score-0.283]
59 However, unlike parameter βt,v , the update of α in Eq. [sent-183, score-0.115]
60 Therefore, we derive a single-pass update for the Dirichlet parameter α using the following interpretation. [sent-185, score-0.115]
61 We consider the update of α_t in Eq. (5) to be the expectation of α_t over the posterior G(α_t | ã_t, b̃) given documents D and a new prior G(α_t | a_0, b_0), i. [sent-187, score-0.142]
62 e., α_t = E[α_t]_{G(α|ã_t, b̃)} = (ã_t − 1)/b̃, where ã_t = a_0 + Σ_{j=1}^{M} a_{j,t}, b̃ = b_0 + Σ_{j=1}^{M} b_j, (15) and a_{j,t} = {Ψ(α_t^{old} + n_{j,t}) − Ψ(α_t^{old})} α_t^{old}, b_j = Ψ(N_j + α_0^{old}) − Ψ(α_0^{old}). (16) [sent-188, score-1.364]
63 We regard a_{j,t} and b_j as statistics for each document, which indicates that the parameters that we actually update are ã_t and b̃ in Eq. [sent-189, score-0.255]
64 These updates are simple summations of a_{j,t} and b_j together with the prior parameters a_0 and b_0. [sent-191, score-0.215]
65 Analysis This section analyzes the proposed updates for parameters α and β from the previous section. [sent-195, score-0.102]
66 We eventually update the parameters α^{(j)} and β^{(j)} given document j as α_t^{(j)} = (a_0 − 1 + Σ_{d=1}^{j−1} a_{d,t} + a_{j,t}) / (b_0 + Σ_{d=1}^{j−1} b_d + b_j) = α_t^{(j−1)} (1 − η_j^α) + η_j^α (a_{j,t} / b_j), where η_j^α = b_j / (b_0 + Σ_{d=1}^{j} b_d). (19) [sent-196, score-0.439]
67 β_{t,v}^{(j)} = (λ + Σ_{d=1}^{j−1} n_{d,t,v} + n_{j,t,v}) / (V_j λ + Σ_{d=1}^{j} n_{d,t,·}) = β_{t,v}^{(j−1)} (1 − η_j^β) + η_j^β (n_{j,t,v} / n_{j,t,·}), where η_j^β = ((V_j − V_{j−1}) λ + n_{j,t,·}) / (V_j λ + Σ_{d=1}^{j} n_{d,t,·}). (20) [sent-197, score-0.214]
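One possible reading of these interpolated updates as code; all names, default values, and the bookkeeping of running sums are our own assumptions rather than the paper's implementation:

import numpy as np

def sequential_alpha_beta_step(alpha_prev, beta_prev, a_j, b_j, n_jtv,
                               sum_b_prev, sum_ntdot_prev, lam, V_j, V_prev, b0=1.0):
    """One sequential interpolation step for alpha and beta (Eqs. 19-20 above, our naming).

    a_j, b_j : per-document statistics from Eqs. (15)-(16); a_j is (T,), b_j is a scalar
    n_jtv    : (T, V) this document's expected topic-word counts
    sum_b_prev, sum_ntdot_prev : running sums over previously seen documents
    b0       : assumed prior rate parameter of the gamma prior on alpha
    """
    # Stepsize for alpha: the new document's weight relative to everything seen so far.
    eta_alpha = b_j / (b0 + sum_b_prev + b_j)
    alpha_new = alpha_prev * (1.0 - eta_alpha) + eta_alpha * (a_j / b_j)

    # Stepsize for beta: grows with the document's length and any newly observed vocabulary.
    n_jt_dot = n_jtv.sum(axis=1)                                    # n_{j,t,.}
    eta_beta = ((V_j - V_prev) * lam + n_jt_dot) / (V_j * lam + sum_ntdot_prev + n_jt_dot)
    beta_new = beta_prev * (1.0 - eta_beta[:, None]) \
               + eta_beta[:, None] * (n_jtv / np.maximum(n_jt_dot[:, None], 1e-12))
    return alpha_new, beta_new, sum_b_prev + b_j, sum_ntdot_prev + n_jt_dot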
68 Our single-pass algorithm sequentially sets a prior for each arriving document, and so we can select a prior (i.e., the dimension of the Dirichlet distribution) corresponding to the observed vocabulary. [sent-199, score-0.157]
69 In fact, this property is useful for our problem because the vocabulary size is growing in the text stream. [sent-200, score-0.157]
70 These updates indicate that η_j^α and η_j^β interpolate between the parameters estimated from old data and those estimated from new data. [sent-201, score-0.373]
71 Our updates can be viewed as a stepsize algorithm (Robbins and Monro, 1951; Sato and Ishii, 2000), although a stepsize algorithm interpolates sufficient statistics whereas our updates interpolate parameters. [sent-204, score-0.241]
72 In our updates, how we set the stepsize for parameter updates is equivalent to how we set the hyperparameters for priors. [sent-205, score-0.143]
73 In our update of β, the appearance rate of word v in topic t in document j, n_{j,t,v}/n_{j,t,·}, is added to the old parameter β_{t,v}^{(j−1)} with weight η_j^β, which gradually decreases as more documents are observed. [sent-207, score-1.054]
74 Therefore, the influence of new data decreases as the number of document observations increases as shown in Theorem 1. [sent-209, score-0.242]
75 Moreover, Theorem 1 plays an important role in analyzing the convergence of the parameter updates by using the super-martingale convergence theorem (Bertsekas and Tsitsiklis, 1996; Brochu et al. [sent-210, score-0.135]
76 4 Experiments We carried out experiments on document modeling in terms of perplexity. [sent-217, score-0.242]
77 The first was "Associated Press (AP)", where the number of documents was M = 10,000 and the vocabulary size was V = 67,291. [sent-219, score-0.163]
78 The comparison metric for document modeling was the “test set perplexity”. [sent-222, score-0.242]
79 We randomly split both data sets into a training set and a test set by assigning 20% of the words in each document to the test set. [sent-223, score-0.265]
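For reference, the standard definition of test set perplexity, exp(− Σ log p(w) / N_test), can be computed as in the following sketch; this is our own helper under assumed array layouts, and the paper's exact protocol for evaluating held-out words may differ:

import numpy as np

def test_set_perplexity(test_docs, beta, theta):
    """Standard test-set perplexity: exp(-sum_j sum_i log p(w_{j,i}) / total held-out words).

    test_docs : list of lists of held-out word ids, one list per document
    beta      : (T, V) topic-word probabilities
    theta     : (M, T) per-document topic proportions estimated from the training portion
    """
    log_lik, n_words = 0.0, 0
    for j, words in enumerate(test_docs):
        word_probs = theta[j] @ beta            # p(w|d) = sum_t theta_{j,t} beta_{t,w}
        for w in words:
            log_lik += np.log(word_probs[w] + 1e-300)
            n_words += 1
    return np.exp(-log_lik / n_words)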
80 CVB0 and CVB are collapsed variational inference for LDA using zero-order and second-order information, respectively. [sent-232, score-0.156]
81 iREM represents the incremental reverse EM algorithm in Algorithm 3. [sent-233, score-0.299]
82 CVB0 and CVB estimate the Dirichlet parameter α over the topic distribution for all datasets, i. [sent-234, score-0.132]
83 L denotes the number of iterations over the whole document set in Algorithms 1 and 2. [sent-238, score-0.14]
84 l denotes the number of iterations within a document in Algorithm 4. [sent-240, score-0.272]
85 Figure 2 shows the results of the experiments on test set perplexity, where lower values indicate better performance. [sent-242, score-0.234]
86 PF and sREM calculate the test set perplexity after sweeping through the whole training set. [sent-244, score-0.247]
87 sREM does not outperform iREM in terms of perplexity; however, the performance of sREM is close to that of iREM. As a result, we recommend sREM for a large number of documents or for document streams. [sent-248, score-0.33]
88 Unlike the other algorithms, sREM does not need to store old statistics for all documents. [sent-249, score-0.414]
89 Since we process each document individually, we can control the number of iterations according to the length of each arriving document. [sent-251, score-0.333]
90 5 Conclusions We developed a deterministic online-learning algorithm for latent Dirichlet allocation (LDA). [sent-260, score-0.136]
91 The proposed algorithm can be applied to excess text data in a continuous stream because it processes received documents one at a time and then discards them. [sent-261, score-0.254]
92 The proposed algorithm was much faster than a batch algorithm and was comparable to the batch algorithm in terms of perplexity in experiments. [sent-262, score-0.539]
93 (a) and (b) compare test set perplexity with respect to the number of topics. [sent-295, score-0.209]
94 (c), (d), (e) and (f) compare test set perplexity with respect to the number of iterations for T = 100 and T = 300 topics, respectively. [sent-296, score-0.371]
95 (g) and (h) show the relationships between test set perplexity and the number of iterations within a document, i. [sent-297, score-0.239]
96 On-line lda: Adaptive topic models for mining text streams with applications to topic detection and tracking. [sent-301, score-0.373]
97 Topic models over text streams: A study of batch and online unsupervised learning. [sent-313, score-0.211]
98 Owed to a martingale: A fast bayesian on-line em algorithm for multinomial models, 2004. [sent-332, score-0.113]
99 A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. [sent-390, score-0.23]
100 Efficient methods for topic model inference on streaming document collections. [sent-404, score-0.473]
wordName wordTfidf (topN-words)
[('srem', 0.42), ('cvb', 0.324), ('lda', 0.296), ('irem', 0.286), ('document', 0.242), ('old', 0.24), ('incremental', 0.239), ('perplexity', 0.209), ('vb', 0.173), ('nj', 0.132), ('topic', 0.132), ('pf', 0.13), ('dirichlet', 0.124), ('testset', 0.123), ('update', 0.115), ('batch', 0.114), ('updates', 0.102), ('wsj', 0.1), ('documents', 0.088), ('word', 0.083), ('bj', 0.082), ('ap', 0.078), ('topics', 0.077), ('ons', 0.076), ('vocabulary', 0.075), ('arrived', 0.061), ('itera', 0.061), ('text', 0.058), ('sato', 0.058), ('collapsed', 0.057), ('variational', 0.055), ('streaming', 0.055), ('store', 0.053), ('url', 0.052), ('discard', 0.051), ('asuncion', 0.048), ('dir', 0.048), ('em', 0.047), ('canini', 0.046), ('inference', 0.044), ('minka', 0.043), ('sold', 0.043), ('updated', 0.041), ('stepsize', 0.041), ('teh', 0.041), ('latent', 0.04), ('vj', 0.04), ('incrementally', 0.039), ('online', 0.039), ('alsumait', 0.038), ('snew', 0.038), ('sweeping', 0.038), ('allocation', 0.037), ('algorithm', 0.034), ('ulti', 0.033), ('end', 0.033), ('et', 0.033), ('statistics', 0.033), ('multinomial', 0.032), ('particle', 0.032), ('hours', 0.032), ('posts', 0.031), ('brochu', 0.031), ('interpolate', 0.031), ('rem', 0.031), ('prior', 0.031), ('wj', 0.03), ('blei', 0.03), ('iterations', 0.03), ('blog', 0.029), ('doi', 0.029), ('mimno', 0.029), ('wallach', 0.027), ('streams', 0.026), ('yao', 0.026), ('reverse', 0.026), ('mining', 0.025), ('resampling', 0.025), ('bd', 0.025), ('japan', 0.025), ('tokyo', 0.025), ('indicates', 0.025), ('deterministic', 0.025), ('si', 0.024), ('growing', 0.024), ('arriving', 0.023), ('banerjee', 0.023), ('bertsekas', 0.023), ('neal', 0.023), ('draw', 0.023), ('words', 0.023), ('posterior', 0.023), ('optimizes', 0.023), ('summarizing', 0.023), ('stream', 0.023), ('inferences', 0.023), ('whole', 0.022), ('thomas', 0.022), ('articles', 0.021), ('subtracting', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 60 nips-2010-Deterministic Single-Pass Algorithm for LDA
Author: Issei Sato, Kenichi Kurihara, Hiroshi Nakagawa
Abstract: We develop a deterministic single-pass algorithm for latent Dirichlet allocation (LDA) in order to process received documents one at a time and then discard them in an excess text stream. Our algorithm does not need to store old statistics for all data. The proposed algorithm is much faster than a batch algorithm and is comparable to the batch algorithm in terms of perplexity in experiments.
2 0.4072302 194 nips-2010-Online Learning for Latent Dirichlet Allocation
Author: Matthew Hoffman, Francis R. Bach, David M. Blei
Abstract: We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. 1
3 0.20540059 286 nips-2010-Word Features for Latent Dirichlet Allocation
Author: James Petterson, Wray Buntine, Shravan M. Narayanamurthy, Tibério S. Caetano, Alex J. Smola
Abstract: We extend Latent Dirichlet Allocation (LDA) by explicitly allowing for the encoding of side information in the distribution over words. This results in a variety of new capabilities, such as improved estimates for infrequently occurring words, as well as the ability to leverage thesauri and dictionaries in order to boost topic cohesion within and across languages. We present experiments on multi-language topic synchronisation where dictionary information is used to bias corresponding words towards similar topics. Results indicate that our model substantially improves topic cohesion when compared to the standard LDA model. 1
4 0.18810776 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
Author: Zoubin Ghahramani, Michael I. Jordan, Ryan P. Adams
Abstract: Many data are naturally modeled by an unobserved hierarchical structure. In this paper we propose a flexible nonparametric prior over unknown data hierarchies. The approach uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable. One can view our model as providing infinite mixtures where the components have a dependency structure corresponding to an evolutionary diffusion down a tree. By using a stick-breaking approach, we can apply Markov chain Monte Carlo methods based on slice sampling to perform Bayesian inference and simulate from the posterior distribution on trees. We apply our method to hierarchical clustering of images and topic modeling of text data. 1
5 0.14774552 131 nips-2010-Joint Analysis of Time-Evolving Binary Matrices and Associated Documents
Author: Eric Wang, Dehong Liu, Jorge Silva, Lawrence Carin, David B. Dunson
Abstract: We consider problems for which one has incomplete binary matrices that evolve with time (e.g., the votes of legislators on particular legislation, with each year characterized by a different such matrix). An objective of such analysis is to infer structure and inter-relationships underlying the matrices, here defined by latent features associated with each axis of the matrix. In addition, it is assumed that documents are available for the entities associated with at least one of the matrix axes. By jointly analyzing the matrices and documents, one may be used to inform the other within the analysis, and the model offers the opportunity to predict matrix values (e.g., votes) based only on an associated document (e.g., legislation). The research presented here merges two areas of machine-learning that have previously been investigated separately: incomplete-matrix analysis and topic modeling. The analysis is performed from a Bayesian perspective, with efficient inference constituted via Gibbs sampling. The framework is demonstrated by considering all voting data and available documents (legislation) during the 220-year lifetime of the United States Senate and House of Representatives. 1
6 0.13588555 150 nips-2010-Learning concept graphs from text with stick-breaking priors
7 0.11026749 287 nips-2010-Worst-Case Linear Discriminant Analysis
8 0.10994416 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization
9 0.096720472 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
10 0.095816553 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
11 0.084021039 38 nips-2010-Batch Bayesian Optimization via Simulation Matching
12 0.080884509 237 nips-2010-Shadow Dirichlet for Restricted Probability Modeling
13 0.064638264 264 nips-2010-Synergies in learning words and their referents
14 0.058983061 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
15 0.056104489 213 nips-2010-Predictive Subspace Learning for Multi-view Data: a Large Margin Approach
16 0.051813226 185 nips-2010-Nonparametric Density Estimation for Stochastic Optimization with an Observable State Variable
17 0.047932703 284 nips-2010-Variational bounds for mixed-data factor analysis
18 0.046781272 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
19 0.046770643 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models
20 0.045124773 285 nips-2010-Why are some word orders more common than others? A uniform information density account
topicId topicWeight
[(0, 0.127), (1, 0.027), (2, 0.018), (3, -0.008), (4, -0.368), (5, 0.165), (6, 0.228), (7, 0.044), (8, -0.094), (9, 0.024), (10, 0.199), (11, 0.074), (12, 0.009), (13, 0.104), (14, -0.083), (15, 0.039), (16, -0.02), (17, 0.028), (18, -0.054), (19, 0.008), (20, 0.096), (21, -0.104), (22, -0.06), (23, 0.072), (24, -0.058), (25, -0.046), (26, 0.029), (27, -0.002), (28, 0.01), (29, -0.039), (30, 0.003), (31, -0.06), (32, 0.028), (33, -0.016), (34, -0.009), (35, 0.074), (36, -0.047), (37, 0.046), (38, 0.081), (39, -0.047), (40, 0.043), (41, -0.108), (42, 0.025), (43, 0.043), (44, -0.021), (45, -0.077), (46, 0.03), (47, -0.014), (48, -0.08), (49, 0.063)]
simIndex simValue paperId paperTitle
same-paper 1 0.96427739 60 nips-2010-Deterministic Single-Pass Algorithm for LDA
Author: Issei Sato, Kenichi Kurihara, Hiroshi Nakagawa
Abstract: We develop a deterministic single-pass algorithm for latent Dirichlet allocation (LDA) in order to process received documents one at a time and then discard them in an excess text stream. Our algorithm does not need to store old statistics for all data. The proposed algorithm is much faster than a batch algorithm and is comparable to the batch algorithm in terms of perplexity in experiments.
2 0.91082132 194 nips-2010-Online Learning for Latent Dirichlet Allocation
Author: Matthew Hoffman, Francis R. Bach, David M. Blei
Abstract: We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. 1
3 0.79358554 286 nips-2010-Word Features for Latent Dirichlet Allocation
Author: James Petterson, Wray Buntine, Shravan M. Narayanamurthy, Tibério S. Caetano, Alex J. Smola
Abstract: We extend Latent Dirichlet Allocation (LDA) by explicitly allowing for the encoding of side information in the distribution over words. This results in a variety of new capabilities, such as improved estimates for infrequently occurring words, as well as the ability to leverage thesauri and dictionaries in order to boost topic cohesion within and across languages. We present experiments on multi-language topic synchronisation where dictionary information is used to bias corresponding words towards similar topics. Results indicate that our model substantially improves topic cohesion when compared to the standard LDA model. 1
4 0.62453938 131 nips-2010-Joint Analysis of Time-Evolving Binary Matrices and Associated Documents
Author: Eric Wang, Dehong Liu, Jorge Silva, Lawrence Carin, David B. Dunson
Abstract: We consider problems for which one has incomplete binary matrices that evolve with time (e.g., the votes of legislators on particular legislation, with each year characterized by a different such matrix). An objective of such analysis is to infer structure and inter-relationships underlying the matrices, here defined by latent features associated with each axis of the matrix. In addition, it is assumed that documents are available for the entities associated with at least one of the matrix axes. By jointly analyzing the matrices and documents, one may be used to inform the other within the analysis, and the model offers the opportunity to predict matrix values (e.g., votes) based only on an associated document (e.g., legislation). The research presented here merges two areas of machine-learning that have previously been investigated separately: incomplete-matrix analysis and topic modeling. The analysis is performed from a Bayesian perspective, with efficient inference constituted via Gibbs sampling. The framework is demonstrated by considering all voting data and available documents (legislation) during the 220-year lifetime of the United States Senate and House of Representatives. 1
5 0.5533582 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization
Author: Shinichi Nakajima, Masashi Sugiyama, Ryota Tomioka
Abstract: Bayesian methods of matrix factorization (MF) have been actively explored recently as promising alternatives to classical singular value decomposition. In this paper, we show that, despite the fact that the optimization problem is non-convex, the global optimal solution of variational Bayesian (VB) MF can be computed analytically by solving a quartic equation. This is highly advantageous over a popular VBMF algorithm based on iterated conditional modes since it can only find a local optimal solution after iterations. We further show that the global optimal solution of empirical VBMF (hyperparameters are also learned from data) can also be analytically computed. We illustrate the usefulness of our results through experiments.
6 0.51010603 150 nips-2010-Learning concept graphs from text with stick-breaking priors
7 0.50821739 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
8 0.43458444 237 nips-2010-Shadow Dirichlet for Restricted Probability Modeling
9 0.41669428 287 nips-2010-Worst-Case Linear Discriminant Analysis
10 0.3365553 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
11 0.33466801 198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem
12 0.3312259 213 nips-2010-Predictive Subspace Learning for Multi-view Data: a Large Margin Approach
13 0.32674146 38 nips-2010-Batch Bayesian Optimization via Simulation Matching
14 0.29238793 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
15 0.28166369 264 nips-2010-Synergies in learning words and their referents
16 0.28025591 215 nips-2010-Probabilistic Deterministic Infinite Automata
17 0.24946284 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models
18 0.24452642 90 nips-2010-Fast Large-scale Mixture Modeling with Component-specific Data Partitions
19 0.23921248 120 nips-2010-Improvements to the Sequence Memoizer
20 0.23661093 177 nips-2010-Multitask Learning without Label Correspondences
topicId topicWeight
[(13, 0.028), (27, 0.625), (30, 0.045), (35, 0.012), (45, 0.109), (50, 0.032), (60, 0.016), (77, 0.023), (90, 0.011)]
simIndex simValue paperId paperTitle
1 0.94992554 119 nips-2010-Implicit encoding of prior probabilities in optimal neural populations
Author: Deep Ganguli, Eero P. Simoncelli
Abstract: unkown-abstract
2 0.92392212 128 nips-2010-Infinite Relational Modeling of Functional Connectivity in Resting State fMRI
Author: Morten Mørup, Kristoffer Madsen, Anne-marie Dogonowski, Hartwig Siebner, Lars K. Hansen
Abstract: Functional magnetic resonance imaging (fMRI) can be applied to study the functional connectivity of the neural elements which form complex network at a whole brain level. Most analyses of functional resting state networks (RSN) have been based on the analysis of correlation between the temporal dynamics of various regions of the brain. While these models can identify coherently behaving groups in terms of correlation they give little insight into how these groups interact. In this paper we take a different view on the analysis of functional resting state networks. Starting from the definition of resting state as functional coherent groups we search for functional units of the brain that communicate with other parts of the brain in a coherent manner as measured by mutual information. We use the infinite relational model (IRM) to quantify functional coherent groups of resting state networks and demonstrate how the extracted component interactions can be used to discriminate between functional resting state activity in multiple sclerosis and normal subjects. 1
same-paper 3 0.90589088 60 nips-2010-Deterministic Single-Pass Algorithm for LDA
Author: Issei Sato, Kenichi Kurihara, Hiroshi Nakagawa
Abstract: We develop a deterministic single-pass algorithm for latent Dirichlet allocation (LDA) in order to process received documents one at a time and then discard them in an excess text stream. Our algorithm does not need to store old statistics for all data. The proposed algorithm is much faster than a batch algorithm and is comparable to the batch algorithm in terms of perplexity in experiments.
4 0.87078178 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies
Author: Harold Pashler, Matthew Wilder, Robert Lindsey, Matt Jones, Michael C. Mozer, Michael P. Holmes
Abstract: For over half a century, psychologists have been struck by how poor people are at expressing their internal sensations, impressions, and evaluations via rating scales. When individuals make judgments, they are incapable of using an absolute rating scale, and instead rely on reference points from recent experience. This relativity of judgment limits the usefulness of responses provided by individuals to surveys, questionnaires, and evaluation forms. Fortunately, the cognitive processes that transform internal states to responses are not simply noisy, but rather are influenced by recent experience in a lawful manner. We explore techniques to remove sequential dependencies, and thereby decontaminate a series of ratings to obtain more meaningful human judgments. In our formulation, decontamination is fundamentally a problem of inferring latent states (internal sensations) which, because of the relativity of judgment, have temporal dependencies. We propose a decontamination solution using a conditional random field with constraints motivated by psychological theories of relative judgment. Our exploration of decontamination models is supported by two experiments we conducted to obtain ground-truth rating data on a simple length estimation task. Our decontamination techniques yield an over 20% reduction in the error of human judgments. 1
5 0.83707821 39 nips-2010-Bayesian Action-Graph Games
Author: Albert X. Jiang, Kevin Leyton-brown
Abstract: Games of incomplete information, or Bayesian games, are an important gametheoretic model and have many applications in economics. We propose Bayesian action-graph games (BAGGs), a novel graphical representation for Bayesian games. BAGGs can represent arbitrary Bayesian games, and furthermore can compactly express Bayesian games exhibiting commonly encountered types of structure including symmetry, action- and type-specific utility independence, and probabilistic independence of type distributions. We provide an algorithm for computing expected utility in BAGGs, and discuss conditions under which the algorithm runs in polynomial time. Bayes-Nash equilibria of BAGGs can be computed by adapting existing algorithms for complete-information normal form games and leveraging our expected utility algorithm. We show both theoretically and empirically that our approaches improve significantly on the state of the art. 1
6 0.81355339 266 nips-2010-The Maximal Causes of Natural Scenes are Edge Filters
7 0.73728734 81 nips-2010-Evaluating neuronal codes for inference using Fisher information
8 0.69904715 161 nips-2010-Linear readout from a neural population with partial correlation data
9 0.67978942 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence
10 0.65993083 127 nips-2010-Inferring Stimulus Selectivity from the Spatial Structure of Neural Network Dynamics
11 0.6351527 97 nips-2010-Functional Geometry Alignment and Localization of Brain Areas
12 0.63514906 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models
13 0.60915405 98 nips-2010-Functional form of motion priors in human motion perception
14 0.60077798 194 nips-2010-Online Learning for Latent Dirichlet Allocation
15 0.59172028 268 nips-2010-The Neural Costs of Optimal Control
16 0.58452696 19 nips-2010-A rational decision making framework for inhibitory control
17 0.58067632 44 nips-2010-Brain covariance selection: better individual functional connectivity models using population prior
18 0.57821327 244 nips-2010-Sodium entry efficiency during action potentials: A novel single-parameter family of Hodgkin-Huxley models
19 0.57637662 123 nips-2010-Individualized ROI Optimization via Maximization of Group-wise Consistency of Structural and Functional Profiles
20 0.56288803 17 nips-2010-A biologically plausible network for the computation of orientation dominance