nips nips2007 nips2007-189 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jon D. McAuliffe, David M. Blei
Abstract: We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression. 1
Reference: text
sentIndex sentText sentNum sentScore
1 We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. [sent-7, score-0.26]
2 We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. [sent-9, score-0.206]
3 Prediction problems motivate this research: we use the fitted model to predict response values for new documents. [sent-10, score-0.29]
4 We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. [sent-11, score-0.33]
5 The complexity of document corpora has led to considerable interest in applying hierarchical statistical models based on what are called topics. [sent-14, score-0.222]
6 Formally, a topic is a probability distribution over terms in a vocabulary. [sent-15, score-0.233]
7 Informally, a topic represents an underlying semantic theme; a document consisting of a large number of words might be concisely modelled as deriving from a smaller number of topics. [sent-16, score-0.479]
8 Such topic models provide useful descriptive statistics for a collection, which facilitates tasks like browsing, searching, and assessing document similarity. [sent-17, score-0.432]
9 Most topic models, such as latent Dirichlet allocation (LDA) [4], are unsupervised: only the words in the documents are modelled. [sent-18, score-0.458]
10 The goal is to infer topics that maximize the likelihood (or the posterior probability) of the collection. [sent-19, score-0.297]
11 In this work, we develop supervised topic models, where each document is paired with a response. [sent-20, score-0.54]
12 The goal is to infer latent topics predictive of the response. [sent-21, score-0.349]
13 Given an unlabeled document, we infer its topic structure using a fitted model, then form its prediction. [sent-22, score-0.233]
14 Note that the response is not limited to text categories. [sent-23, score-0.241]
15 Other kinds of document-response corpora include essays with their grades, movie reviews with their numerical ratings, and web pages with counts of how many online community members liked them. [sent-24, score-0.319]
16 The hope was that LDA topics would turn out to be useful for categorization, since they act to reduce data dimension [4]. [sent-26, score-0.215]
17 However, when the goal is prediction, fitting unsupervised topics may not be a good choice. [sent-27, score-0.314]
18 Consider predicting a movie rating from the words in its review. [sent-28, score-0.256]
19 Intuitively, good predictive topics will differentiate words like “excellent”, “terrible”, and “average,” without regard to genre. [sent-29, score-0.316]
20 But topics estimated from an unsupervised model may correspond to genres, if that is the dominant structure in the corpus. [sent-30, score-0.337]
21 The distinction between unsupervised and supervised topic models is mirrored in existing dimension-reduction techniques. [sent-31, score-0.411]
22 For example, consider regression on unsupervised principal components versus partial least squares and projection pursuit [7], which both search for covariate linear combinations most predictive of a response variable. [sent-32, score-0.506]
23 The authors of [8] developed a joint topic model for words and categories, and Blei and Jordan developed an LDA model to predict caption words from images [2]. [sent-35, score-0.399]
24 [5] proposed “labelled LDA,” which is also a joint topic model, but for genes and protein function categories. [sent-37, score-0.233]
25 We first develop the supervised latent Dirichlet allocation model (sLDA) for document-response pairs. [sent-40, score-0.23]
26 We derive parameter estimation and prediction algorithms for the real-valued response case. [sent-41, score-0.283]
27 Then we extend these techniques to handle diverse response types, using generalized linear models. [sent-42, score-0.315]
28 First, we use sLDA to predict movie ratings based on the text of the reviews. [sent-44, score-0.246]
29 The digg count prediction for a page is based on the page’s description in the forum. [sent-48, score-0.361]
30 2 Supervised latent Dirichlet allocation. In topic models, we treat the words of a document as arising from a set of latent topics, that is, a set of unknown distributions over the vocabulary. [sent-51, score-0.737]
31 Documents in a corpus share the same set of K topics, but each document uses a mix of topics unique to itself. [sent-52, score-0.469]
32 Thus, topic models are a relaxation of classical document mixture models, which associate each document with a single unknown topic. [sent-53, score-0.631]
33 Here we build on latent Dirichlet allocation (LDA) [4], a topic model that serves as the basis for many others. [sent-54, score-0.384]
34 In LDA, we treat the topic proportions for a document as a draw from a Dirichlet distribution. [sent-55, score-0.557]
35 We obtain the words in the document by repeatedly choosing a topic assignment from those proportions, then drawing a word from the corresponding topic. [sent-56, score-0.553]
36 In supervised latent Dirichlet allocation (sLDA), we add to LDA a response variable associated with each document. [sent-57, score-0.47]
37 We jointly model the documents and the responses, in order to find latent topics that will best predict the response variables for future unlabeled documents. [sent-59, score-0.659]
38 Later in this section, we present the general version of sLDA, and explain how it handles diverse response types. [sent-68, score-0.272]
39 Fix for a moment the model parameters: the K topics β1:K (each βk a vector of term probabilities), the Dirichlet parameter α, and the response parameters η and σ 2 . [sent-70, score-0.479]
40 Under the sLDA model, each document and response arises from the following generative process: 1. Draw topic proportions θ | α ∼ Dir(α). [sent-71, score-0.44]
41 2. For each word, (a) draw a topic assignment z_n | θ ∼ Mult(θ) and (b) draw the word w_n | z_n, β_{1:K} ∼ Mult(β_{z_n}). [sent-74, score-0.307]
42 3. Draw the response variable y | z_{1:N}, η, σ^2 ∼ N(η^T z̄, σ^2), where z̄ := (1/N) Σ_{n=1}^{N} z_n is the vector of empirical topic frequencies. [sent-77, score-0.263]
43 Notice the response comes from a normal linear model. [sent-80, score-0.317]
44 The covariates in this model are the (unobserved) empirical frequencies of the topics in the document. [sent-81, score-0.314]
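For illustration, here is a minimal numpy sketch of this generative process (not code from the paper; the dimensions K, V, N and all parameter values are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 10, 5000, 200                       # topics, vocabulary size, words in one document (assumed)
alpha = np.full(K, 0.1)                       # Dirichlet parameter (assumed)
beta = rng.dirichlet(np.ones(V), size=K)      # K topics: each row a distribution over terms
eta, sigma2 = rng.normal(size=K), 0.25        # response parameters (assumed)

theta = rng.dirichlet(alpha)                            # 1. topic proportions for the document
z = rng.choice(K, size=N, p=theta)                      # 2a. topic assignment for each word
w = np.array([rng.choice(V, p=beta[k]) for k in z])     # 2b. word drawn from its assigned topic
zbar = np.bincount(z, minlength=K) / N                  # empirical topic frequencies z̄
y = rng.normal(eta @ zbar, np.sqrt(sigma2))             # 3. response from the normal linear model
```

The only change relative to unsupervised LDA is the final line, which draws the response from the normal linear model in the empirical topic frequencies.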
45 (Bottom) The topics of a 10-topic sLDA model fit to the movie review data of Section 3. [sent-88, score-0.427]
46 By regressing the response on the empirical topic frequencies, we treat the response as nonexchangeable with the words. [sent-99, score-0.765]
47 The document (i.e., words and their topic assignments) is generated first, under full word exchangeability; then, based on the document, the response variable is generated. [sent-102, score-0.617]
48 In contrast, one could formulate a model in which y is regressed on the topic proportions θ. [sent-103, score-0.3]
49 This treats the response and all the words as jointly exchangeable. [sent-104, score-0.312]
50 But as a practical matter, our chosen formulation seems more sensible: the response depends on the topic frequencies which actually occurred in the document, rather than on the mean of the distribution generating the topics. [sent-105, score-0.508]
51 Moreover, estimating a fully exchangeable model with enough topics allows some topics to be used entirely to explain the response variables, and others to be used to explain the word occurrences. [sent-106, score-0.768]
52 We carry out approximate maximum-likelihood estimation using a variational expectation-maximization (EM) procedure, which is the approach taken in unsupervised LDA as well [4]. [sent-109, score-0.279]
53 We maximize the evidence lower bound (ELBO) L(·), which for a single document has the form log p(w_{1:N}, y | α, β_{1:K}, η, σ^2) ≥ L(γ, φ_{1:N}; α, β_{1:K}, η, σ^2) = E[log p(θ | α)] + Σ_{n=1}^{N} E[log p(Z_n | θ)] + Σ_{n=1}^{N} E[log p(w_n | Z_n, β_{1:K})] + E[log p(y | Z_{1:N}, η, σ^2)] + H(q). (2) [sent-119, score-0.268]
54 Here the expectation is taken with respect to a variational distribution q. [sent-120, score-0.236]
55 The first three terms and the entropy of the variational distribution are identical to the corresponding terms in the ELBO for unsupervised LDA [4]. [sent-123, score-0.279]
56 The fourth term is the expected log probability of the response variable given the latent topic assignments, E[log p(y | Z_{1:N}, η, σ^2)] = −(1/2) log(2π σ^2) − (y^2 − 2 y η^T E[Z̄] + η^T E[Z̄ Z̄^T] η) / (2σ^2). [sent-124, score-0.662]
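For concreteness, a small numpy sketch of this fourth term (ours, not from the paper); it uses the identities E[Z̄] = (1/N) Σ_n φ_n and E[Z̄ Z̄^T] = (1/N^2)(Σ_{n≠m} φ_n φ_m^T + Σ_n diag(φ_n)), which hold under the factorized variational distribution:

```python
import numpy as np

def expected_response_term(phi, y, eta, sigma2):
    """E[log p(y | Z_{1:N}, eta, sigma^2)] under q; phi is the N x K matrix of word-level multinomials."""
    N, K = phi.shape
    E_zbar = phi.mean(axis=0)                                    # E[Z-bar]
    s = phi.sum(axis=0)
    E_zz = (np.outer(s, s) - phi.T @ phi + np.diag(s)) / N**2    # E[Z-bar Z-bar^T]
    quad = y**2 - 2.0 * y * (eta @ E_zbar) + eta @ E_zz @ eta
    return -0.5 * np.log(2.0 * np.pi * sigma2) - quad / (2.0 * sigma2)
```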
57 We use block coordinate-ascent variational inference, maximizing with respect to each variational parameter vector in turn. [sent-129, score-0.384]
58 The terms that involve the variational Dirichlet γ are identical to those in unsupervised LDA, i.e., the coordinate update is γ^new = α + Σ_{n=1}^{N} φ_n. [sent-131, score-0.279]
59 As in LDA, the jth word’s variational distribution over topics depends on the word’s topic probabilities under the actual model (determined by β1:K ). [sent-146, score-0.651]
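The resulting per-document E-step can be sketched as below; the φ_j update is obtained by differentiating the response term above and exponentiating, and the initialization, fixed iteration count, and lack of a convergence check are simplifications of ours rather than details taken from the paper:

```python
import numpy as np
from scipy.special import digamma

def e_step(words, y, alpha, beta, eta, sigma2, n_iter=50):
    """Coordinate-ascent variational inference for one document-response pair (illustrative sketch)."""
    N, K = len(words), len(alpha)
    phi = np.full((N, K), 1.0 / K)
    gamma = alpha + float(N) / K
    for _ in range(n_iter):
        for j in range(N):
            phi_rest = phi.sum(axis=0) - phi[j]                       # sum of phi_n over n != j
            log_phi = (digamma(gamma) - digamma(gamma.sum())          # E[log theta | gamma]
                       + np.log(beta[:, words[j]])                    # log beta_{k, w_j}
                       + (y / (N * sigma2)) * eta                     # linear part of the response term
                       - (2.0 * (eta @ phi_rest) * eta + eta**2) / (2.0 * N**2 * sigma2))
            phi[j] = np.exp(log_phi - log_phi.max())
            phi[j] /= phi[j].sum()
        gamma = alpha + phi.sum(axis=0)                               # same Dirichlet update as in LDA
    return phi, gamma
```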
60 In the E-step, we estimate the approximate posterior distribution for each document-response pair using the variational inference algorithm described above. [sent-154, score-0.206]
61 In this section, we add document indexes to the previous section's quantities, so y becomes y_d and Z̄ becomes Z̄_d. [sent-157, score-0.342]
62 The M-step updates of the topics β_{1:K} are the same as for unsupervised LDA, where the probability of a word under a topic is proportional to the expected number of times that it was assigned to that topic [4]: β̂_{k,w}^new ∝ Σ_{d=1}^{D} Σ_{n=1}^{N} 1(w_{d,n} = w) φ_{d,n}^{k}. (8) [sent-159, score-0.854]
63 Define y = y1:D as the vector of response values across documents. [sent-163, score-0.241]
64 The corpus-level ELBO terms involving η and σ^2 are −(D/2) log(2π σ^2) − (1/(2σ^2)) E[(y − Aη)^T (y − Aη)], where A is the D × K matrix whose d-th row is Z̄_d^T. (9) Here the expectation is over the matrix A, using the variational distribution parameters chosen in the previous E-step. [sent-166, score-0.212]
65 We caution again: formulas in the previous section, such as (5), suppress the document indexes which appear here. [sent-170, score-0.221]
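An M-step sketch under the same assumptions follows; the topic update implements the expected-count rule in (8), and the (η, σ^2) update is the closed-form normal-linear-model solution obtained by maximizing (9) in terms of E[A] and E[A^T A] (the code structure and function name are ours):

```python
import numpy as np

def m_step(phi_docs, word_docs, y, K, V):
    """phi_docs[d]: N_d x K variational multinomials; word_docs[d]: length-N_d term indices; y: responses."""
    # Topic update (8): expected number of times term w was assigned to topic k.
    beta = np.zeros((K, V))
    for phi, words in zip(phi_docs, word_docs):
        np.add.at(beta.T, words, phi)              # beta[k, w] += phi[n, k] for each occurrence of w
    beta /= beta.sum(axis=1, keepdims=True)
    # Response update: eta_new = E[A^T A]^{-1} E[A]^T y, with row d of E[A] equal to E[Z-bar_d].
    E_A = np.stack([phi.mean(axis=0) for phi in phi_docs])
    E_AtA = sum((np.outer(p.sum(0), p.sum(0)) - p.T @ p + np.diag(p.sum(0))) / p.shape[0]**2
                for p in phi_docs)
    eta = np.linalg.solve(E_AtA, E_A.T @ y)
    sigma2 = (y @ y - y @ E_A @ eta) / len(y)      # maximum-likelihood response variance
    return beta, eta, sigma2
```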
66 Specifically, we wish to compute the expected response value, given a new document w_{1:N} and a fitted model {α, β_{1:K}, η, σ^2}: E[Y | w_{1:N}, α, β_{1:K}, η, σ^2] = η^T E[Z̄ | w_{1:N}, α, β_{1:K}]. [sent-174, score-0.463]
67 We approximate the posterior mean of Z̄ using the variational inference procedure of the previous section. [sent-176, score-0.206]
68 Notice this is the same as variational inference for unsupervised LDA: since we averaged the response variable out of the right-hand side in (12), what remains is the standard unsupervised LDA model for Z 1:N and θ. [sent-178, score-0.664]
69 Thus, given a new document, we first compute Eq [Z 1:N ], the variational posterior distribution of the latent variables Z n . [sent-179, score-0.286]
70 Then, we estimate the response with E[Y | w_{1:N}, α, β_{1:K}, η, σ^2] ≈ η^T E_q[Z̄] = η^T φ̄, where φ̄ = (1/N) Σ_{n=1}^{N} φ_n. (13) [sent-180, score-0.241]
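A prediction sketch along these lines (ours, assuming β has no zero entries): run the unsupervised-LDA variational updates on the new document, then apply (13):

```python
import numpy as np
from scipy.special import digamma

def predict_response(words, alpha, beta, eta, n_iter=50):
    N, K = len(words), len(alpha)
    phi = np.full((N, K), 1.0 / K)
    gamma = alpha + float(N) / K
    for _ in range(n_iter):
        log_phi = (digamma(gamma) - digamma(gamma.sum())) + np.log(beta[:, words]).T   # N x K
        phi = np.exp(log_phi - log_phi.max(axis=1, keepdims=True))
        phi /= phi.sum(axis=1, keepdims=True)
        gamma = alpha + phi.sum(axis=0)
    return eta @ phi.mean(axis=0)                  # E[Y] approximated by eta^T phi-bar
```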
71 Diverse response types via generalized linear models. Up to this point, we have confined our attention to an unconstrained real-valued response variable. [sent-182, score-0.551]
72 In many applications, however, we need to predict a categorical label, or a non-negative integral count, or a response with other kinds of constraints. [sent-183, score-0.291]
73 As we shall see, the result is a generic framework which can be specialized in a straightforward way to supervised topic models having a variety of response types. [sent-187, score-0.553]
74 For the random component, one takes the distribution of the response to be an exponential dispersion family with natural parameter ζ and dispersion parameter δ: p(y | ζ, δ) = h(y, δ) exp{ (ζ y − A(ζ)) / δ }. (14) [sent-189, score-0.401]
75 In sLDA's GLM formulation, the natural parameter is ζ = η^T z̄, so that p(y | z_{1:N}, η, δ) = h(y, δ) exp{ (η^T z̄ · y − A(η^T z̄)) / δ }. We now have the flexibility to model any type of response variable whose distribution can be written in exponential dispersion form (14). [sent-199, score-0.364]
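To make the dispersion form concrete, here is a hypothetical helper (the function name and fixed δ values are ours) that evaluates log p(y | z_{1:N}, η, δ) with ζ = η^T z̄ for two standard cases: a Poisson response, a natural choice for count data such as diggs, and the Gaussian case, which recovers the model of Section 2:

```python
import numpy as np
from scipy.special import gammaln

def glm_response_logpdf(y, zbar, eta, family="poisson", sigma2=1.0):
    zeta = eta @ zbar                                  # natural parameter: zeta = eta^T z-bar
    if family == "poisson":                            # delta = 1, A(zeta) = exp(zeta), h(y) = 1/y!
        return zeta * y - np.exp(zeta) - gammaln(y + 1.0)
    if family == "gaussian":                           # delta = sigma^2, A(zeta) = zeta^2 / 2
        return (zeta * y - 0.5 * zeta**2) / sigma2 - 0.5 * (y**2 / sigma2 + np.log(2.0 * np.pi * sigma2))
    raise ValueError("unsupported family: " + family)
```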
76 This changes the coordinate ascent step for each φ_j, but the variational optimization is otherwise unaffected. [sent-207, score-0.22]
77 Thus, the key to variational inference in sLDA is obtaining the gradient of the expected GLM log-normalizer, E[A(η^T Z̄)]. [sent-209, score-0.212]
78 First, we can replace −E[A(η^T Z̄)] with an adjustable lower bound whose gradient is known exactly; then we maximize over the original variational parameters plus the parameter controlling the bound. [sent-214, score-0.259]
79 Here, Var_GLM denotes the response variance under the GLM, given a specified value of the natural parameter; in all standard cases, this variance is a closed-form function of φ_j. [sent-216, score-0.241]
80 The topic parameter estimates are given by (8), as before. [sent-224, score-0.233]
81 For the corpus-level ELBO, the gradient with respect to η becomes ∂/∂η [ (1/δ) Σ_{d=1}^{D} ( η^T φ̄_d y_d − E[A(η^T Z̄_d)] ) ] = (1/δ) Σ_{d=1}^{D} ( φ̄_d y_d − E_q[ μ(η^T Z̄_d) Z̄_d ] ). [sent-225, score-0.298]
82 This GLM mean response μ(·) is a known function of η^T Z̄_d in all standard cases. [sent-227, score-0.241]
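The awkward piece is E_q[μ(η^T Z̄_d) Z̄_d], which generally has no closed form; the paper handles it with the bounding and variance-based approximations described above, but purely for illustration the expectation can also be estimated by Monte Carlo over the variational distribution, as in this sketch for the Poisson case with μ(ζ) = exp(ζ) (a crude alternative of ours, not the paper's method):

```python
import numpy as np

def grad_eta_poisson(phi_docs, y, eta, delta=1.0, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(eta)
    for phi, y_d in zip(phi_docs, y):
        N, K = phi.shape
        phibar = phi.mean(axis=0)
        est = np.zeros(K)                               # Monte Carlo estimate of E_q[mu(eta^T Z-bar) Z-bar]
        for _ in range(n_samples):
            z = np.array([rng.choice(K, p=phi[n]) for n in range(N)])
            zbar = np.bincount(z, minlength=K) / N
            est += np.exp(eta @ zbar) * zbar
        grad += phibar * y_d - est / n_samples
    return grad / delta
```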
83 [Figure 2 plot residue omitted: predictive R2 and per-word held-out log likelihood plotted against the number of topics, for sLDA and LDA, on the movie and Digg corpora.]
Figure 2: Predictive R2 and per-word likelihood for the movie and Digg data (see Section 3). [sent-254, score-0.704]
85 The difference is that we now approximate the expected response value of a test document as E[Y | w_{1:N}, α, β_{1:K}, η, δ] ≈ E_q[μ(η^T Z̄)]. (22) [sent-266, score-0.44]
86 Again, this follows from iterated expectation plus the variational approximation. [sent-267, score-0.258]
87 When the variational expectation cannot be computed exactly, we apply the approximation methods we relied on for the GLM E-step and M-step. [sent-268, score-0.212]
88 We use the publicly available data introduced in [10], which contains movie reviews paired with the number of stars given. [sent-272, score-0.316]
89 For both sets of response variables, we transformed to approximate normality by taking logs. [sent-285, score-0.272]
90 In the E-step, we ran coordinate-ascent variational inference for each document until the relative change in the per-document ELBO fell below a fixed tolerance. [sent-290, score-0.379]
91 For the movie review data set, we illustrate in Figure 1 a matching of the top words from each topic to the corresponding coefficient ηk . [sent-292, score-0.469]
92 In our 5-fold cross-validation (CV), we defined this quantity as the fraction of variability in the out-of-fold response values which is captured by the out-of-fold predictions: pR2 := 1 − Σ(y − ŷ)^2 / Σ(y − ȳ)^2, where ŷ denotes the out-of-fold prediction and ȳ the mean of the out-of-fold responses. [sent-294, score-0.241]
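For reference, the same quantity in code (a trivial sketch; the function name is ours and y_pred is assumed to hold the out-of-fold predictions):

```python
import numpy as np

def predictive_r2(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    resid = np.sum((y_true - y_pred)**2)               # out-of-fold squared prediction error
    total = np.sum((y_true - y_true.mean())**2)        # variability around the mean response
    return 1.0 - resid / total
```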
93 This is the regression equivalent of using LDA topics as classification features [4]. [sent-296, score-0.256]
94 Moreover, this improvement does not come at the cost of document model quality. [sent-298, score-0.222]
95 The per-word hold-out likelihood comparison in Figure 2 (right) shows that sLDA fits the document data as well as or better than LDA. [sent-299, score-0.229]
96 Note that Digg prediction is significantly harder than the movie review sentiment prediction, and that the homogeneity of Digg technology content leads the model to favor a small number of topics. [sent-300, score-0.306]
97 We used each document’s empirical distribution over words as its lasso covariates, setting the lasso complexity parameter with 5-fold CV. [sent-303, score-0.265]
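A sketch of this baseline; the paper does not say which lasso implementation was used, so scikit-learn's LassoCV is our assumption here, with X holding the D × V matrix of per-document empirical word frequencies and y the (log-transformed) responses:

```python
from sklearn.linear_model import LassoCV

def fit_lasso_baseline(X, y):
    # Regularization strength chosen by 5-fold cross-validation, matching the experimental setup.
    return LassoCV(cv=5).fit(X, y)
```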
98 The model accommodates the different types of response variable commonly encountered in practice. [sent-315, score-0.338]
99 We presented a variational procedure for approximate posterior inference, which we then incorporated in an EM algorithm for maximum-likelihood parameter estimation. [sent-316, score-0.206]
100 Moreover, the topic structure recovered by sLDA had higher hold-out likelihood than LDA on one problem, and equivalent hold-out likelihood on the other. [sent-319, score-0.293]
wordName wordTfidf (topN-words)
[('slda', 0.568), ('response', 0.241), ('digg', 0.237), ('topic', 0.233), ('lda', 0.231), ('topics', 0.215), ('document', 0.199), ('elbo', 0.198), ('movie', 0.189), ('glm', 0.189), ('variational', 0.18), ('yd', 0.121), ('lasso', 0.109), ('dirichlet', 0.103), ('unsupervised', 0.099), ('latent', 0.08), ('supervised', 0.079), ('word', 0.074), ('blei', 0.069), ('eq', 0.06), ('diggs', 0.059), ('normal', 0.055), ('corpus', 0.055), ('predictive', 0.054), ('dispersion', 0.053), ('accommodates', 0.052), ('sentiment', 0.052), ('documents', 0.05), ('treat', 0.05), ('allocation', 0.048), ('words', 0.047), ('proportions', 0.044), ('page', 0.044), ('web', 0.043), ('log', 0.043), ('prediction', 0.042), ('stars', 0.042), ('covariates', 0.042), ('links', 0.041), ('regression', 0.041), ('coordinate', 0.04), ('chemogenomic', 0.04), ('pang', 0.04), ('varglm', 0.04), ('wn', 0.039), ('count', 0.038), ('poisson', 0.037), ('homepage', 0.034), ('mult', 0.034), ('flaherty', 0.034), ('frequencies', 0.034), ('reviews', 0.033), ('expectation', 0.032), ('gradient', 0.032), ('proportionality', 0.031), ('normality', 0.031), ('ratings', 0.031), ('diverse', 0.031), ('draw', 0.031), ('community', 0.031), ('labelled', 0.03), ('likelihood', 0.03), ('family', 0.029), ('paired', 0.029), ('tted', 0.029), ('closed', 0.028), ('comprises', 0.028), ('cv', 0.026), ('users', 0.026), ('predict', 0.026), ('versus', 0.026), ('posterior', 0.026), ('maximize', 0.026), ('unconstrained', 0.026), ('exponential', 0.025), ('iterated', 0.025), ('systematic', 0.025), ('update', 0.025), ('categorical', 0.024), ('held', 0.024), ('covariate', 0.024), ('respect', 0.024), ('jointly', 0.024), ('notice', 0.024), ('binomial', 0.023), ('popularity', 0.023), ('regularized', 0.023), ('model', 0.023), ('corpora', 0.023), ('vocabulary', 0.023), ('publicly', 0.023), ('variable', 0.022), ('generalized', 0.022), ('princeton', 0.022), ('indexes', 0.022), ('diag', 0.021), ('plus', 0.021), ('mccallum', 0.021), ('linear', 0.021), ('rating', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 189 nips-2007-Supervised Topic Models
Author: Jon D. McAuliffe, David M. Blei
Abstract: We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression. 1
2 0.54716897 183 nips-2007-Spatial Latent Dirichlet Allocation
Author: Xiaogang Wang, Eric Grimson
Abstract: In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a “bag-of-words”. It is also critical to properly design “words” and “documents” when using a language model to solve vision problems. In this paper, we propose a topic model Spatial Latent Dirichlet Allocation (SLDA), which better encodes spatial structures among visual words that are essential for solving many vision problems. The spatial information is not encoded in the values of visual words but in the design of documents. Instead of knowing the partition of words into documents a priori, the word-document assignment becomes a random hidden variable in SLDA. There is a generative procedure, where knowledge of spatial structure can be flexibly added as a prior, grouping visual words which are close in space into the same document. We use SLDA to discover objects from a collection of images, and show it achieves better performance than LDA. 1
3 0.24663165 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation
Author: David Newman, Padhraic Smyth, Max Welling, Arthur U. Asuncion
Abstract: We investigate the problem of learning a widely-used latent-variable model – the Latent Dirichlet Allocation (LDA) or “topic” model – using distributed computation, where each of P processors only sees 1/P of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local Gibbs sampling on each processor with periodic updates—it is simple to implement and can be viewed as an approximation to a single processor implementation of Gibbs sampling. The second scheme relies on a hierarchical Bayesian extension of the standard LDA model to directly account for the fact that data are distributed across processors—it has a theoretical guarantee of convergence but is more complex to implement than the approximate method. Using five real-world text corpora we show that distributed learning works very well for LDA models, i.e., perplexity and precision-recall scores for distributed learning are indistinguishable from those obtained with single-processor learning. Our extensive experimental results include large-scale distributed computation on 1000 virtual processors; and speedup experiments of learning topics in a 100-million word corpus using 16 processors.
4 0.21511227 47 nips-2007-Collapsed Variational Inference for HDP
Author: Yee W. Teh, Kenichi Kurihara, Max Welling
Abstract: A wide variety of Dirichlet-multinomial ‘topic’ models have found interesting applications in recent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages such as easy assessment of convergence, easy optimization without the need to maintain detailed balance, a bound on the marginal likelihood, and side-stepping of issues with topic-identifiability. The most accurate variational technique thus far, namely collapsed variational latent Dirichlet allocation, did not deal with model selection nor did it include inference for hyperparameters. We address both issues by generalizing the technique, obtaining the first variational algorithm to deal with the hierarchical Dirichlet process and to deal with hyperparameters of Dirichlet variables. Experiments show a significant improvement in accuracy. 1
5 0.16954759 95 nips-2007-HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation
Author: Bing Zhao, Eric P. Xing
Abstract: We present a novel paradigm for statistical machine translation (SMT), based on a joint modeling of word alignment and the topical aspects underlying bilingual document-pairs, via a hidden Markov Bilingual Topic AdMixture (HM-BiTAM). In this paradigm, parallel sentence-pairs from a parallel document-pair are coupled via a certain semantic-flow, to ensure coherence of topical context in the alignment of mapping words between languages, likelihood-based training of topic-dependent translational lexicons, as well as in the inference of topic representations in each language. The learned HM-BiTAM can not only display topic patterns like methods such as LDA [1], but now for bilingual corpora; it also offers a principled way of inferring optimal translation using document context. Our method integrates the conventional model of HMM — a key component for most of the state-of-the-art SMT systems, with the recently proposed BiTAM model [10]; we report an extensive empirical analysis (in many ways complementary to the description-oriented [10]) of our method in three aspects: bilingual topic representation, word alignment, and translation.
6 0.15097485 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
7 0.14972346 129 nips-2007-Mining Internet-Scale Software Repositories
8 0.12955183 33 nips-2007-Bayesian Inference for Spiking Neuron Models with a Sparsity Prior
9 0.12007435 2 nips-2007-A Bayesian LDA-based model for semi-supervised part-of-speech tagging
10 0.099084124 154 nips-2007-Predicting Brain States from fMRI Data: Incremental Functional Principal Component Regression
11 0.077558994 197 nips-2007-The Infinite Markov Model
12 0.075265765 140 nips-2007-Neural characterization in partially observed populations of spiking neurons
13 0.06960699 181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data
14 0.069125995 188 nips-2007-Subspace-Based Face Recognition in Analog VLSI
15 0.068980351 87 nips-2007-Fast Variational Inference for Large-scale Internet Diagnosis
16 0.066333428 84 nips-2007-Expectation Maximization and Posterior Constraints
17 0.062906526 213 nips-2007-Variational Inference for Diffusion Processes
18 0.058485456 43 nips-2007-Catching Change-points with Lasso
19 0.058447678 71 nips-2007-Discriminative Keyword Selection Using Support Vector Machines
20 0.056134593 53 nips-2007-Compressed Regression
topicId topicWeight
[(0, -0.208), (1, 0.142), (2, -0.05), (3, -0.447), (4, 0.147), (5, -0.061), (6, 0.085), (7, -0.211), (8, -0.057), (9, 0.156), (10, 0.024), (11, -0.039), (12, -0.048), (13, 0.224), (14, 0.093), (15, 0.131), (16, 0.245), (17, -0.096), (18, 0.052), (19, -0.003), (20, 0.011), (21, -0.044), (22, -0.068), (23, 0.016), (24, 0.018), (25, 0.067), (26, 0.008), (27, 0.007), (28, 0.017), (29, 0.017), (30, -0.095), (31, -0.02), (32, -0.043), (33, 0.073), (34, -0.022), (35, 0.03), (36, 0.02), (37, -0.025), (38, 0.021), (39, 0.005), (40, 0.038), (41, -0.001), (42, 0.047), (43, -0.059), (44, 0.025), (45, 0.031), (46, -0.018), (47, 0.034), (48, -0.04), (49, 0.058)]
simIndex simValue paperId paperTitle
same-paper 1 0.95443809 189 nips-2007-Supervised Topic Models
Author: Jon D. McAuliffe, David M. Blei
Abstract: We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression. 1
2 0.88041109 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation
Author: David Newman, Padhraic Smyth, Max Welling, Arthur U. Asuncion
Abstract: We investigate the problem of learning a widely-used latent-variable model – the Latent Dirichlet Allocation (LDA) or “topic” model – using distributed computation, where each of P processors only sees 1/P of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local Gibbs sampling on each processor with periodic updates—it is simple to implement and can be viewed as an approximation to a single processor implementation of Gibbs sampling. The second scheme relies on a hierarchical Bayesian extension of the standard LDA model to directly account for the fact that data are distributed across processors—it has a theoretical guarantee of convergence but is more complex to implement than the approximate method. Using five real-world text corpora we show that distributed learning works very well for LDA models, i.e., perplexity and precision-recall scores for distributed learning are indistinguishable from those obtained with single-processor learning. Our extensive experimental results include large-scale distributed computation on 1000 virtual processors; and speedup experiments of learning topics in a 100-million word corpus using 16 processors.
3 0.81303287 183 nips-2007-Spatial Latent Dirichlet Allocation
Author: Xiaogang Wang, Eric Grimson
Abstract: In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a “bag-of-words”. It is also critical to properly design “words” and “documents” when using a language model to solve vision problems. In this paper, we propose a topic model Spatial Latent Dirichlet Allocation (SLDA), which better encodes spatial structures among visual words that are essential for solving many vision problems. The spatial information is not encoded in the values of visual words but in the design of documents. Instead of knowing the partition of words into documents a priori, the word-document assignment becomes a random hidden variable in SLDA. There is a generative procedure, where knowledge of spatial structure can be flexibly added as a prior, grouping visual words which are close in space into the same document. We use SLDA to discover objects from a collection of images, and show it achieves better performance than LDA. 1
4 0.71543348 47 nips-2007-Collapsed Variational Inference for HDP
Author: Yee W. Teh, Kenichi Kurihara, Max Welling
Abstract: A wide variety of Dirichlet-multinomial ‘topic’ models have found interesting applications in recent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages such as easy assessment of convergence, easy optimization without the need to maintain detailed balance, a bound on the marginal likelihood, and side-stepping of issues with topic-identifiability. The most accurate variational technique thus far, namely collapsed variational latent Dirichlet allocation, did not deal with model selection nor did it include inference for hyperparameters. We address both issues by generalizing the technique, obtaining the first variational algorithm to deal with the hierarchical Dirichlet process and to deal with hyperparameters of Dirichlet variables. Experiments show a significant improvement in accuracy. 1
5 0.57356244 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
Author: Max Welling, Ian Porteous, Evgeniy Bart
Abstract: A general modeling framework is proposed that unifies nonparametric-Bayesian models, topic-models and Bayesian networks. This class of infinite state Bayes nets (ISBN) can be viewed as directed networks of ‘hierarchical Dirichlet processes’ (HDPs) where the domain of the variables can be structured (e.g. words in documents or features in images). We show that collapsed Gibbs sampling can be done efficiently in these models by leveraging the structure of the Bayes net and using the forward-filtering-backward-sampling algorithm for junction trees. Existing models, such as nested-DP, Pachinko allocation, mixed membership stochastic block models as well as a number of new models are described as ISBNs. Two experiments have been performed to illustrate these ideas. 1
6 0.53061962 129 nips-2007-Mining Internet-Scale Software Repositories
7 0.514907 95 nips-2007-HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation
8 0.322153 2 nips-2007-A Bayesian LDA-based model for semi-supervised part-of-speech tagging
9 0.30399904 188 nips-2007-Subspace-Based Face Recognition in Analog VLSI
10 0.25719213 154 nips-2007-Predicting Brain States from fMRI Data: Incremental Functional Principal Component Regression
11 0.2533004 196 nips-2007-The Infinite Gamma-Poisson Feature Model
12 0.23853108 33 nips-2007-Bayesian Inference for Spiking Neuron Models with a Sparsity Prior
13 0.23342162 87 nips-2007-Fast Variational Inference for Large-scale Internet Diagnosis
14 0.23172529 197 nips-2007-The Infinite Markov Model
15 0.22745994 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
16 0.22683983 181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data
17 0.22668286 71 nips-2007-Discriminative Keyword Selection Using Support Vector Machines
18 0.21209767 1 nips-2007-A Bayesian Framework for Cross-Situational Word-Learning
19 0.21040586 99 nips-2007-Hierarchical Penalization
20 0.21025749 70 nips-2007-Discriminative K-means for Clustering
topicId topicWeight
[(5, 0.044), (13, 0.025), (16, 0.048), (21, 0.065), (31, 0.017), (34, 0.017), (35, 0.034), (46, 0.173), (47, 0.103), (49, 0.026), (83, 0.119), (85, 0.02), (87, 0.178), (90, 0.048)]
simIndex simValue paperId paperTitle
1 0.84151906 81 nips-2007-Estimating disparity with confidence from energy neurons
Author: Eric K. Tsang, Bertram E. Shi
Abstract: The peak location in a population of phase-tuned neurons has been shown to be a more reliable estimator for disparity than the peak location in a population of position-tuned neurons. Unfortunately, the disparity range covered by a phasetuned population is limited by phase wraparound. Thus, a single population cannot cover the large range of disparities encountered in natural scenes unless the scale of the receptive fields is chosen to be very large, which results in very low resolution depth estimates. Here we describe a biologically plausible measure of the confidence that the stimulus disparity is inside the range covered by a population of phase-tuned neurons. Based upon this confidence measure, we propose an algorithm for disparity estimation that uses many populations of high-resolution phase-tuned neurons that are biased to different disparity ranges via position shifts between the left and right eye receptive fields. The population with the highest confidence is used to estimate the stimulus disparity. We show that this algorithm outperforms a previously proposed coarse-to-fine algorithm for disparity estimation, which uses disparity estimates from coarse scales to select the populations used at finer scales and can effectively detect occlusions.
same-paper 2 0.83078182 189 nips-2007-Supervised Topic Models
Author: Jon D. McAuliffe, David M. Blei
Abstract: We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression. 1
3 0.81902468 50 nips-2007-Combined discriminative and generative articulated pose and non-rigid shape estimation
Author: Leonid Sigal, Alexandru Balan, Michael J. Black
Abstract: Estimation of three-dimensional articulated human pose and motion from images is a central problem in computer vision. Much of the previous work has been limited by the use of crude generative models of humans represented as articulated collections of simple parts such as cylinders. Automatic initialization of such models has proved difficult and most approaches assume that the size and shape of the body parts are known a priori. In this paper we propose a method for automatically recovering a detailed parametric model of non-rigid body shape and pose from monocular imagery. Specifically, we represent the body using a parameterized triangulated mesh model that is learned from a database of human range scans. We demonstrate a discriminative method to directly recover the model parameters from monocular images using a conditional mixture of kernel regressors. This predicted pose and shape are used to initialize a generative model for more detailed pose and shape estimation. The resulting approach allows fully automatic pose and shape recovery from monocular and multi-camera imagery. Experimental results show that our method is capable of robustly recovering articulated pose, shape and biometric measurements (e.g. height, weight, etc.) in both calibrated and uncalibrated camera environments. 1
4 0.81722778 59 nips-2007-Continuous Time Particle Filtering for fMRI
Author: Lawrence Murray, Amos J. Storkey
Abstract: We construct a biologically motivated stochastic differential model of the neural and hemodynamic activity underlying the observed Blood Oxygen Level Dependent (BOLD) signal in Functional Magnetic Resonance Imaging (fMRI). The model poses a difficult parameter estimation problem, both theoretically due to the nonlinearity and divergence of the differential system, and computationally due to its time and space complexity. We adapt a particle filter and smoother to the task, and discuss some of the practical approaches used to tackle the difficulties, including use of sparse matrices and parallelisation. Results demonstrate the tractability of the approach in its application to an effective connectivity study. 1
5 0.80849236 129 nips-2007-Mining Internet-Scale Software Repositories
Author: Erik Linstead, Paul Rigor, Sushil Bajracharya, Cristina Lopes, Pierre F. Baldi
Abstract: Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop Sourcerer, an infrastructure for the automated crawling, parsing, and database storage of open source software. Sourcerer allows us to gather Internet-scale source code. For instance, in one experiment, we gather 4,632 java projects from SourceForge and Apache totaling over 38 million lines of code from 9,250 developers. Simple statistical analyses of the data first reveal robust power-law behavior for package, SLOC, and lexical containment distributions. We then develop and apply unsupervised author-topic, probabilistic models to automatically discover the topics embedded in the code and extract topic-word and author-topic distributions. In addition to serving as a convenient summary for program function and developer activities, these and other related distributions provide a statistical and information-theoretic basis for quantifying and analyzing developer similarity and competence, topic scattering, and document tangling, with direct applications to software engineering. Finally, by combining software textual content with structural information captured by our CodeRank approach, we are able to significantly improve software retrieval performance, increasing the AUC metric to 0.84– roughly 10-30% better than previous approaches based on text alone. Supplementary material may be found at: http://sourcerer.ics.uci.edu/nips2007/nips07.html. 1
6 0.79584318 183 nips-2007-Spatial Latent Dirichlet Allocation
7 0.7550658 199 nips-2007-The Price of Bandit Information for Online Optimization
8 0.75476402 108 nips-2007-Kernel Measures of Conditional Dependence
9 0.71769118 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation
10 0.70535034 140 nips-2007-Neural characterization in partially observed populations of spiking neurons
11 0.68281233 2 nips-2007-A Bayesian LDA-based model for semi-supervised part-of-speech tagging
12 0.6825999 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
13 0.66706288 47 nips-2007-Collapsed Variational Inference for HDP
14 0.66539013 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model
15 0.66155344 95 nips-2007-HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation
16 0.66020697 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
17 0.65885836 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
18 0.65124607 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
19 0.64936942 113 nips-2007-Learning Visual Attributes
20 0.64914781 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data