nips nips2011 nips2011-281 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dae I. Kim, Erik B. Sudderth
Abstract: Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. [sent-6, score-0.12]
2 Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. [sent-7, score-0.74]
3 We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. [sent-8, score-0.595]
4 The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. [sent-9, score-0.773]
5 We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata. [sent-10, score-0.26]
6 Probabilistic topic models represent documents via a mixture of topics, which are themselves distributions on the discrete vocabulary of the corpus. [sent-12, score-0.509]
7 Latent Dirichlet allocation (LDA) [3] was the first hierarchical Bayesian topic model, and remains influential and widely used. [sent-13, score-0.418]
8 The first assumption springs from LDA’s Dirichlet prior, which implicitly neglects correlations in document-specific topic usage. [sent-15, score-0.396]
9 In diverse corpora, true semantic topics may exhibit strong (positive or negative) correlations; neglecting these dependencies may distort the inferred topic structure. [sent-16, score-0.698]
10 The correlated topic model (CTM) [2] uses a logistic-normal prior to express correlations via a latent Gaussian distribution. [sent-17, score-0.611]
11 The second assumption is that each document is represented solely by an unordered “bag of words”. [sent-19, score-0.086]
12 However, text data is often accompanied by a rich set of metadata such as author names, publication dates, relevant keywords, etc. [sent-20, score-0.382]
13 Topics that are consistent with such metadata may also be more semantically relevant. [sent-21, score-0.282]
14 The Dirichlet multinomial regression (DMR) [11] model conditions LDA’s Dirichlet parameters on feature-dependent linear regressions; this allows metadata-specific topic frequencies but retains other limitations of the Dirichlet. [sent-22, score-0.461]
15 Recently, the Gaussian process topic model [1] incorporated correlations at the topic level via a topic covariance, and the document level via an appropriate GP kernel function. [sent-23, score-1.359]
16 The most direct nonparametric extension of LDA is the hierarchical Dirichlet process (HDP) [17]. [sent-28, score-0.106]
17 The HDP allows an unbounded set of topics via a latent stochastic process, but nevertheless imposes a Dirichlet distribution on any finite subset of these topics. [sent-29, score-0.326]
18 Alternatively, the nonparametric Bayes pachinko allocation [9] model captures correlations within an unbounded topic collection via an inferred, directed acyclic graph. [sent-30, score-0.647]
19 More recently, the discrete infinite logistic normal [13] (DILN) model of topic correlations used an exponentiated Gaussian process (GP) to rescale the HDP. [sent-31, score-0.527]
20 This choice leads to arguably simpler learning algorithms, and also facilitates our modeling of document metadata. [sent-34, score-0.086]
21 In this paper, we develop a doubly correlated nonparametric topic (DCNT) model which captures between-topic correlations, as well as between-document correlations induced by metadata, for an unbounded set of potential topics. [sent-35, score-0.734]
22 In Sec. 2, the global soft-max transformation of the DMR and CTM is replaced by a stick-breaking transformation, with inputs determined via both metadata-dependent linear regressions and a square-root covariance representation. [sent-37, score-0.086]
23 Together, these choices lead to a well-posed nonparametric model which allows tractable MCMC learning and inference (Sec. 3). [sent-38, score-0.084]
24 In Sec. 4, we validate the model using a toy dataset, as well as a corpus of NIPS documents annotated by author and year of publication. [sent-41, score-0.445]
25 Let φd ∈ R^F denote a feature vector capturing the metadata associated with document d, and φ an F × D matrix of corpus metadata. [sent-47, score-0.459]
26 For each of an unbounded sequence of topics k, let ηfk ∈ R denote an associated significance weight for feature f, and η:k ∈ R^F a vector of these weights. [sent-49, score-0.294]
27 In a hierarchical Bayesian fashion [6], these parameters have priors µf ∼ N (0, γµ ), λf ∼ Gam(af , bf ). [sent-51, score-0.107]
28 Appropriate values for the hyperparameters γµ , af , and bf are discussed later. [sent-52, score-0.117]
29 Given η and φd, the document-specific “score” for topic k is sampled as ukd ∼ N(η:k^T φd, 1). [sent-53, score-0.396]
30 These real-valued scores are mapped to document-specific topic frequencies πkd in subsequent sections. [sent-54, score-0.496]
31 2.2 Topic Correlations. For topic k in the ordered sequence of topics, we define a sequence of k linear transformation weights Akℓ, ℓ = 1, . . . , k. [sent-56, score-0.434]
32 We then sample a variable vkd as follows: vkd ∼ N( ∑_{ℓ=1}^{k} Akℓ uℓd , λv^{-1} ).  (1)  Let A denote a lower triangular matrix containing these values Akℓ, padded by zeros. [sent-60, score-0.32]
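To make this square-root covariance construction concrete, the following sketch (NumPy, not from the paper; the array names and the truncation level are illustrative assumptions) samples v:d from the scores u:d and a lower triangular A.

```python
import numpy as np

def sample_v(A, u_d, lam_v, rng):
    """Sample v_kd ~ N(sum_{l<=k} A_kl * u_ld, 1/lam_v) for one document.

    A     : (K, K) lower-triangular weights A_kl (zeros above the diagonal)
    u_d   : (K,)   document scores u_{:d}
    lam_v : scalar precision lambda_v
    """
    mean = A @ u_d                  # v_kd depends only on the first k entries of u_d
    return mean + rng.normal(scale=np.sqrt(1.0 / lam_v), size=mean.shape)

# Illustrative usage with a small truncation level
rng = np.random.default_rng(0)
K = 5
A = np.tril(rng.normal(size=(K, K)))    # lower-triangular "square-root" covariance factor
u_d = rng.normal(size=K)                # scores u_kd ~ N(eta_:k^T phi_d, 1)
v_d = sample_v(A, u_d, lam_v=1.0, rng=rng)
```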
33 Critically, note that the distribution of vkd depends only on the first k entries of u:d , not the infinite tail of scores for subsequent topics. [sent-62, score-0.179]
34 Our integration of input metadata has close connections to the semiparametric latent factor model [18], but we replace their kernel-based GP covariance representation with a feature-based regression. [sent-65, score-0.366]
35 Figure 1: Directed graphical representation of a DCNT model for D documents containing N words. [sent-67, score-0.088]
36 Each of the unbounded set of topics has a word distribution Ωk . [sent-68, score-0.367]
37 The topic assignment zdn for word wdn depends on document-specific topic frequencies πd , which have a correlated dependence on the metadata φd produced by A and η. [sent-69, score-1.466]
38 Given similar lower triangular representations of factorized covariance matrices, conventional Bayesian factor analysis models place a symmetric Gaussian prior Akℓ ∼ N(0, λA^{-1}). [sent-71, score-0.096]
39 If we constrain A to be a diagonal matrix, with Akk ∼ N(0, λA^{-1}) and Akℓ = 0 for k ≠ ℓ, we recover a simplified singly correlated nonparametric topic (SCNT) model which captures metadata but not topic correlations. [sent-76, score-1.252]
40 For either model, the precision parameters are assigned conjugate gamma priors λv ∼ Gam(av , bv ), λA ∼ Gam(aA , bA ). [sent-77, score-0.166]
41 Let πkd be the probability of choosing topic k in document d, where ∑_{k=1}^{∞} πkd = 1. [sent-80, score-0.482]
42 This same transformation is part of the so-called logistic stick-breaking process [14], but that model is motivated by different applications, and thus employs a very different prior distribution for vkd . [sent-83, score-0.264]
43 Given the distribution π:d , the topic assignment indicator for word n in document d is drawn according to zdn ∼ Mult(π:d ). [sent-84, score-0.715]
44 Finally, wdn ∼ Mult(Ωzdn ) where Ωk ∼ Dir(β) is the word distribution for topic k, sampled from a Dirichlet prior with symmetric hyperparameters β. [sent-85, score-0.561]
45 Due to the logistic stick-breaking transformation, closed form resampling of v is intractable; we instead use a Metropolis independence sampler [6]. [sent-88, score-0.107]
46 As our experiments demonstrate, K̄ is not the number of topics that will be utilized by the learned model, but rather a (possibly loose) upper bound on that number. [sent-91, score-0.24]
47 The probabilities πkd for the first K̄ − 1 topics are set as in eq. [sent-95, score-0.24]
48 (2), with the final topic set so that a valid distribution is ensured: πK̄d = 1 − ∑_{k=1}^{K̄−1} πkd = ∏_{k=1}^{K̄−1} ψ(−vkd). [sent-96, score-0.24]
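As a hedged illustration of the stick-breaking transformation of eq. (2), the sketch below assumes ψ is the logistic function and lets the final, K̄-th topic absorb the leftover mass exactly as in the complement formula above; it is a sketch, not the authors’ code.

```python
import numpy as np

def stick_breaking_probs(v_d):
    """Map real-valued v_{:d} (length K-1) to topic probabilities pi_{:d} (length K).

    pi_k = psi(v_k) * prod_{l<k} psi(-v_l), with psi the logistic sigmoid, and the
    final topic takes the leftover mass prod_{l<K} psi(-v_l), so the result sums to one.
    """
    psi = 1.0 / (1.0 + np.exp(-v_d))                                  # psi(v)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - psi)))        # psi(-v) = 1 - psi(v)
    pi = np.empty(len(v_d) + 1)
    pi[:-1] = psi * remaining[:-1]
    pi[-1] = remaining[-1]                                            # pi_K = prod_{l<K} psi(-v_l)
    return pi

pi_d = stick_breaking_probs(np.array([0.5, -1.0, 2.0]))
assert np.isclose(pi_d.sum(), 1.0)
```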
49 As in many regression models, the gamma prior is conjugate, so that p(λf | η, af, bf) ∝ Gam(λf | af, bf) ∏_{k=1}^{K̄} N(ηfk | µf, λf^{-1}) ∝ Gam(λf | (1/2)K̄ + af, (1/2)∑_{k=1}^{K̄} (ηfk − µf)^2 + bf).  (3) [sent-99, score-0.379]
50 Similarly, the precision parameter λv has a gamma prior and posterior: p(λv | v, av, bv) ∝ Gam(λv | av, bv) ∏_{d=1}^{D} N(v:d | Au:d, L^{-1}) ∝ Gam(λv | (1/2)K̄D + av, (1/2)∑_{d=1}^{D} (v:d − Au:d)^T (v:d − Au:d) + bv). [sent-100, score-0.391]
51 With a gamma prior, the precision parameter λA nevertheless has the following gamma posterior: p(λA | A, aA, bA) ∝ Gam(λA | aA, bA) ∏_{k=1}^{K̄} ∏_{ℓ=1}^{k} N(Akℓ | 0, (kλA)^{-1}) ∝ Gam(λA | (1/2)K̄(K̄ − 1) + aA, (1/2)∑_{k=1}^{K̄} ∑_{ℓ=1}^{k} k Akℓ^2 + bA).  (7) [sent-102, score-0.161]
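These conjugate gamma updates reduce to simple shape/rate bookkeeping. The sketch below resamples λv, assuming L = λv·I so that the residual term matches the posterior above; the function and array names are illustrative, not the authors’ code.

```python
import numpy as np

def resample_lambda_v(V, U, A, a_v, b_v, rng):
    """Gibbs update for the precision lambda_v.

    V, U : (K, D) stacked columns v_{:d} and u_{:d}
    A    : (K, K) lower-triangular weights
    Posterior shape = 0.5*K*D + a_v; rate = 0.5 * sum_d ||v_{:d} - A u_{:d}||^2 + b_v.
    """
    K, D = V.shape
    resid = V - A @ U
    shape = 0.5 * K * D + a_v
    rate = 0.5 * np.sum(resid ** 2) + b_v
    return rng.gamma(shape, 1.0 / rate)     # NumPy's gamma takes a scale, i.e. 1/rate

lam_v = resample_lambda_v(np.ones((5, 3)), np.zeros((5, 3)), np.eye(5),
                          0.1, 0.1, np.random.default_rng(0))
```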
52 Similarly, the scores u:d for each document are conditionally independent, with Gaussian posteriors: p(u:d | v:d, η, φd, L) ∝ N(u:d | η^T φd, IK̄) N(v:d | Au:d, L^{-1}) ∝ N(u:d | (IK̄ + A^T L A)^{-1}(A^T L v:d + η^T φd), (IK̄ + A^T L A)^{-1}). [sent-105, score-0.147]
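A minimal sketch of the corresponding Gaussian update for the scores u:d, using a direct inverse since the truncation level is modest; the variable names (eta as an F × K̄ weight matrix, L as a K̄ × K̄ precision) are assumptions.

```python
import numpy as np

def resample_u_d(v_d, A, L, eta, phi_d, rng):
    """Sample u_{:d} ~ N((I + A^T L A)^{-1}(A^T L v_{:d} + eta^T phi_d), (I + A^T L A)^{-1})."""
    K = A.shape[0]
    prec = np.eye(K) + A.T @ L @ A          # posterior precision
    cov = np.linalg.inv(prec)               # K is a modest truncation level, so this is cheap
    mean = cov @ (A.T @ L @ v_d + eta.T @ phi_d)
    return rng.multivariate_normal(mean, cov)
```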
53 Let Mkw^{\dn} denote the number of instances of word w assigned to topic k, excluding token n in document d, and Mk·^{\dn} the corresponding total count for topic k. [sent-110, score-0.555]
54 For a vocabulary with W unique word types, the posterior distribution of topic indicator zdn is then p(zdn = k | π:d, z\dn) ∝ πkd (Mkw^{\dn} + β) / (Mk·^{\dn} + Wβ).  (10) [sent-112, score-0.678]
55 Recall that the topic probabilities π:d are determined from v:d via Equation (2). [sent-114, score-0.396]
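A sketch of the collapsed Gibbs step of eq. (10) for a single token, assuming the per-topic word counts already exclude that token; all array names are illustrative.

```python
import numpy as np

def resample_z_dn(w, pi_d, M_kw, M_k, beta, rng):
    """Resample the topic of a single token whose word id is w.

    pi_d : (K,)   topic probabilities pi_{:d} from eq. (2)
    M_kw : (K, W) topic-word counts excluding this token
    M_k  : (K,)   per-topic totals excluding this token
    """
    W = M_kw.shape[1]
    probs = pi_d * (M_kw[:, w] + beta) / (M_k + W * beta)
    probs /= probs.sum()
    return rng.choice(len(pi_d), p=probs)
```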
56 4 Experimental Results; 4.1 Toy Bars Dataset. Following related validations of the LDA model [7], we ran experiments on a toy corpus of “images” designed to validate the features of the DCNT. [sent-120, score-0.256]
57 Ten topics were defined, corresponding to all possible horizontal and vertical 5-pixel “bars”. [sent-124, score-0.24]
58 In the first, a random number of topics is chosen for each document, and then a corresponding subset of the bars is picked uniformly at random. [sent-126, score-0.286]
59 In the second, we induce topic correlations by generating documents that contain a combination of either only horizontal (topics 1-5) or only vertical (topics 6-10) bars. [sent-127, score-0.569]
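To illustrate this correlated toy-bars construction, a possible generator is sketched below; the 5×5 image size follows the bar description above, but the document length and the number of bars per document are assumptions not stated here.

```python
import numpy as np

def make_bar_topics(size=5):
    """Ten 'topics': uniform distributions over the pixels of each horizontal/vertical bar."""
    topics = []
    for i in range(size):
        horiz = np.zeros((size, size)); horiz[i, :] = 1.0
        vert = np.zeros((size, size)); vert[:, i] = 1.0
        topics.append(horiz.ravel() / horiz.sum())
        topics.append(vert.ravel() / vert.sum())
    # reorder so topics 0-4 are horizontal bars and topics 5-9 are vertical bars
    return np.array(topics[0::2] + topics[1::2])

def sample_correlated_doc(topics, n_words=50, rng=np.random.default_rng()):
    """Pick only horizontal (0-4) or only vertical (5-9) bars, inducing topic correlations."""
    block = rng.integers(2)                         # 0 -> horizontal, 1 -> vertical
    k = rng.integers(1, 6)                          # number of bars in this document
    chosen = rng.choice(np.arange(5) + 5 * block, size=k, replace=False)
    word_dist = topics[chosen].mean(axis=0)
    return rng.multinomial(n_words, word_dist)      # bag-of-words counts for one document
```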
60 Using these toy datasets, we compared the LDA model to several versions of the DCNT. [sent-129, score-0.117]
61 For LDA, we set the number of topics to the true value of K = 10. [sent-130, score-0.24]
62 Similar to previous toy experiments [7], we set the parameters of its Dirichlet prior over topic distributions to α = 50/K, and the topic smoothing parameter to β = 0. [sent-131, score-0.945]
63 For the DCNT model, we set γµ = 10^6, and all gamma prior hyperparameters as a = b = 0. [sent-133, score-0.124]
64 We compared three variants of the DCNT model: the singly correlated SCNT (A constrained to be diagonal) with K = 10, the DCNT with K = 10, and the DCNT with K = 20. [sent-136, score-0.094]
65 For the toy dataset with correlated topics, the results of running all sampling algorithms for 10,000 iterations are illustrated in Figure 2. [sent-138, score-0.179]
66 On this relatively clean data, all models limited to K = 10 … Figure 2: A dataset of correlated toy bars (example document images in bottom left). [sent-139, score-0.311]
67 Note that the true topic order is not identifiable. [sent-141, score-0.396]
68 Bottom: Inferred topic covariance matrices for the four corresponding models. [sent-142, score-0.424]
69 Note that LDA assumes all topics have a slight negative correlation, while the DCNT infers more pronounced positive correlations. [sent-143, score-0.26]
70 To determine the topic correlations corresponding to a set of learned model parameters, we use a Monte Carlo estimate (details in the supplemental material). [sent-149, score-0.504]
71 To make these matrices easier to visualize, the Hungarian algorithm was used to reorder topic labels for best alignment with the ground truth topic assignments. [sent-150, score-0.792]
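A sketch of that relabeling step using the Hungarian algorithm via scipy.optimize.linear_sum_assignment; the dot-product similarity between topic-word distributions used as the assignment score is an assumption, not necessarily the authors’ choice.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_topics(inferred, truth):
    """Permute rows of `inferred` (K x W topic-word matrix) to best match `truth` (K x W).

    Uses negative dot-product similarity as the assignment cost; returns the
    reordered topics and the permutation applied.
    """
    cost = -inferred @ truth.T                      # (K, K): higher similarity -> lower cost
    row_ind, col_ind = linear_sum_assignment(cost)  # inferred row i matched to truth col_ind[i]
    perm = np.argsort(col_ind)                      # position j of truth gets inferred row perm[j]
    return inferred[perm], perm
```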
72 Note the significant blocks of positive correlations recovered by the DCNT, reflecting the true correlations used to create this toy data. [sent-151, score-0.287]
73 4.2 NIPS Corpus. The NIPS corpus that we used consisted of publications from previous NIPS conferences 0-12 (1987-1999), including various metadata (year of publication, authors, and section categories). [sent-153, score-0.464]
74 4.2.1 Conditioning on Metadata. A learned DCNT model provides predictions for how topic frequencies change given particular metadata associated with a document. [sent-161, score-0.743]
75 In Figure 3, we show how predicted topic frequencies change over time, conditioning also on one of three authors (Michael Jordan, Geoffrey Hinton, or Terrence Sejnowski). [sent-162, score-0.528]
76 For each, words from a relevant topic illustrate how conditioning on a particular author can change the predicted document content. [sent-163, score-0.592]
77 For example, the visualization associated with Michael Jordan shows that the frequency of the topic associated with probabilistic models gradually increases over the years, while the topic associated with neural networks decreases. [sent-164, score-0.792]
78 Conditioning on Geoffrey Hinton puts larger mass on a topic which focuses on models developed by his research group. [sent-165, score-0.396]
79 Finally, conditioning on Terrence Sejnowski dramatically increases the probability of topics related to neuroscience. [sent-166, score-0.284]
80 4.2.2 Correlations between Topics. The DCNT model can also capture correlations between topics. [sent-169, score-0.085]
81 The middle row illustrates the word distributions for the topics highlighted by red dots in their respective columns. [sent-173, score-0.313]
82 Figure 4: A Hinton diagram of correlations between all pairs of topics, where the size of each square indicates the magnitude of dependence, and red and blue squares indicate positive and negative correlations, respectively. [sent-175, score-0.085]
83 To the right are the top six words from three strongly correlated topic pairs. [sent-176, score-0.48]
84 We can see that the model learned strong positive correlations between the “function” and “learning” topics, which have strong semantic similarities but are not identical. [sent-183, score-0.359]
85 Another positive correlation that the model discovered was between the “visual” and “neuron” topics; of course there are many papers at NIPS which study the brain’s visual cortex. [sent-184, score-0.24]
86 4.2.3 Predictive Likelihood. In order to quantitatively measure the generalization power of our DCNT model, we tested several variants on two versions of the toy bars dataset (correlated & uncorrelated). [sent-187, score-0.163]
87 We also compared models on the NIPS corpus, to explore more realistic data where metadata is available. [sent-188, score-0.282]
88 The test data for the toy dataset consisted of 500 documents generated by the same process as the training data, [sent-189, score-0.265]
89 Figure 5: Perplexity scores (lower is better) computed via Chib-style estimators for several topic models (LDA, HDP, DCNT−noF, DCNT−Y, DCNT−YA1, DCNT−YA2). [sent-203, score-0.431]
90 Left: Test performance for the toy datasets with uncorrelated bars (-A) and correlated bars (-B). [sent-204, score-0.271]
91 Right: Test performance on the NIPS corpus with various metadata: no features (-noF), year features (-Y), year and prolific author features (over 10 publications, -YA1), and year and additional author features (over 5 publications, -YA2). [sent-205, score-0.539]
92 while the NIPS corpus was split into training and test subsets containing 80% and 20% of the full corpus, respectively. [sent-206, score-0.091]
93 Predictive negative log-likelihood estimates were normalized by word counts to determine perplexity scores [3]. [sent-211, score-0.19]
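In other words, perplexity is the exponentiated per-word negative log-likelihood; a one-line sketch with purely illustrative numbers follows.

```python
import numpy as np

def perplexity(total_log_likelihood, total_word_count):
    """Perplexity = exp(-log p(held-out words) / number of held-out tokens)."""
    return np.exp(-total_log_likelihood / total_word_count)

# Illustrative values only: 10^6 held-out tokens with total log-likelihood -7.5e6 nats
print(perplexity(-7.5e6, 1e6))   # about 1808
```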
94 For the toy bars data, we set the number of topics to K = 10 for all models except the HDP, which learned K = 15. [sent-214, score-0.403]
95 For the toy datasets, the LDA and HDP models perform similarly. [sent-216, score-0.117]
96 The SCNT and DCNT are both superior, apparently due to their ability to capture non-Dirichlet distributions on topic occurrence patterns. [sent-217, score-0.396]
97 Including metadata encoding the year of publication, and possibly also the most prolific authors, provides slight additional improvements in DCNT accuracy. [sent-219, score-0.386]
98 While it is pleasing that the DCNT and SCNT models seem to provide improved predictive likelihoods, a recent study on the human interpretability of topic models showed that such scores do not necessarily correlate with more meaningful semantic structures [4]. [sent-222, score-0.491]
99 5 Discussion. The doubly correlated nonparametric topic model flexibly allows the incorporation of arbitrary features associated with documents, captures correlations that might exist within a dataset’s latent topics, and can learn an unbounded set of topics. [sent-226, score-0.793]
100 The model uses a set of efficient MCMC techniques for learning and inference, and is supported by a set of web-based tools that allow users to visualize the inferred semantic structure. [sent-227, score-0.086]
wordName wordTfidf (topN-words)
[('dcnt', 0.594), ('topic', 0.396), ('metadata', 0.282), ('topics', 0.24), ('zdn', 0.16), ('scnt', 0.144), ('vkd', 0.144), ('lda', 0.142), ('toy', 0.117), ('gam', 0.117), ('kd', 0.112), ('hdp', 0.104), ('corpus', 0.091), ('documents', 0.088), ('document', 0.086), ('correlations', 0.085), ('year', 0.084), ('nonparametric', 0.084), ('au', 0.081), ('dirichlet', 0.078), ('word', 0.073), ('ak', 0.068), ('publications', 0.066), ('bf', 0.065), ('frequencies', 0.065), ('gamma', 0.064), ('correlated', 0.062), ('uk', 0.059), ('ik', 0.059), ('perplexity', 0.056), ('proli', 0.056), ('publication', 0.056), ('unbounded', 0.054), ('doubly', 0.053), ('bv', 0.049), ('terrence', 0.048), ('logistic', 0.046), ('bars', 0.046), ('conditioning', 0.044), ('author', 0.044), ('ba', 0.042), ('aa', 0.041), ('sampler', 0.041), ('dn', 0.038), ('transformation', 0.038), ('av', 0.037), ('prior', 0.036), ('scores', 0.035), ('mcmc', 0.035), ('semantic', 0.034), ('sejnowski', 0.033), ('precision', 0.033), ('ctm', 0.032), ('dae', 0.032), ('dmr', 0.032), ('mkw', 0.032), ('singly', 0.032), ('wdn', 0.032), ('triangular', 0.032), ('latent', 0.032), ('vk', 0.031), ('geoffrey', 0.03), ('covariance', 0.028), ('nips', 0.028), ('pachinko', 0.028), ('iarpa', 0.028), ('hinton', 0.028), ('rf', 0.028), ('inferred', 0.028), ('af', 0.028), ('features', 0.027), ('counts', 0.026), ('predictive', 0.026), ('conditionally', 0.026), ('tokens', 0.026), ('bayesian', 0.026), ('vocabulary', 0.025), ('consisted', 0.025), ('semiparametric', 0.024), ('mult', 0.024), ('afrl', 0.024), ('visualize', 0.024), ('hyperparameters', 0.024), ('gp', 0.024), ('posterior', 0.024), ('supplemental', 0.023), ('authors', 0.023), ('jordan', 0.023), ('blei', 0.022), ('words', 0.022), ('hierarchical', 0.022), ('metropolis', 0.021), ('kk', 0.021), ('validate', 0.021), ('resampling', 0.02), ('priors', 0.02), ('slight', 0.02), ('regressions', 0.02), ('il', 0.02), ('gaussian', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
Author: Dae I. Kim, Erik B. Sudderth
Abstract: Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata. 1
2 0.39985961 58 nips-2011-Complexity of Inference in Latent Dirichlet Allocation
Author: David Sontag, Dan Roy
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document’s topic distribution is integrated out. We show that, when the e↵ective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question. 1
3 0.31940514 129 nips-2011-Improving Topic Coherence with Regularized Topic Models
Author: David Newman, Edwin V. Bonilla, Wray Buntine
Abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data. 1
4 0.20014231 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
Author: Xianxing Zhang, Lawrence Carin, David B. Dunson
Abstract: The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that may be selected. The observed data are assumed to have associated temporal covariates (corresponding to the time at which choices are made), and we wish to impose that with increasing time it is more probable that topics deeper in the tree are utilized. This structure is imposed by developing a new “change point
5 0.1738898 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
Author: Liang Xiong, Barnabás Póczos, Jeff G. Schneider
Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. 1
6 0.15170211 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
7 0.14863013 156 nips-2011-Learning to Learn with Compound HD Models
8 0.07332164 258 nips-2011-Sparse Bayesian Multi-Task Learning
9 0.067968123 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning
10 0.067093849 221 nips-2011-Priors over Recurrent Continuous Time Processes
11 0.064925179 14 nips-2011-A concave regularization technique for sparse mixture models
12 0.061731704 26 nips-2011-Additive Gaussian Processes
13 0.052976459 200 nips-2011-On the Analysis of Multi-Channel Neural Spike Data
14 0.051938415 104 nips-2011-Generalized Beta Mixtures of Gaussians
15 0.051184922 176 nips-2011-Multi-View Learning of Word Embeddings via CCA
16 0.047707379 191 nips-2011-Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation
17 0.046343669 160 nips-2011-Linear Submodular Bandits and their Application to Diversified Retrieval
18 0.044020344 142 nips-2011-Large-Scale Sparse Principal Component Analysis with Application to Text Data
19 0.043769393 301 nips-2011-Variational Gaussian Process Dynamical Systems
20 0.041575085 4 nips-2011-A Convergence Analysis of Log-Linear Training
topicId topicWeight
[(0, 0.136), (1, 0.072), (2, 0.001), (3, 0.009), (4, -0.051), (5, -0.466), (6, 0.183), (7, 0.149), (8, -0.199), (9, 0.107), (10, 0.123), (11, 0.15), (12, -0.005), (13, 0.03), (14, 0.036), (15, -0.008), (16, 0.071), (17, 0.028), (18, -0.058), (19, 0.036), (20, -0.104), (21, 0.013), (22, 0.031), (23, -0.003), (24, 0.009), (25, 0.019), (26, -0.029), (27, -0.036), (28, 0.002), (29, -0.05), (30, -0.019), (31, -0.004), (32, 0.025), (33, -0.019), (34, -0.02), (35, -0.012), (36, 0.0), (37, -0.006), (38, 0.003), (39, 0.005), (40, 0.012), (41, 0.022), (42, -0.02), (43, -0.023), (44, -0.015), (45, -0.008), (46, -0.021), (47, -0.055), (48, -0.015), (49, -0.038)]
simIndex simValue paperId paperTitle
same-paper 1 0.97057593 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
Author: Dae I. Kim, Erik B. Sudderth
Abstract: Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata. 1
2 0.96553713 129 nips-2011-Improving Topic Coherence with Regularized Topic Models
Author: David Newman, Edwin V. Bonilla, Wray Buntine
Abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data. 1
3 0.92480803 58 nips-2011-Complexity of Inference in Latent Dirichlet Allocation
Author: David Sontag, Dan Roy
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document’s topic distribution is integrated out. We show that, when the e↵ective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question. 1
4 0.7989592 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
Author: Liang Xiong, Barnabás Póczos, Jeff G. Schneider
Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. 1
5 0.70423037 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
Author: Xianxing Zhang, Lawrence Carin, David B. Dunson
Abstract: The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that may be selected. The observed data are assumed to have associated temporal covariates (corresponding to the time at which choices are made), and we wish to impose that with increasing time it is more probable that topics deeper in the tree are utilized. This structure is imposed by developing a new “change point
6 0.69828767 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
7 0.54799438 14 nips-2011-A concave regularization technique for sparse mixture models
8 0.47034153 156 nips-2011-Learning to Learn with Compound HD Models
9 0.36804557 160 nips-2011-Linear Submodular Bandits and their Application to Diversified Retrieval
11 0.26177919 176 nips-2011-Multi-View Learning of Word Embeddings via CCA
12 0.25664991 221 nips-2011-Priors over Recurrent Continuous Time Processes
13 0.24604475 191 nips-2011-Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation
14 0.24198075 131 nips-2011-Inference in continuous-time change-point models
15 0.2331681 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
16 0.23172469 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
17 0.22851843 104 nips-2011-Generalized Beta Mixtures of Gaussians
18 0.21376584 101 nips-2011-Gaussian process modulated renewal processes
19 0.21115416 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference
20 0.20905736 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning
topicId topicWeight
[(0, 0.038), (4, 0.029), (8, 0.129), (20, 0.04), (21, 0.071), (26, 0.021), (31, 0.082), (33, 0.019), (39, 0.015), (43, 0.087), (45, 0.075), (49, 0.05), (57, 0.048), (65, 0.017), (74, 0.061), (83, 0.023), (84, 0.045), (99, 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.86311203 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
Author: Dae I. Kim, Erik B. Sudderth
Abstract: Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata. 1
2 0.75132841 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding
Author: Pablo M. Olmos, Luis Salamanca, Juan Fuentes, Fernando Pérez-Cruz
Abstract: We show an application of a tree structure for approximate inference in graphical models using the expectation propagation algorithm. These approximations are typically used over graphs with short-range cycles. We demonstrate that these approximations also help in sparse graphs with long-range loops, as the ones used in coding theory to approach channel capacity. For asymptotically large sparse graph, the expectation propagation algorithm together with the tree structure yields a completely disconnected approximation to the graphical model but, for for finite-length practical sparse graphs, the tree structure approximation to the code graph provides accurate estimates for the marginal of each variable. Furthermore, we propose a new method for constructing the tree structure on the fly that might be more amenable for sparse graphs with general factors. 1
3 0.73364574 58 nips-2011-Complexity of Inference in Latent Dirichlet Allocation
Author: David Sontag, Dan Roy
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document’s topic distribution is integrated out. We show that, when the e↵ective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question. 1
4 0.71949285 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
Author: Jun-ichiro Hirayama, Aapo Hyvärinen
Abstract: Components estimated by independent component analysis and related methods are typically not independent in real data. A very common form of nonlinear dependency between the components is correlations in their variances or energies. Here, we propose a principled probabilistic model to model the energycorrelations between the latent variables. Our two-stage model includes a linear mixing of latent signals into the observed ones like in ICA. The main new feature is a model of the energy-correlations based on the structural equation model (SEM), in particular, a Linear Non-Gaussian SEM. The SEM is closely related to divisive normalization which effectively reduces energy correlation. Our new twostage model enables estimation of both the linear mixing and the interactions related to energy-correlations, without resorting to approximations of the likelihood function or other non-principled approaches. We demonstrate the applicability of our method with synthetic dataset, natural images and brain signals. 1
5 0.71773499 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
Author: Adler J. Perotte, Frank Wood, Noemie Elhadad, Nicholas Bartlett
Abstract: We introduce hierarchically supervised latent Dirichlet allocation (HSLDA), a model for hierarchically and multiply labeled bag-of-word data. Examples of such data include web pages and their placement in directories, product descriptions and associated categories from product hierarchies, and free-text clinical records and their assigned diagnosis codes. Out-of-sample label prediction is the primary goal of this work, but improved lower-dimensional representations of the bagof-word data are also of interest. We demonstrate HSLDA on large-scale data from clinical document labeling and retail product categorization tasks. We show that leveraging the structure from hierarchical labels improves out-of-sample label prediction substantially when compared to models that do not. 1
6 0.71225572 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
7 0.7111274 258 nips-2011-Sparse Bayesian Multi-Task Learning
8 0.70930725 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
9 0.70340776 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance
10 0.70324987 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)
11 0.70286506 285 nips-2011-The Kernel Beta Process
12 0.70274431 276 nips-2011-Structured sparse coding via lateral inhibition
13 0.70224416 301 nips-2011-Variational Gaussian Process Dynamical Systems
14 0.70219141 92 nips-2011-Expressive Power and Approximation Errors of Restricted Boltzmann Machines
15 0.70031142 200 nips-2011-On the Analysis of Multi-Channel Neural Spike Data
16 0.69780326 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
17 0.69773448 156 nips-2011-Learning to Learn with Compound HD Models
18 0.69755149 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data
19 0.69730115 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
20 0.69699109 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations