nips nips2011 nips2011-58 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Sontag, Dan Roy
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document’s topic distribution is integrated out. We show that, when the effective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Roy, University of Cambridge. Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). [sent-2, score-0.205]
2 First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document’s topic distribution is integrated out. [sent-3, score-1.158]
3 We show that, when the effective number of topics per document is small, exact inference takes polynomial time. [sent-4, score-0.885]
4 In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. [sent-5, score-0.927]
5 Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. [sent-6, score-0.563]
6 1 Introduction Probabilistic models of text and topics, known as topic models, are powerful tools for exploring large data sets and for making inferences about the content of documents. [sent-9, score-0.476]
7 Topic models are frequently used for deriving low-dimensional representations of documents that are then used for information retrieval, document summarization, and classification [Blei and McAuliffe, 2008; Lacoste-Julien et al. [sent-10, score-0.222]
8 In this paper, we consider the computational complexity of inference in topic models, beginning with one of the simplest and most popular models, Latent Dirichlet Allocation (LDA) [Blei et al. [sent-12, score-0.65]
9 Almost all uses of topic models require probabilistic inference. [sent-15, score-0.507]
10 For example, unsupervised learning of topic models using Expectation Maximization requires the repeated computation of marginal probabilities of what topics are present in the documents. [sent-16, score-0.932]
11 For applications in information retrieval and classification, each new document necessitates inference to determine what topics are present. [sent-17, score-0.84]
12 Although there is a wealth of literature on approximate inference algorithms for topic models, such as Gibbs sampling and variational inference [Blei et al. [sent-18, score-0.793]
13 Furthermore, the existing inference algorithms, although well-motivated, do not provide guarantees of optimality. [sent-22, score-0.133]
14 We choose to study LDA because we believe that it captures the essence of what makes inference easy or hard in topic models. [sent-23, score-0.689]
15 We believe that a careful analysis of the complexity of popular probabilistic models like LDA will ultimately help us build a methodology for spanning the gap between theory and practice in probabilistic AI. [sent-24, score-0.173]
16 Our hope is that our results will motivate discussion of the following questions, guiding research of both new topic models and the design of new approximate inference and learning algorithms. ∗ This work was partially carried out while D. [sent-25, score-0.609]
17 How do the hyperparameters affect the structure of the posterior and the hardness of inference? [sent-34, score-0.208]
18 We study the complexity of finding assignments of topics to words with high posterior probability and the complexity of summarizing the posterior distributions on topics in a document by either its expectation or points with high posterior density. [sent-35, score-1.584]
19 In the former case, we show that the number of topics in the maximum a posteriori assignment determines the hardness. [sent-36, score-0.634]
20 In the latter case, we quantify the sense in which the Dirichlet prior can be seen to enforce sparsity and use this result to show hardness via a reduction from set cover. [sent-37, score-0.181]
21 2 MAP inference of word assignments We will consider the inference problem for a single document. [sent-38, score-0.528]
22 , wN ), is generated as follows: a distribution over the T topics is sampled from a Dirichlet distribution, θ ∼ Dir(α); then, for i ∈ [N ] := {1, . [sent-42, score-0.48]
23 , N }, we sample a topic zi ∼ Multinomial(θ) and word wi ∼ β_{zi}, where β_t, t ∈ [T ], are distributions on a dictionary of words. [sent-45, score-0.923]
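To make the generative description above concrete, here is a minimal sketch, not from the paper: Python with numpy, illustrative function and variable names, sampling a single document under the model just described.

    import numpy as np

    def generate_document(alpha, beta, N, seed=0):
        """Sample one document from the LDA generative process sketched above.

        alpha : (T,) Dirichlet hyperparameters
        beta  : (T, V) array; row t is the word distribution beta_t
        N     : number of words in the document
        """
        rng = np.random.default_rng(seed)
        T, V = beta.shape
        theta = rng.dirichlet(alpha)                            # theta ~ Dir(alpha)
        z = rng.choice(T, size=N, p=theta)                      # z_i ~ Multinomial(theta)
        w = np.array([rng.choice(V, p=beta[t]) for t in z])     # w_i ~ beta_{z_i}
        return theta, z, w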
24 Assume that the word distributions β_t are fixed (e. [sent-46, score-0.199]
25 , they have been previously estimated), and let lit = log Pr(wi |zi = t) be the log probability of the ith word being generated from topic t. [sent-48, score-0.925]
26 After integrating out the topic distribution vector, the joint distribution of the topic assignments conditioned on the words w is given by Eq. (1): [sent-49, score-1.199]
27 Pr(z_1, . . . , z_N | w) ∝ [Γ(Σ_t α_t) / Γ(Σ_t α_t + N)] · Π_t [Γ(n_t + α_t) / Γ(α_t)] · Π_{i=1}^{N} Pr(w_i | z_i),   (1)   where n_t is the total number of words assigned to topic t. [sent-52, score-0.785]
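As an aside (not from the paper), the collapsed objective is cheap to evaluate numerically. The sketch below, in Python with numpy/scipy and illustrative names (log_beta plays the role of the l_it table), computes log Pr(z | w) up to the normalizing constant, assuming the reconstruction of Eq. (1) given above.

    import numpy as np
    from scipy.special import gammaln

    def collapsed_log_prob(z, log_beta, alpha):
        """Unnormalized log Pr(z | w): Dirichlet count term of Eq. (1) plus word log-likelihoods.

        z        : (N,) integer topic assignment for each word
        log_beta : (N, T) array, log_beta[i, t] = log Pr(w_i | z_i = t), i.e. the l_it above
        alpha    : (T,) Dirichlet hyperparameters (numpy array)
        """
        N, T = log_beta.shape
        n = np.bincount(z, minlength=T)                    # n_t: words assigned to topic t
        log_prior = (gammaln(alpha.sum()) - gammaln(alpha.sum() + N)
                     + np.sum(gammaln(n + alpha) - gammaln(alpha)))
        log_lik = log_beta[np.arange(N), z].sum()          # sum_i log Pr(w_i | z_i)
        return log_prior + log_lik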
28 In this section, we focus on the inference problem of finding the most likely assignment of topics to words, i. [sent-53, score-0.731]
29 More broadly, for many classification tasks involving topic models it may be useful to have word-level features for whether a particular word was assigned to a given topic. [sent-58, score-0.716]
30 2.1 Exact maximization for small number of topics Suppose a document only uses τ ≪ T topics. [sent-63, score-0.703]
31 That is, T could be large, but we are guaranteed that the MAP assignment for a document uses at most τ different topics. [sent-64, score-0.364]
32 In this section, we show how we can use this knowledge to efficiently find a maximizing assignment of words to topics. [sent-65, score-0.249]
33 It is important to note that we only restrict the maximum number of topics per document, letting the Dirichlet prior and the likelihood guide the choice of the actual number of topics present. [sent-66, score-0.968]
34 We first observe that, if we knew the number of words assigned to each topic, finding the MAP assignment is easy. [sent-67, score-0.29]
35 For t ∈ [T ], let n∗_t be the number of words assigned to topic t. [sent-68, score-0.624]
36 The x-axis varies nt, the number of words assigned to topic t, and the y-axis shows F (nt). [sent-82, score-0.785]
37 Using weighted b-matching, we can find a MAP assignment of words to topics by trying all C(T, τ) = Θ(T^τ) choices of τ topics and all possible valid partitions with at most τ non-zeros. [sent-92, score-1.218]
38 For each choice of A ⊆ [T ] with |A| = τ and each count vector n = (n1 , . . . , nT ) such that nt = 0 for t ∉ A, the algorithm computes score(A, n) = Weighted-B-Matching(A, n, l) + Σt log Γ(nt + αt ), and returns arg maxA,n score(A, n). There are at most N^(τ−1) valid partitions with τ non-zero counts. [sent-96, score-0.477]
39 For each of these, we solve the b-matching problem to find the most likely assignment of words to topics that satisfies the cardinality constraints. [sent-97, score-0.705]
40 This is polynomial when the number of topics τ appearing in a document is a constant. [sent-99, score-0.728]
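The following is a brute-force sketch of the small-τ strategy described above, not the paper's implementation: it enumerates the subsets of at most τ topics and the positive count vectors on them, and solves the inner cardinality-constrained matching by expanding topic t into n_t "slots" and calling a standard linear assignment solver in place of the Weighted-B-Matching routine. Function names, the slot-expansion trick, and the omission of constant Dirichlet terms (which do not affect the argmax) are my own illustrative choices.

    import itertools
    import numpy as np
    from scipy.special import gammaln
    from scipy.optimize import linear_sum_assignment

    def compositions(n, parts):
        """All ways to write n as an ordered sum of `parts` positive integers."""
        if parts == 1:
            yield (n,)
            return
        for first in range(1, n - parts + 2):
            for rest in compositions(n - first, parts - 1):
                yield (first,) + rest

    def map_word_assignment(log_beta, alpha, tau):
        """Try every subset A of at most tau topics and every count vector supported on A;
        the cardinality-constrained matching is solved by giving topic t exactly n_t slots."""
        N, T = log_beta.shape
        best_score, best_z = -np.inf, None
        for size in range(1, tau + 1):
            for A in itertools.combinations(range(T), size):
                for counts in compositions(N, size):
                    n = np.zeros(T)
                    n[list(A)] = counts
                    prior = gammaln(n + alpha).sum()          # sum_t log Gamma(n_t + alpha_t)
                    slots = [t for t, c in zip(A, counts) for _ in range(c)]
                    cost = -log_beta[:, slots]                # N x N; minimize negative log-likelihood
                    rows, cols = linear_sum_assignment(cost)
                    score = prior - cost[rows, cols].sum()
                    if score > best_score:
                        best_score = score
                        z = np.empty(N, dtype=int)
                        z[rows] = np.array(slots)[cols]
                        best_z = z
        return best_z, best_score

As in the text, the run time is polynomial for constant τ: there are O(T^τ) subsets and at most N^(τ−1) count vectors, and each inner assignment problem is polynomial in N.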
41 2.2 Inference is NP-hard for large numbers of topics In this section, we show that probabilistic inference is NP-hard in the general setting where a document may have a large number of topics in its MAP assignment. [sent-101, score-1.298]
42 Our proof is a straightforward generalization of the approach used by Halperin and Karp [2005] to show that the minimum entropy set cover problem is hard to approximate. [sent-108, score-0.139]
43 This requires specifying the words in the document, the number of topics, and the word log probabilities lit . [sent-114, score-0.481]
44 Let each element i ∈ Σ correspond to a word wi , and let each set correspond to one topic. [sent-115, score-0.259]
45 The topics are on the top, and the words from the document are on the bottom. [sent-121, score-0.785]
46 An edge is drawn between a topic (set) and a word (element) if the corresponding set contains that element. [sent-122, score-0.675]
47 More realistic documents would have significantly more words than topics used. [sent-134, score-0.563]
48 Although this is not possible while keeping k = 3, since the MAP assignment always has τ ≥ N/k, we can instead reduce from a k-set packing problem with k > 3. [sent-135, score-0.211]
49 3 MAP inference of the topic distribution In this section we consider the task of finding the mode of Pr(θ | w). [sent-137, score-0.633]
50 This MAP problem involves integrating out the topic assignments, zi , as opposed to the previously considered MAP problem of integrating out the topic distribution θ. [sent-138, score-1.128]
51 We will see that the MAP topic distribution is not always well-defined, which will lead us to define and study alternative formulations. [sent-139, score-0.524]
52 In particular, we give a precise characterization of the MAP problem as one of finding sparse topic distributions, and use this fact to give hardness results for several settings. [sent-140, score-0.561]
53 There are many potential applications of MAP inference of the document’s topic distribution. [sent-142, score-0.609]
54 Thus, the MAP topic distribution provides a compact summary of the document that could be useful for document summarization. [sent-145, score-0.944]
55 When the prior distribution on the topic distribution is a symmetric Dirichlet, we write TOPIC-LDA(ε, α) for the corresponding optimization problem. [sent-166, score-0.552]
56 3.1 Polynomial-time inference for large hyperparameters (αt ≥ 1) When αt ≥ 1, Eq. [sent-169, score-0.198]
57 Note that this is in contrast to the MAP inference problem discussed in Section 2, which we showed was hard for all choices of α. [sent-172, score-0.161]
58 When α > 1, the prior corresponds to a bias toward a particular topic distribution θ. [sent-176, score-0.504]
59 These results illustrate that the Dirichlet prior encourages sparse MAP solutions: the topic distribution will be large on as few topics as necessary to explain every word of the document, and otherwise will be close to zero. [sent-187, score-1.183]
60 The following lemma shows that in any optimal solution to TOPIC-LDA(ε, α), for every word, there is at least one topic that both has large probability and gives non-trivial probability to this word. [sent-188, score-0.518]
61 All optimal solutions θ∗ to TOPIC-LDA(ε, α) have the following property: for every word i, θ∗_t̂ ≥ K(α, T, N ), where t̂ = arg maxt βit θ∗_t . [sent-192, score-0.155]
62 Assume for the purpose of contradiction that there exists a word î such that θ∗_t̂ < K(α, T, N ), where t̂ = arg maxt βît θ∗_t . [sent-195, score-0.31]
63 Let Y denote the set of topics t ≠ t̂ such that θ∗_t ≥ K(α, T, N ), and let Δ1 = Σ_{t∈Y} θ∗_t and Δ2 = Σ_{t∉Y, t≠t̂} θ∗_t . [sent-197, score-0.456]
64 Next, we show that if a topic is not sufficiently “used” then it will be given a probability very close to zero. [sent-203, score-0.512]
65 By used, we mean that for at least one word, the topic is close in probability to that of the largest contributor to the likelihood of the word. [sent-204, score-0.504]
66 Suppose topic t̂ has θ∗_t̂ < (N )^{−1} K(α, T, N ). [sent-212, score-0.476]
67 [Equations (9)–(11) omitted.] Recall from Lemma 3 that, for each word i and t̂ = arg maxt βit θ∗_t , we have θ̃_t̂ > K(α, T, N ). [sent-221, score-0.31]
68 [Equation (12) omitted.] Combining these bounds shows that the objective value at θ̃ strictly exceeds that at θ∗, contradicting the optimality of θ∗ and completing the proof. [sent-225, score-0.15]
69 3.3 Inference is NP-hard for small hyperparameters (α < 1) The previous results characterize optimal solutions to TOPIC-LDA(ε, α) and highlight the fact that optimal solutions are sparse. [sent-230, score-0.153]
70 In particular, it is possible to encode set cover instances as TOPIC-LDA(ε, α) instances, where the set cover corresponds to those topics assigned appreciable probability. [sent-232, score-0.732]
71 Consider a set cover instance consisting of a universe of elements and a family of sets, where we assume for convenience that the minimum cover is neither a singleton, all but one of the family of sets, nor the entire family of sets, and that there are at least two elements in the universe. [sent-236, score-0.202]
72 As with our previous reduction, we have one topic per set and one word in the document for each element. [sent-237, score-0.897]
73 We let Pr(wi |zi = t) = 0 when element wi is not in set t, and a constant otherwise (we make every topic have the uniform distribution over the same number of words, some of which may be dummy words not appearing in the document). [sent-238, score-0.667]
74 Let Si ⊆ [T ] denote the set of topics to which word i belongs. [sent-239, score-0.655]
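For concreteness, here is a small sketch, not from the paper, of the instance construction just described: Python with numpy, illustrative names and padding details. One topic per set, one word per element, zero probability off-set, and a uniform distribution over exactly k outcomes otherwise, with shared dummy words used only as padding.

    import numpy as np

    def set_cover_to_topic_lda(sets, universe_size):
        """Build the word-probability table for the reduction sketched above:
        Pr(w_i | z_i = t) is zero when element i is not in set t and 1/k otherwise,
        padding each topic with dummy words that never appear in the document."""
        T = len(sets)
        k = max(len(s) for s in sets)                  # pad every topic to k outcomes
        V = universe_size + k                          # real words plus dummy words
        beta = np.zeros((T, V))
        for t, members in enumerate(sets):
            members = sorted(members)
            dummies = range(universe_size, universe_size + k - len(members))
            for v in list(members) + list(dummies):
                beta[t, v] = 1.0 / k
        document = list(range(universe_size))          # each element appears once as a word
        S = [{t for t, s in enumerate(sets) if i in s} for i in document]   # the S_i above
        return beta, document, S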
75 Let C_θ∗ ⊆ [T ] be those topics t ∈ [T ] such that θ∗_t ≥ K(α, T, n), where θ∗ is an optimal solution to TOPIC-LDA(ε, α). [sent-241, score-0.456]
76 [Step garbled in extraction; it involves ε, K(α, T, N ), the cover size |C|, and the quantity T log K(α, T, N ) + N log T.] [sent-250, score-0.15]
77 Eq. (15): P(θ̃) + L(θ̃) − P(θ∗) − L(θ∗) > (1 − α) log(1/ε) − (T log(1/K(α, T, N )) + N log T ). This is greater than 0 precisely when (1 − α) log(1/ε) > log( T^N / K(α, T, N )^T ). [sent-251, score-0.225]
78 Note that although ε is exponentially small in N and T , the size of its representation in binary is polynomial in N and T , and thus polynomial in the size of the set cover instance. [sent-252, score-0.182]
79 As ε → 0, the solutions to TOPIC-LDA(ε, α) become degenerate, concentrating their support on the minimal set of topics C ⊆ [T ] such that ∀i, ∃t ∈ C s. [sent-254, score-0.529]
80 4 Sampling from the posterior The previous sections of the paper focused on MAP inference problems. [sent-259, score-0.191]
81 In this section, we study the problem of marginal inference in LDA. [sent-260, score-0.157]
82 However, we believe our observation provides insight into the complexity of sampling when α is not too small, and may be a starting point towards explaining the empirical success of using Markov chain Monte Carlo to do inference in LDA. [sent-270, score-0.227]
83 The intuition behind the proof is that, when α is small enough, an appreciable amount of the probability mass corresponds to the sparsest possible θ vectors where the supported topics together cover all of the words. [sent-273, score-0.611]
84 As a result, we could directly read off the minimal set cover from the posterior marginals E[θt | w]. [sent-274, score-0.169]
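A toy illustration of "reading off" a cover from estimated marginals (an assumed snippet, not from the paper; names are illustrative): threshold E[θ_t | w] and keep the surviving topics.

    import numpy as np

    def cover_from_marginals(marginal_theta, threshold):
        """Keep the topics whose posterior marginal mass clears the threshold."""
        return [t for t, m in enumerate(marginal_theta) if m > threshold]

    # e.g. cover_from_marginals(np.array([0.48, 0.01, 0.49, 0.02]), 0.1) returns [0, 2]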
85 5 Discussion In this paper, we have shown that the complexity of MAP inference in LDA strongly depends on the effective number of topics per document. [sent-281, score-0.63]
86 When a document is generated from a small number of topics (regardless of the number of topics in the model), WORD-LDA can be solved in polynomial time. [sent-282, score-1.184]
87 On the other hand, if a document can use an arbitrary number of topics, WORD-LDA is NP-hard. [sent-284, score-0.222]
88 We have also studied the problem of computing MAP estimates and expectations of the topic distribution. [sent-286, score-0.476]
89 [2003] suggest a heuristic for inference that is also applicable to LDA: if there exists a word that can only be generated with high probability from one of the topics, then the corresponding topic must appear in the MAP assignment whenever that word appears in a document. [sent-291, score-1.149]
90 [2008] give a hardness reduction and greedy algorithm for learning topic models. [sent-293, score-0.594]
91 More broadly, it would be interesting to consider the complexity of learning the per-topic word distributions β_t . [sent-295, score-0.24]
92 First, our exact algorithms can be used to evaluate the accuracy of approximate inference algorithms, for example by comparing to the MAP of the variational posterior. [sent-297, score-0.183]
93 Also, note that we did not give an analogous exact algorithm for the MAP topic distribution when the posterior has support on only a small number of topics. [sent-300, score-0.582]
94 In this setting, it may be possible to find this set of topics by trying all S ⊆ [T ] of small cardinality and then doing a (non-uniform) grid search over the topic distribution restricted to support S. [sent-301, score-0.956]
95 Finally, our structural results on the sparsity induced by the Dirichlet prior draw connections between inference in topic models and sparse signal recovery. [sent-302, score-0.672]
96 We proved that the MAP topic distribution has, for each topic t, either θt ≈ ε or θt bounded below by some value (much larger than ε). [sent-303, score-0.976]
97 Our work motivates the study of greedy algorithms for MAP inference in topic models, analogous to those used for set cover. [sent-305, score-0.633]
98 Relative performance guarantees for approximate inference in latent Dirichlet allocation. [sent-454, score-0.171]
99 A simple algorithm for topic identification in 0-1 a data. [sent-491, score-0.476]
100 A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. [sent-501, score-0.231]
wordName wordTfidf (topN-words)
[('topic', 0.476), ('topics', 0.456), ('document', 0.222), ('lda', 0.207), ('word', 0.199), ('map', 0.195), ('xit', 0.17), ('nt', 0.161), ('dirichlet', 0.152), ('assignment', 0.142), ('inference', 0.133), ('cu', 0.108), ('words', 0.107), ('lit', 0.1), ('zi', 0.094), ('pr', 0.091), ('issn', 0.09), ('maxt', 0.088), ('hardness', 0.085), ('cover', 0.082), ('log', 0.075), ('blei', 0.069), ('packing', 0.069), ('lovasz', 0.066), ('hyperparameters', 0.065), ('assignments', 0.063), ('wi', 0.06), ('erent', 0.06), ('pt', 0.059), ('posterior', 0.058), ('cl', 0.055), ('biernacki', 0.05), ('chr', 0.05), ('gri', 0.05), ('halperin', 0.05), ('kiefer', 0.05), ('mcauli', 0.05), ('miettinen', 0.05), ('sepp', 0.05), ('polynomial', 0.05), ('di', 0.045), ('solutions', 0.044), ('appreciable', 0.044), ('nen', 0.044), ('nding', 0.042), ('gap', 0.042), ('lemma', 0.042), ('complexity', 0.041), ('assigned', 0.041), ('isbn', 0.04), ('ective', 0.04), ('vempala', 0.04), ('latent', 0.038), ('universe', 0.038), ('kivinen', 0.038), ('mukherjee', 0.038), ('contradicting', 0.038), ('allocation', 0.037), ('perfect', 0.037), ('su', 0.036), ('posteriori', 0.036), ('sparsity', 0.035), ('contradiction', 0.034), ('newman', 0.034), ('collapsed', 0.034), ('koller', 0.033), ('reduction', 0.033), ('doi', 0.033), ('porteous', 0.033), ('exponentiated', 0.032), ('probabilistic', 0.031), ('partitions', 0.03), ('yuille', 0.03), ('minimal', 0.029), ('retrieval', 0.029), ('proof', 0.029), ('integrating', 0.029), ('hard', 0.028), ('believe', 0.028), ('likelihood', 0.028), ('broadly', 0.028), ('prior', 0.028), ('disjoint', 0.028), ('constants', 0.027), ('instances', 0.027), ('logarithm', 0.027), ('valid', 0.027), ('krause', 0.026), ('corpus', 0.026), ('variational', 0.026), ('matching', 0.025), ('schuurmans', 0.025), ('ths', 0.025), ('sampling', 0.025), ('maximization', 0.025), ('study', 0.024), ('teh', 0.024), ('distribution', 0.024), ('exact', 0.024), ('arg', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 58 nips-2011-Complexity of Inference in Latent Dirichlet Allocation
Author: David Sontag, Dan Roy
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document’s topic distribution is integrated out. We show that, when the e↵ective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question. 1
2 0.45488369 129 nips-2011-Improving Topic Coherence with Regularized Topic Models
Author: David Newman, Edwin V. Bonilla, Wray Buntine
Abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data. 1
3 0.39985961 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
Author: Dae I. Kim, Erik B. Sudderth
Abstract: Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata. 1
4 0.27396172 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
Author: Xianxing Zhang, Lawrence Carin, David B. Dunson
Abstract: The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that may be selected. The observed data are assumed to have associated temporal covariates (corresponding to the time at which choices are made), and we wish to impose that with increasing time it is more probable that topics deeper in the tree are utilized. This structure is imposed by developing a new “change point
5 0.24891025 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
Author: Liang Xiong, Barnabás Póczos, Jeff G. Schneider
Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. 1
6 0.22003412 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
7 0.15659466 156 nips-2011-Learning to Learn with Compound HD Models
8 0.10939275 14 nips-2011-A concave regularization technique for sparse mixture models
9 0.088195533 176 nips-2011-Multi-View Learning of Word Embeddings via CCA
10 0.082090877 93 nips-2011-Extracting Speaker-Specific Information with a Regularized Siamese Deep Network
11 0.081228681 160 nips-2011-Linear Submodular Bandits and their Application to Diversified Retrieval
12 0.08004465 258 nips-2011-Sparse Bayesian Multi-Task Learning
13 0.0755511 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
14 0.074852258 40 nips-2011-Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints
15 0.073307954 191 nips-2011-Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation
16 0.071690582 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning
17 0.070639707 142 nips-2011-Large-Scale Sparse Principal Component Analysis with Application to Text Data
18 0.057454225 217 nips-2011-Practical Variational Inference for Neural Networks
19 0.05391163 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
20 0.053080738 205 nips-2011-Online Submodular Set Cover, Ranking, and Repeated Active Learning
topicId topicWeight
[(0, 0.202), (1, 0.038), (2, -0.031), (3, -0.005), (4, -0.055), (5, -0.553), (6, 0.183), (7, 0.175), (8, -0.267), (9, 0.068), (10, 0.162), (11, 0.183), (12, 0.002), (13, 0.026), (14, 0.051), (15, -0.038), (16, 0.088), (17, 0.014), (18, -0.04), (19, 0.077), (20, -0.124), (21, -0.008), (22, 0.059), (23, 0.013), (24, 0.057), (25, 0.005), (26, -0.028), (27, -0.009), (28, 0.001), (29, -0.045), (30, -0.034), (31, -0.026), (32, 0.058), (33, -0.06), (34, 0.0), (35, -0.002), (36, -0.006), (37, 0.007), (38, 0.003), (39, 0.001), (40, -0.019), (41, -0.028), (42, -0.01), (43, 0.012), (44, 0.002), (45, -0.021), (46, 0.004), (47, -0.023), (48, 0.01), (49, -0.025)]
simIndex simValue paperId paperTitle
1 0.96788549 129 nips-2011-Improving Topic Coherence with Regularized Topic Models
Author: David Newman, Edwin V. Bonilla, Wray Buntine
Abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data. 1
same-paper 2 0.96627539 58 nips-2011-Complexity of Inference in Latent Dirichlet Allocation
Author: David Sontag, Dan Roy
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document’s topic distribution is integrated out. We show that, when the e↵ective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question. 1
3 0.94740236 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
Author: Dae I. Kim, Erik B. Sudderth
Abstract: Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata. 1
4 0.81215441 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
Author: Liang Xiong, Barnabás Póczos, Jeff G. Schneider
Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. 1
5 0.7096467 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
Author: Adler J. Perotte, Frank Wood, Noemie Elhadad, Nicholas Bartlett
Abstract: We introduce hierarchically supervised latent Dirichlet allocation (HSLDA), a model for hierarchically and multiply labeled bag-of-word data. Examples of such data include web pages and their placement in directories, product descriptions and associated categories from product hierarchies, and free-text clinical records and their assigned diagnosis codes. Out-of-sample label prediction is the primary goal of this work, but improved lower-dimensional representations of the bagof-word data are also of interest. We demonstrate HSLDA on large-scale data from clinical document labeling and retail product categorization tasks. We show that leveraging the structure from hierarchical labels improves out-of-sample label prediction substantially when compared to models that do not. 1
6 0.70953399 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
7 0.57877618 14 nips-2011-A concave regularization technique for sparse mixture models
8 0.45944676 156 nips-2011-Learning to Learn with Compound HD Models
9 0.40164375 160 nips-2011-Linear Submodular Bandits and their Application to Diversified Retrieval
10 0.28614819 176 nips-2011-Multi-View Learning of Word Embeddings via CCA
11 0.28437456 191 nips-2011-Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation
13 0.25308269 40 nips-2011-Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints
14 0.23841573 221 nips-2011-Priors over Recurrent Continuous Time Processes
15 0.23638012 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
16 0.23177454 27 nips-2011-Advice Refinement in Knowledge-Based SVMs
17 0.23168764 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
18 0.22909999 104 nips-2011-Generalized Beta Mixtures of Gaussians
19 0.22876246 74 nips-2011-Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
20 0.226271 142 nips-2011-Large-Scale Sparse Principal Component Analysis with Application to Text Data
topicId topicWeight
[(0, 0.03), (4, 0.039), (10, 0.01), (20, 0.046), (21, 0.231), (26, 0.028), (31, 0.094), (33, 0.021), (43, 0.061), (45, 0.13), (57, 0.067), (65, 0.026), (74, 0.054), (83, 0.043), (86, 0.01), (99, 0.038)]
simIndex simValue paperId paperTitle
1 0.8616991 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding
Author: Pablo M. Olmos, Luis Salamanca, Juan Fuentes, Fernando Pérez-Cruz
Abstract: We show an application of a tree structure for approximate inference in graphical models using the expectation propagation algorithm. These approximations are typically used over graphs with short-range cycles. We demonstrate that these approximations also help in sparse graphs with long-range loops, as the ones used in coding theory to approach channel capacity. For asymptotically large sparse graph, the expectation propagation algorithm together with the tree structure yields a completely disconnected approximation to the graphical model but, for for finite-length practical sparse graphs, the tree structure approximation to the code graph provides accurate estimates for the marginal of each variable. Furthermore, we propose a new method for constructing the tree structure on the fly that might be more amenable for sparse graphs with general factors. 1
same-paper 2 0.81431234 58 nips-2011-Complexity of Inference in Latent Dirichlet Allocation
Author: David Sontag, Dan Roy
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document’s topic distribution is integrated out. We show that, when the e↵ective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question. 1
3 0.74302632 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
Author: Dae I. Kim, Erik B. Sudderth
Abstract: Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata. 1
4 0.69453442 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
Author: Adler J. Perotte, Frank Wood, Noemie Elhadad, Nicholas Bartlett
Abstract: We introduce hierarchically supervised latent Dirichlet allocation (HSLDA), a model for hierarchically and multiply labeled bag-of-word data. Examples of such data include web pages and their placement in directories, product descriptions and associated categories from product hierarchies, and free-text clinical records and their assigned diagnosis codes. Out-of-sample label prediction is the primary goal of this work, but improved lower-dimensional representations of the bagof-word data are also of interest. We demonstrate HSLDA on large-scale data from clinical document labeling and retail product categorization tasks. We show that leveraging the structure from hierarchical labels improves out-of-sample label prediction substantially when compared to models that do not. 1
5 0.66346133 129 nips-2011-Improving Topic Coherence with Regularized Topic Models
Author: David Newman, Edwin V. Bonilla, Wray Buntine
Abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data. 1
6 0.65611708 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
7 0.65585941 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
8 0.65472269 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
9 0.6529519 258 nips-2011-Sparse Bayesian Multi-Task Learning
10 0.65257668 30 nips-2011-Algorithms for Hyper-Parameter Optimization
11 0.65232503 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
12 0.65157843 180 nips-2011-Multiple Instance Filtering
13 0.65109503 231 nips-2011-Randomized Algorithms for Comparison-based Search
14 0.65022194 32 nips-2011-An Empirical Evaluation of Thompson Sampling
15 0.64968258 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
16 0.64918351 276 nips-2011-Structured sparse coding via lateral inhibition
17 0.6484288 219 nips-2011-Predicting response time and error rates in visual search
18 0.64763898 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
19 0.64692825 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
20 0.64655167 156 nips-2011-Learning to Learn with Compound HD Models