nips nips2012 nips2012-124 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Michael Paul, Mark Dredze
Abstract: Latent variable models can be enriched with a multi-dimensional structure to consider the many latent factors in a text corpus, such as topic, author perspective and sentiment. We introduce factorial LDA, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured word priors and learns a sparse product of factors. Experiments on research abstracts show that our model can learn latent factors such as research topic, scientific discipline, and focus (methods vs. applications). Our modeling improvements reduce test perplexity and improve human interpretability of the discovered factors. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Latent variable models can be enriched with a multi-dimensional structure to consider the many latent factors in a text corpus, such as topic, author perspective and sentiment. [sent-4, score-0.357]
2 We introduce factorial LDA, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. [sent-5, score-0.762]
3 Our model incorporates structured word priors and learns a sparse product of factors. [sent-6, score-0.614]
4 1 Introduction: There are many factors that contribute to a document’s word choice: topic, syntax, sentiment, author perspective, and others. [sent-10, score-0.511]
5 Some topic models have been used to model specific factors like sentiment [2], and more general models—like the topic aspect model [3] and sparse additive generative models (SAGE) [4]—have jointly considered both topic and another factor, such as perspective. [sent-13, score-1.234]
6 While standard topic models associate each word token with a single latent topic variable, a multi-dimensional model associates each token with a vector of multiple factors, such as (topic, political ideology) or (product type, sentiment, author age). [sent-16, score-1.342]
7 First, we must ensure consistency across different word distributions which have the same components. [sent-18, score-0.423]
8 For example, the word distributions associated with the (topic, perspective) pairs (ECONOMICS,LIBERAL) and (ECONOMICS,CONSERVATIVE) should both give high probability to words about economics. [sent-19, score-0.473]
9 Additionally, increasing the number of factors results in a multiplicative increase in the number of possible tuples that can be formed, and not all tuples will be well-supported by the data. [sent-20, score-0.733]
10 We address these two issues by adding additional structure to our model: we impose structured word priors that link tuples with common components, and we place a sparse prior over the space of possible tuples. [sent-21, score-0.992]
11 • introduce a general model that can accommodate K different factors (dimensions) of language, • design structured priors over the word distributions that tie together common factors, • enforce a sparsity pattern which excludes unsupported combinations of components (tuples). [sent-24, score-1.042]
12 Under LDA, a document is generated by choosing the topic distribution θ from a Dirichlet prior, then for each token we sample a latent topic t from this distribution before sampling a word w from the tth word distribution φt . [sent-28, score-1.638]
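For readers who want the LDA baseline spelled out, a minimal sketch of that generative story follows; the vocabulary size, topic count, and hyperparameter values are hypothetical placeholders rather than anything used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

V, T, alpha, beta = 1000, 20, 0.1, 0.01           # hypothetical sizes and hyperparameters
phi = rng.dirichlet(np.full(V, beta), size=T)      # one word distribution per topic

def generate_document(n_tokens):
    """Sample one document under vanilla LDA."""
    theta = rng.dirichlet(np.full(T, alpha))       # document's topic distribution
    topics = rng.choice(T, size=n_tokens, p=theta)               # latent topic per token
    words = np.array([rng.choice(V, p=phi[t]) for t in topics])  # word per token
    return topics, words

topics, words = generate_document(50)
```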
13 Without additional structure, LDA tends to learn distributions which correspond to semantic topics (such as SPORTS or ECONOMICS) [6] which dominate the choice of words in a document, rather than syntax, perspective, or other aspects of document content. [sent-29, score-0.347]
14 This structure makes sense if a corpus is composed of two different factors, and the two dimensions might correspond to factors such as news topic and political perspective (if we are modeling newspaper editorials), or research topic and discipline (if we are modeling scientific papers). [sent-31, score-1.01]
15 Individual cells of the matrix would represent pairs such as (ECONOMICS,CONSERVATIVE) or (GRAMMAR,LINGUISTICS) and each is associated with a word distribution φz . [sent-32, score-0.375]
16 Let us expand this idea further by assuming K factors modeled with a K-dimensional array, where each cell of the array has a pointer to a word distribution corresponding to that particular K-tuple. [sent-34, score-0.61]
17 For example, in addition to topic and perspective, we might want to model a third factor of the author’s gender in newspaper editorials, yielding triples such as (ECONOMICS,CONSERVATIVE,MALE). [sent-35, score-0.423]
18 Conceptually, each K-tuple t functions as a topic in LDA (with an associated word distribution φ_t), except that K-tuples imply a structure. [sent-36, score-0.661]
19 At its core, our model follows the basic template of LDA, but each word token is associated with a K-tuple rather than a single topic value. [sent-40, score-0.768]
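One way to picture the K-dimensional array of tuples is as the Cartesian product of the per-factor component sets, with one word distribution per cell. The sketch below just builds that index, using hypothetical factor sizes and vocabulary.

```python
from itertools import product

import numpy as np

rng = np.random.default_rng(0)

Z = (20, 2, 2)      # hypothetical components per factor, e.g. (topic, discipline, focus)
V = 1000            # hypothetical vocabulary size

# Every cell of the K-dimensional array is a K-tuple with its own word distribution.
tuples = list(product(*(range(n) for n in Z)))          # 20 * 2 * 2 = 80 tuples
phi = {t: rng.dirichlet(np.full(V, 0.01)) for t in tuples}

# A token is now labeled with a tuple such as (ECONOMICS, CONSERVATIVE, MALE)
# rather than with a single topic id.
```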
20 In f-LDA, we induce a factorial structure by creating priors which tie together tuples that share components: distributions involving the pair (ECONOMICS, CONSERVATIVE) should have commonalities with distributions for (ECONOMICS, LIBERAL). [sent-43, score-0.739]
21 The key ingredients of our new model are: • We model the intuition that tuples which share components should share other properties. [sent-44, score-0.495]
22 For example, we expect the word distributions for (ECONOMICS, CONSERVATIVE) and (ECONOMICS, LIBERAL) to both give high probability to words about economics, while the pairs (ECONOMICS, LIBERAL) and (ENVIRONMENT, LIBERAL) should both reflect words about liberalism. [sent-45, score-0.536]
23 Similarly, we want each document’s distribution over tuples to reflect the same type of consistency. [sent-46, score-0.307]
24 If a document is written from a liberal perspective, then we believe that pairs of the form (*,LIBERAL) are more likely to have high probability than pairs with CONSERVATIVE as the second component. [sent-47, score-0.276]
25 This consistency across factors is encouraged by sharing parameters across the word and topic prior distributions in the model: this encodes our a priori assumption that distributions which share components should be similar. [sent-48, score-1.133]
26 As the dimensionality of the array increases, we are going to encounter problems of overparameterization, because the model will likely contain more tuples than are observed in the data. [sent-50, score-0.417]
27 We handle this by having an auxiliary multi-dimensional array which encodes a sparsity pattern over tuples. [sent-51, score-0.259]
28 The priors over tuples are augmented with this sparsity pattern. [sent-52, score-0.632]
29 These priors model the belief that the Cartesian product of factors should be sparse; the posterior may “opt out” of some tuples. [sent-53, score-0.295]
30 (b) An illustration of word distributions in f-LDA with two factors. [sent-55, score-0.384]
31 When applying f-LDA to a collection of scientific articles from various disciplines, we learn weights ω corresponding to a topic we call WORDS and the discipline EDUCATION as well as background words. [sent-56, score-0.392]
32 For each tuple t = (t1, ..., tK): (a) sample the word distribution φ_t ∼ Dir(ω̂^(t)), where ω̂_w^(t) = exp(ω^(B) + ω_w^(0) + Σ_k ω^(k)_{t_k, w}); (b) sample the sparsity “bit” b_t ∼ Beta(γ0, γ1). [sent-64, score-0.632]
33 For each document d ∈ D: (a) draw document component weights α^(d,k) ∼ N(0, Iσ²) for each factor k; (b) sample the distribution over tuples θ^(d) ∼ Dir(B · α̂^(d)), where α̂_t^(d) = exp(α^(B) + Σ_k (α^(D,k)_{t_k} + α^(d,k)_{t_k})) (Eq. 1); (c) for each token: i. sample a tuple z ∼ θ^(d), then ii. [sent-65, score-0.63]
34 Sample the word w ∼ φ_z. See Figure 1a for the graphical model, and Figure 1b for an illustration of how the weight vectors ω^(0) and ω^(k) are combined to form ω̂ for a particular tuple that was inferred by our model. [sent-67, score-0.442]
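Read as code, the generative story above might look like the sketch below. The factor sizes, vocabulary, and hyperparameters are illustrative, and the weight vectors ω and α are drawn at random here rather than learned as in the paper.

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(0)

Z, V, D = (5, 2, 2), 200, 10          # hypothetical factor sizes, vocabulary, documents
K = len(Z)
sigma2, gamma0, gamma1 = 10.0, 1.0, 1.0
tuples = list(product(*(range(n) for n in Z)))

# Global word-prior weights: background, per-word bias, per-component bias.
omega_B = 0.0
omega_0 = rng.normal(0, 1, V)
omega_k = [rng.normal(0, 1, (Z[k], V)) for k in range(K)]

def omega_hat(t):
    """Tied Dirichlet prior for tuple t: exp(omega_B + omega_0 + sum_k omega_k[t_k])."""
    return np.exp(omega_B + omega_0 + sum(omega_k[k][t[k]] for k in range(K)))

# Per-tuple draws: word distribution and sparsity "bit".
phi = {t: rng.dirichlet(omega_hat(t)) for t in tuples}
b = {t: rng.beta(gamma0, gamma1) for t in tuples}

# Corpus-level component weights alpha^(D,k).
alpha_B = 0.0
alpha_Dk = [rng.normal(0, 1, Z[k]) for k in range(K)]

def generate_document(n_tokens):
    # Document-specific component weights alpha^(d,k).
    alpha_dk = [rng.normal(0, np.sqrt(sigma2), Z[k]) for k in range(K)]
    # alpha_hat_t^(d) = exp(alpha_B + sum_k (alpha^(D,k)_{t_k} + alpha^(d,k)_{t_k}))
    alpha_hat = np.array([np.exp(alpha_B + sum(alpha_Dk[k][t[k]] + alpha_dk[k][t[k]]
                                               for k in range(K))) for t in tuples])
    # Distribution over tuples, masked by the sparsity pattern B.
    theta = rng.dirichlet(np.array([b[t] for t in tuples]) * alpha_hat + 1e-12)
    # Per token: sample a tuple z, then a word from phi_z.
    zs = rng.choice(len(tuples), size=n_tokens, p=theta)
    words = np.array([rng.choice(V, p=phi[tuples[z]]) for z in zs])
    return zs, words

zs, words = generate_document(30)
```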
35 As discussed above, the only difference between f-LDA and LDA is that structure has been added to the Dirichlet priors for the word and topic distributions. [sent-69, score-0.837]
36 Prior over φ: We formulate the priors of φ to encourage word distributions to be consistent across components of each factor. [sent-73, score-0.706]
37 For example, tuples that reflect the same topic should share words. [sent-74, score-0.673]
38 To achieve this goal, we link the priors for tuples that share common components by utilizing a loglinear parameterization of the Dirichlet prior of φ (Eq. [sent-75, score-0.688]
39 Formally, we place a Dirichlet(ω̂^(t)) prior over φ_t, the word distribution for tuple t = (t1, t2, ..., tK). [sent-77, score-0.513]
40 Finally, ω^(k)_{t_k, w} introduces bias parameters for each word w for component t_k of the k-th factor. [sent-85, score-0.573]
41 By increasing the weight of a particular ω^(k)_{t_k, w}, we increase the expected relative log-probability of word w in φ_z for all z that contain component t_k, thereby tying these priors together. [sent-86, score-0.689]
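The tying is easy to see numerically: tuples that share a component reuse the same ω^(k) row, so their Dirichlet base measures are correlated. A toy illustration with made-up weights and a five-word vocabulary:

```python
import numpy as np

vocab = ["market", "tax", "freedom", "regulation", "climate"]
omega_B = 0.0
omega_0 = np.zeros(len(vocab))                       # per-word bias (zeroed here)

# Per-component biases for two factors, topic and perspective (toy values).
topic_w = {"ECONOMICS": np.array([2.0, 1.5, 0.0, 1.0, -1.0]),
           "ENVIRONMENT": np.array([-1.0, 0.0, 0.0, 1.0, 2.0])}
persp_w = {"LIBERAL": np.array([0.0, 0.5, -0.5, 1.0, 0.5]),
           "CONSERVATIVE": np.array([0.5, -0.5, 1.5, -1.0, 0.0])}

def omega_hat(topic, persp):
    return np.exp(omega_B + omega_0 + topic_w[topic] + persp_w[persp])

# Both ECONOMICS tuples put most of their prior mass on "market"/"tax",
# regardless of the perspective component.
print(omega_hat("ECONOMICS", "LIBERAL"))
print(omega_hat("ECONOMICS", "CONSERVATIVE"))
```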
42 Recall that we want documents to naturally favor tuples that share components. [sent-88, score-0.454]
43 Second, α^(D,k)_{t_k} indicates the bias for the k-th factor’s component t_k across the entire corpus D, which enables the model to favor certain components a priori. [sent-94, score-0.479]
44 Finally, α^(d,k)_{t_k} is the bias for the k-th factor’s component t_k specifically in document d. [sent-95, score-0.351]
45 Sparsity over tuples: Finally, we describe the generation of the sparsity pattern over tuples in the corpus. [sent-98, score-0.456]
46 We assume a K-dimensional binary array B, where an entry bt corresponds to tuple t. [sent-99, score-0.337]
47 θ will not include tuples for which b_t = 0; otherwise the prior will remain unchanged. [sent-102, score-0.512]
48 The effect is that the prior assigns tiny probabilities to some tuples instead of strictly 0. [sent-109, score-0.378]
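A minimal sketch of how the sparsity pattern enters the prior over θ, assuming the b values have already been drawn: scaling the Dirichlet parameters by b gives near-zero tuples tiny prior mass rather than removing them outright.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha_hat = np.full(8, 2.0)                      # hypothetical prior weights for 8 tuples
b = np.array([1.0, 0.95, 0.9, 0.02, 1.0, 0.01, 0.85, 0.03])   # sparsity "bits"

# Dir(B * alpha_hat): tuples whose b is near 0 get almost no prior mass.
theta = rng.dirichlet(b * alpha_hat + 1e-12)
print(np.round(theta, 4))                        # near-zero entries where b is tiny
```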
49 3 Related Work Previous work on multi-dimensional modeling includes the topic aspect model (TAM) [3], multiview LDA (mv-LDA) [10], cross-collection LDA [11] and sparse additive generative models (SAGE) [4], which jointly consider both topic and another factor. [sent-110, score-0.686]
50 Other work has jointly modeled topic and sentiment [2]. [sent-111, score-0.399]
51 An important contribution of f-LDA is the use of priors to tie together word distributions with the same components. [sent-119, score-0.595]
52 An alternative approach would be to strictly enforce consistency, such as through a “product of experts” model [17], in which each factor has independent word distributions that are multiplied together and renormalized to form the distribution for a particular tuple. [sent-121, score-0.467]
53 Syntactic topic models [18] and shared components topic models [19] follow this approach. [sent-124, score-0.764]
54 Our structured word prior generalizes both of these approaches. [sent-125, score-0.477]
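For contrast with the structured-prior approach, the “product of experts” alternative mentioned above combines per-factor word distributions multiplicatively; a toy sketch with hypothetical distributions:

```python
import numpy as np

rng = np.random.default_rng(0)

V = 6
phi_topic = {"ECONOMICS": rng.dirichlet(np.ones(V))}
phi_persp = {"LIBERAL": rng.dirichlet(np.ones(V))}

def product_of_experts(topic, persp):
    # Multiply the per-factor distributions elementwise, then renormalize.
    p = phi_topic[topic] * phi_persp[persp]
    return p / p.sum()

print(product_of_experts("ECONOMICS", "LIBERAL"))
```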
56 One approach is to (approximately) collapse out the sparsity array [9], but this is difficult when working over the entire corpus of tokens. [sent-129, score-0.36]
56 There have been several recent approaches that enforce sparsity in topic models. [sent-132, score-0.503]
57 First, one could enforce sparsity over the topic-specific word distributions, forcing each topic to select a subset of relevant words. [sent-134, score-0.852]
58 This is the idea behind sparse topic models [20], which restrict topics to a subset of the vocabulary, and SAGE [4], which applies L1 regularization to word weights. [sent-135, score-0.845]
59 A second approach is to enforce sparsity in the document-specific topic distributions, focusing each document on a subset of relevant topics. [sent-136, score-0.63]
60 Finally—our contribution—is to impose sparsity among the set of topics (or K-tuples) that are available to the model. [sent-138, score-0.271]
61 In both models, words are generated by first sampling a latent variable (in our case, a latent tuple) from a distribution θ, then sampling the word from φ conditioned on the latent variable. [sent-142, score-0.658]
62 1 Latent Variable Sampling The latent variables z are sampled using the standard collapsed Gibbs sampler for LDA [23], with the exception that the basic Dirichlet priors have been replaced with our structured priors for θ and φ. [sent-149, score-0.491]
63 The sampling equation for z for token i, given all other latent variable assignments z, the corpus w and the parameters (α, ω, and B) becomes: p(z_i = t | z \{z_i}, w, α, ω, B) ∝ (n_d^(t) + b_t α̂_t^(d)) · (n_t^(w_i) + ω̂^(t)_{w_i}) / (n_t + Σ_w ω̂^(t)_w), where n_d^(t) counts the tokens in document d assigned to tuple t, n_t^(w) counts the assignments of word w to tuple t, and n_t is the total count for tuple t. [sent-150, score-0.424]
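A sketch of that collapsed Gibbs update for a single token, written against flattened count arrays; the array names and bookkeeping are assumptions for illustration, not the authors’ implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def resample_token(d, w, old_t, n_dt, n_tw, n_t, alpha_hat, b, omega_hat):
    """One collapsed Gibbs step: reassign the token (doc d, word w) from tuple old_t.

    n_dt:      [D, T] tokens in document d assigned to tuple t
    n_tw:      [T, V] times word w is assigned to tuple t
    n_t:       [T]    total tokens assigned to tuple t
    alpha_hat: [D, T] document prior weights; b: [T]; omega_hat: [T, V]
    """
    # Remove the token's current assignment from the counts.
    n_dt[d, old_t] -= 1; n_tw[old_t, w] -= 1; n_t[old_t] -= 1

    # p(z_i = t | ...) ∝ (n_dt + b * alpha_hat) * (n_tw + omega_hat) / (n_t + sum_w omega_hat)
    p = (n_dt[d] + b * alpha_hat[d]) * (n_tw[:, w] + omega_hat[:, w]) \
        / (n_t + omega_hat.sum(axis=1))
    new_t = rng.choice(len(p), p=p / p.sum())

    # Add the token back under its new assignment.
    n_dt[d, new_t] += 1; n_tw[new_t, w] += 1; n_t[new_t] += 1
    return new_t
```

Sweeping this update over every token in the corpus gives one Gibbs iteration.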
64 The top terms are a result of the Beta prior over bt , while the summation over documents reflects the gradient of the Dirichlet-multinomial compound. [sent-156, score-0.263]
65 If we remove the structured word priors and array sparsity, we are left with a basic multi-dimensional model (base). [sent-175, score-0.692]
66 We will compare against models where we add back in the structured word priors (W) and array sparsity (S), and finally the full f-LDA model (SW). [sent-176, score-0.871]
67 All variants are identical except that we fix all ω (k) = 0 to remove structured word priors and fix B = 1 to remove sparsity. [sent-177, score-0.582]
68 We also compare against the topic aspect model (TAM) [3], a two-dimensional model, using the public implementation. [sent-178, score-0.312]
69 TAM is similar to the “base” two-factor f-LDA model except that f-LDA has a single θ per document with priors that are independently weighted by each factor, whereas TAM has K independent θs, with a different θk for each factor. [sent-179, score-0.303]
70 1 in the Beta prior over bt , and we set σ 2 = 10 for α and 1 for ω in the Gaussian prior over weights. [sent-183, score-0.276]
71 We use the “document completion” method: we infer parameters from half a document and measure perplexity on the remaining half [24]. [sent-188, score-0.333]
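A minimal sketch of document-completion perplexity under the assumption that θ has been inferred from each document’s first half and that θ and φ are treated as point estimates when scoring the second half.

```python
import numpy as np

def completion_perplexity(heldout_halves, thetas, phi):
    """heldout_halves: list of word-id arrays (second halves of documents)
    thetas: [D, T] per-document tuple distributions inferred from the first halves
    phi:    [T, V] per-tuple word distributions
    """
    log_prob, n_tokens = 0.0, 0
    for d, words in enumerate(heldout_halves):
        token_probs = thetas[d] @ phi[:, words]     # p(w) = sum_t theta_t * phi_t[w]
        log_prob += np.log(token_probs).sum()
        n_tokens += len(words)
    return np.exp(-log_prob / n_tokens)
```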
72 Figure 2a shows that the structured word priors yield lower perplexity, while results for sparse models are mixed. [sent-191, score-0.644]
73 On ACL, sparsity consistently improves perplexity once the number of topics exceeds 20, while on CLEP sparsity does worse. [sent-192, score-0.626]
74 We consider the component-specific weights for each factor ωtk , which present an “overview” of each component, as well as the tuple-specific word distributions φt . [sent-198, score-0.425]
75 Consider the topic SPEECH: the triple (SPEECH, METHODS, THEORETICAL) emphasizes the linguistic side of speech processing (phonological, prosodic, etc.). [sent-204, score-0.41]
76 We also see tuple sparsity (shaded). Most other two-dimensional models, including SAGE [4] and multi-view LDA [10], assume that the second factor is fixed and observed. [sent-206, score-0.283]
77 Figure 2: (a) The document completion perplexity on two data sets. [sent-214, score-0.333]
78 Models with “W” use structured word priors, and those with “S” use sparsity. [sent-215, score-0.406]
79 (b) The distribution of sparsity values induced on the ACL corpus with Z = (20, 2, 2). [sent-219, score-0.25]
80 For example, under the topic of DATA, a mostly empirical topic, tuples along the THEORETICAL component are inactive. [sent-243, score-0.647]
81 Human Judgments Perplexity may not correlate with human judgments [6], which are important for f-LDA since structured word priors and array sparsity are motivated in part by semantic coherence. [sent-244, score-0.906]
82 First, we presented annotators with two word lists (ten most frequent words assigned to each tuple) that are assigned to the same topic, along with a word list randomly selected from another topic. [sent-247, score-0.833]
83 Annotators are asked to choose the word list that does not belong. [sent-248, score-0.349]
84 If the two tuples from the same topic are strongly related, the random list should be easy to identify. [sent-251, score-0.619]
85 Second, annotators are presented with pairs of word lists from the same topic and asked to judge the degree of relation using a 5-point Likert scale. [sent-252, score-0.759]
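For concreteness, a sketch of how the intrusion-style judgment described above could be assembled from top-10 word lists grouped by topic component; the data structure and sampling details are hypothetical.

```python
import random

def build_intrusion_question(topic_to_lists, topic, rng=random.Random(0)):
    """topic_to_lists: {topic_id: [top-10 word lists, one per tuple with that topic]}"""
    same_topic = rng.sample(topic_to_lists[topic], 2)           # two lists, same topic
    other_topic = rng.choice([t for t in topic_to_lists if t != topic])
    intruder = rng.choice(topic_to_lists[other_topic])          # random list, other topic
    options = same_topic + [intruder]
    rng.shuffle(options)
    return options, options.index(intruder)                     # annotator should pick this
```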
86 For the two models without the structured word priors, we use a symmetric prior (by optimizing only ω^(B) and fixing ω^(0) = 0), since symmetric word priors can lead to better interpretability [22]. [sent-254, score-1.065]
87 The word priors help in all cases, but much more so on ACL. [sent-259, score-0.525]
88 The models with sparsity are generally better than those without, even on CLEP, in contrast to perplexity where sparse models did worse. [sent-260, score-0.447]
89 This suggests that removing tuples with small bt values removes nonsensical tuples. [sent-261, score-0.441]
90 Overall, the judgments are worse for the CLEP corpus; this appears to be a difficult corpus to model due to high topic diversity and low overlap across disciplines. [sent-262, score-0.517]
91 It thus appears that both the structured priors and sparsity yield more interpretable word clusters. [sent-265, score-0.731]
92 We used an asymmetric prior for the perplexity experiments, which gave slightly better results. [sent-267, score-0.277]
93 [Table excerpt: top-word lists for example tuples with their sparsity values b; the recoverable fragment lists speech, words, recognition, prosodic, written, phonological, spoken.] [sent-315, score-0.249]
94 The higher variance near 0 relative to 1 suggests that the model prefers to keep bits “on”—and give tuples tiny probability—rather than “off.” [sent-331, score-0.338]
95 Averaged across all numbers of topics, the perplexity of LDA was 97% the perplexity of f-LDA on ACL and 104% on CLEP. [sent-349, score-0.451]
96 Note that our experiments always use a comparable number of word distributions, thus Z = (20, 2, 2) is the same as Z = 80 topics in LDA. [sent-350, score-0.471]
97 To encourage the model to learn the desired patterns, we developed two new types of priors: word priors that share features across factors, and a sparsity prior that restricts the set of active tuples. [sent-352, score-0.865]
98 The IBP-compound Dirichlet process and its application to focused topic modeling. [sent-414, score-0.421]
99 Cross-cultural analysis of blogs and forums with mixed-collection topic models. [sent-425, score-0.312]
100 Topic modeling for OLAP on multidimensional text databases: topic cube and its applications. [sent-433, score-0.349]
wordName wordTfidf (topN-words)
[('word', 0.349), ('topic', 0.312), ('tuples', 0.307), ('clep', 0.239), ('tam', 0.228), ('lda', 0.219), ('perplexity', 0.206), ('priors', 0.176), ('sparsity', 0.149), ('acl', 0.142), ('tk', 0.136), ('bt', 0.134), ('document', 0.127), ('topics', 0.122), ('factors', 0.119), ('array', 0.11), ('dirichlet', 0.109), ('token', 0.107), ('corpus', 0.101), ('liberal', 0.097), ('factorial', 0.097), ('tuple', 0.093), ('sage', 0.091), ('sentiment', 0.087), ('latent', 0.082), ('components', 0.08), ('discipline', 0.08), ('eco', 0.08), ('nomics', 0.08), ('annotators', 0.072), ('speech', 0.071), ('prior', 0.071), ('intrusion', 0.07), ('appl', 0.065), ('judgments', 0.065), ('words', 0.063), ('zk', 0.062), ('beta', 0.062), ('documents', 0.058), ('structured', 0.057), ('corpora', 0.055), ('share', 0.054), ('grammar', 0.052), ('syntax', 0.049), ('sw', 0.047), ('perspective', 0.046), ('abstracts', 0.046), ('mimno', 0.043), ('author', 0.043), ('enforce', 0.042), ('factor', 0.041), ('editorials', 0.04), ('newspaper', 0.04), ('olap', 0.04), ('phonological', 0.04), ('prosodic', 0.04), ('across', 0.039), ('parsing', 0.039), ('education', 0.039), ('text', 0.037), ('syntactic', 0.036), ('spoken', 0.035), ('tie', 0.035), ('dialogue', 0.035), ('topical', 0.035), ('distributions', 0.035), ('favor', 0.035), ('article', 0.033), ('interpretability', 0.033), ('pointer', 0.032), ('sparse', 0.032), ('bits', 0.031), ('kth', 0.031), ('triples', 0.03), ('skills', 0.03), ('gormley', 0.03), ('grammars', 0.03), ('models', 0.03), ('linguistics', 0.03), ('paul', 0.029), ('hyperparameters', 0.029), ('disciplines', 0.029), ('dredze', 0.029), ('bias', 0.029), ('vocabulary', 0.028), ('samplers', 0.028), ('lasso', 0.028), ('component', 0.028), ('encourage', 0.027), ('experts', 0.027), ('conservative', 0.027), ('ths', 0.027), ('ahmed', 0.027), ('linguistic', 0.027), ('pairs', 0.026), ('precision', 0.026), ('dir', 0.026), ('relatedness', 0.025), ('emnlp', 0.025), ('opinions', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999917 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models
Author: Michael Paul, Mark Dredze
Abstract: Latent variable models can be enriched with a multi-dimensional structure to consider the many latent factors in a text corpus, such as topic, author perspective and sentiment. We introduce factorial LDA, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured word priors and learns a sparse product of factors. Experiments on research abstracts show that our model can learn latent factors such as research topic, scientific discipline, and focus (methods vs. applications). Our modeling improvements reduce test perplexity and improve human interpretability of the discovered factors. 1
2 0.28205562 12 nips-2012-A Neural Autoregressive Topic Model
Author: Hugo Larochelle, Stanislas Lauly
Abstract: We describe a new model for learning meaningful representations of text documents from an unlabeled collection of documents. This model is inspired by the recently proposed Replicated Softmax, an undirected graphical model of word counts that was shown to learn a better generative model and more meaningful document representations. Specifically, we take inspiration from the conditional mean-field recursive equations of the Replicated Softmax in order to define a neural network architecture that estimates the probability of observing a new word in a given document given the previously observed words. This paradigm also allows us to replace the expensive softmax distribution over words with a hierarchical distribution over paths in a binary tree of words. The end result is a model whose training complexity scales logarithmically with the vocabulary size instead of linearly as in the Replicated Softmax. Our experiments show that our model is competitive both as a generative model of documents and as a document representation learning algorithm. 1
3 0.25500116 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation
Author: Anima Anandkumar, Yi-kai Liu, Daniel J. Hsu, Dean P. Foster, Sham M. Kakade
Abstract: Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of topic models, including Latent Dirichlet Allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method, called Excess Correlation Analysis, is based on a spectral decomposition of low-order moments via two singular value decompositions (SVDs). Moreover, the algorithm is scalable, since the SVDs are carried out only on k × k matrices, where k is the number of latent factors (topics) and is typically much smaller than the dimension of the observation (word) space. 1
4 0.17633583 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes
Author: Michael Bryant, Erik B. Sudderth
Abstract: Variational methods provide a computationally scalable alternative to Monte Carlo methods for large-scale, Bayesian nonparametric learning. In practice, however, conventional batch and online variational methods quickly become trapped in local optima. In this paper, we consider a nonparametric topic model based on the hierarchical Dirichlet process (HDP), and develop a novel online variational inference algorithm based on split-merge topic updates. We derive a simpler and faster variational approximation of the HDP, and show that by intelligently splitting and merging components of the variational posterior, we can achieve substantially better predictions of test data than conventional online and batch variational algorithms. For streaming analysis of large datasets where batch analysis is infeasible, we show that our split-merge updates better capture the nonparametric properties of the underlying model, allowing continual learning of new topics.
5 0.16471276 274 nips-2012-Priors for Diversity in Generative Latent Variable Models
Author: James T. Kwok, Ryan P. Adams
Abstract: Probabilistic latent variable models are one of the cornerstones of machine learning. They offer a convenient and coherent way to specify prior distributions over unobserved structure in data, so that these unknown properties can be inferred via posterior inference. Such models are useful for exploratory analysis and visualization, for building density models of data, and for providing features that can be used for later discriminative tasks. A significant limitation of these models, however, is that draws from the prior are often highly redundant due to i.i.d. assumptions on internal parameters. For example, there is no preference in the prior of a mixture model to make components non-overlapping, or in topic model to ensure that co-occurring words only appear in a small number of topics. In this work, we revisit these independence assumptions for probabilistic latent variable models, replacing the underlying i.i.d. prior with a determinantal point process (DPP). The DPP allows us to specify a preference for diversity in our latent variables using a positive definite kernel function. Using a kernel between probability distributions, we are able to define a DPP on probability measures. We show how to perform MAP inference with DPP priors in latent Dirichlet allocation and in mixture models, leading to better intuition for the latent variable representation and quantitatively improved unsupervised feature extraction, without compromising the generative aspects of the model. 1
6 0.16326125 332 nips-2012-Symmetric Correspondence Topic Models for Multilingual Text Analysis
7 0.16028675 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models
8 0.1599406 220 nips-2012-Monte Carlo Methods for Maximum Margin Supervised Topic Models
9 0.15456791 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features
10 0.15300331 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models
11 0.12249888 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs
12 0.11541519 47 nips-2012-Augment-and-Conquer Negative Binomial Processes
13 0.11392244 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
14 0.10166517 126 nips-2012-FastEx: Hash Clustering with Exponential Families
15 0.091757804 345 nips-2012-Topic-Partitioned Multinetwork Embeddings
16 0.085782617 192 nips-2012-Learning the Dependency Structure of Latent Factors
17 0.083199114 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes
18 0.078218438 44 nips-2012-Approximating Concavely Parameterized Optimization Problems
19 0.076899014 258 nips-2012-Online L1-Dictionary Learning with Application to Novel Document Detection
20 0.072983406 356 nips-2012-Unsupervised Structure Discovery for Semantic Analysis of Audio
topicId topicWeight
[(0, 0.204), (1, 0.069), (2, -0.06), (3, -0.004), (4, -0.26), (5, -0.058), (6, -0.015), (7, -0.011), (8, 0.141), (9, -0.074), (10, 0.254), (11, 0.235), (12, 0.043), (13, 0.013), (14, 0.051), (15, 0.093), (16, 0.09), (17, 0.108), (18, 0.044), (19, 0.06), (20, 0.018), (21, 0.07), (22, -0.064), (23, -0.086), (24, 0.06), (25, -0.057), (26, 0.053), (27, 0.035), (28, 0.094), (29, -0.065), (30, 0.121), (31, -0.037), (32, 0.02), (33, 0.051), (34, 0.044), (35, -0.006), (36, -0.071), (37, 0.068), (38, 0.063), (39, -0.057), (40, 0.018), (41, 0.015), (42, 0.005), (43, 0.018), (44, -0.037), (45, 0.041), (46, -0.044), (47, 0.017), (48, -0.062), (49, -0.002)]
simIndex simValue paperId paperTitle
same-paper 1 0.97035038 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models
Author: Michael Paul, Mark Dredze
Abstract: Latent variable models can be enriched with a multi-dimensional structure to consider the many latent factors in a text corpus, such as topic, author perspective and sentiment. We introduce factorial LDA, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured word priors and learns a sparse product of factors. Experiments on research abstracts show that our model can learn latent factors such as research topic, scientific discipline, and focus (methods vs. applications). Our modeling improvements reduce test perplexity and improve human interpretability of the discovered factors. 1
2 0.94422764 12 nips-2012-A Neural Autoregressive Topic Model
Author: Hugo Larochelle, Stanislas Lauly
Abstract: We describe a new model for learning meaningful representations of text documents from an unlabeled collection of documents. This model is inspired by the recently proposed Replicated Softmax, an undirected graphical model of word counts that was shown to learn a better generative model and more meaningful document representations. Specifically, we take inspiration from the conditional mean-field recursive equations of the Replicated Softmax in order to define a neural network architecture that estimates the probability of observing a new word in a given document given the previously observed words. This paradigm also allows us to replace the expensive softmax distribution over words with a hierarchical distribution over paths in a binary tree of words. The end result is a model whose training complexity scales logarithmically with the vocabulary size instead of linearly as in the Replicated Softmax. Our experiments show that our model is competitive both as a generative model of documents and as a document representation learning algorithm. 1
3 0.9190703 332 nips-2012-Symmetric Correspondence Topic Models for Multilingual Text Analysis
Author: Kosuke Fukumasu, Koji Eguchi, Eric P. Xing
Abstract: Topic modeling is a widely used approach to analyzing large text collections. A small number of multilingual topic models have recently been explored to discover latent topics among parallel or comparable documents, such as in Wikipedia. Other topic models that were originally proposed for structured data are also applicable to multilingual documents. Correspondence Latent Dirichlet Allocation (CorrLDA) is one such model; however, it requires a pivot language to be specified in advance. We propose a new topic model, Symmetric Correspondence LDA (SymCorrLDA), that incorporates a hidden variable to control a pivot language, in an extension of CorrLDA. We experimented with two multilingual comparable datasets extracted from Wikipedia and demonstrate that SymCorrLDA is more effective than some other existing multilingual topic models. 1
4 0.82095915 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation
Author: Anima Anandkumar, Yi-kai Liu, Daniel J. Hsu, Dean P. Foster, Sham M. Kakade
Abstract: Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of topic models, including Latent Dirichlet Allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method, called Excess Correlation Analysis, is based on a spectral decomposition of low-order moments via two singular value decompositions (SVDs). Moreover, the algorithm is scalable, since the SVDs are carried out only on k × k matrices, where k is the number of latent factors (topics) and is typically much smaller than the dimension of the observation (word) space. 1
5 0.78734928 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features
Author: Xianxing Zhang, Lawrence Carin
Abstract: A new methodology is developed for joint analysis of a matrix and accompanying documents, with the documents associated with the matrix rows/columns. The documents are modeled with a focused topic model, inferring interpretable latent binary features for each document. A new matrix decomposition is developed, with latent binary features associated with the rows/columns, and with imposition of a low-rank constraint. The matrix decomposition and topic model are coupled by sharing the latent binary feature vectors associated with each. The model is applied to roll-call data, with the associated documents defined by the legislation. Advantages of the proposed model are demonstrated for prediction of votes on a new piece of legislation, based only on the observed text of legislation. The coupling of the text and legislation is also shown to yield insight into the properties of the matrix decomposition for roll-call data. 1
6 0.74274093 220 nips-2012-Monte Carlo Methods for Maximum Margin Supervised Topic Models
7 0.73270404 345 nips-2012-Topic-Partitioned Multinetwork Embeddings
8 0.71599859 274 nips-2012-Priors for Diversity in Generative Latent Variable Models
9 0.63092971 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes
10 0.62524533 154 nips-2012-How They Vote: Issue-Adjusted Models of Legislative Behavior
11 0.61059058 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models
12 0.57018834 47 nips-2012-Augment-and-Conquer Negative Binomial Processes
13 0.56832296 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes
14 0.54942173 192 nips-2012-Learning the Dependency Structure of Latent Factors
15 0.53262109 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models
16 0.48074582 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts
17 0.47308952 22 nips-2012-A latent factor model for highly multi-relational data
18 0.44229287 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs
19 0.4237031 356 nips-2012-Unsupervised Structure Discovery for Semantic Analysis of Audio
20 0.40076455 258 nips-2012-Online L1-Dictionary Learning with Application to Novel Document Detection
topicId topicWeight
[(0, 0.463), (17, 0.015), (21, 0.019), (38, 0.087), (39, 0.018), (42, 0.015), (53, 0.011), (54, 0.02), (55, 0.024), (74, 0.028), (76, 0.104), (80, 0.067), (92, 0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.88957423 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models
Author: Michael Paul, Mark Dredze
Abstract: Latent variable models can be enriched with a multi-dimensional structure to consider the many latent factors in a text corpus, such as topic, author perspective and sentiment. We introduce factorial LDA, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured word priors and learns a sparse product of factors. Experiments on research abstracts show that our model can learn latent factors such as research topic, scientific discipline, and focus (methods vs. applications). Our modeling improvements reduce test perplexity and improve human interpretability of the discovered factors. 1
2 0.84586668 191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables
Author: Aaron Dennis, Dan Ventura
Abstract: The sum-product network (SPN) is a recently-proposed deep model consisting of a network of sum and product nodes, and has been shown to be competitive with state-of-the-art deep models on certain difficult tasks such as image completion. Designing an SPN network architecture that is suitable for the task at hand is an open question. We propose an algorithm for learning the SPN architecture from data. The idea is to cluster variables (as opposed to data instances) in order to identify variable subsets that strongly interact with one another. Nodes in the SPN network are then allocated towards explaining these interactions. Experimental evidence shows that learning the SPN architecture significantly improves its performance compared to using a previously-proposed static architecture. 1
3 0.78718644 233 nips-2012-Multiresolution Gaussian Processes
Author: David B. Dunson, Emily B. Fox
Abstract: We propose a multiresolution Gaussian process to capture long-range, nonMarkovian dependencies while allowing for abrupt changes and non-stationarity. The multiresolution GP hierarchically couples a collection of smooth GPs, each defined over an element of a random nested partition. Long-range dependencies are captured by the top-level GP while the partition points define the abrupt changes. Due to the inherent conjugacy of the GPs, one can analytically marginalize the GPs and compute the marginal likelihood of the observations given the partition tree. This property allows for efficient inference of the partition itself, for which we employ graph-theoretic techniques. We apply the multiresolution GP to the analysis of magnetoencephalography (MEG) recordings of brain activity.
4 0.78685933 270 nips-2012-Phoneme Classification using Constrained Variational Gaussian Process Dynamical System
Author: Hyunsin Park, Sungrack Yun, Sanghyuk Park, Jongmin Kim, Chang D. Yoo
Abstract: For phoneme classification, this paper describes an acoustic model based on the variational Gaussian process dynamical system (VGPDS). The nonlinear and nonparametric acoustic model is adopted to overcome the limitations of classical hidden Markov models (HMMs) in modeling speech. The Gaussian process prior on the dynamics and emission functions respectively enable the complex dynamic structure and long-range dependency of speech to be better represented than that by an HMM. In addition, a variance constraint in the VGPDS is introduced to eliminate the sparse approximation error in the kernel matrix. The effectiveness of the proposed model is demonstrated with three experimental results, including parameter estimation and classification performance, on the synthetic and benchmark datasets. 1
5 0.77801251 282 nips-2012-Proximal Newton-type methods for convex optimization
Author: Jason Lee, Yuekai Sun, Michael Saunders
Abstract: We seek to solve convex optimization problems in composite form: minimize f (x) := g(x) + h(x), n x∈R where g is convex and continuously differentiable and h : Rn → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics. 1
6 0.77150255 192 nips-2012-Learning the Dependency Structure of Latent Factors
7 0.72267616 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation
8 0.67714512 12 nips-2012-A Neural Autoregressive Topic Model
9 0.67460591 332 nips-2012-Symmetric Correspondence Topic Models for Multilingual Text Analysis
10 0.60281444 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models
11 0.55477083 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features
12 0.55013138 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes
13 0.54266268 47 nips-2012-Augment-and-Conquer Negative Binomial Processes
14 0.53666168 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation
15 0.53455091 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity
16 0.53326744 72 nips-2012-Cocktail Party Processing via Structured Prediction
17 0.52820408 150 nips-2012-Hierarchical spike coding of sound
18 0.5204609 345 nips-2012-Topic-Partitioned Multinetwork Embeddings
19 0.51963401 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model
20 0.51563776 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs