nips nips2008 nips2008-229 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jordan L. Boyd-graber, David M. Blei
Abstract: We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. The STM generates words that are both thematically and syntactically constrained, which combines the semantic insights of topic models with the syntactic information available from parse trees. Each word of a sentence is generated by a distribution that combines document-specific topic weights and parse-tree-specific syntactic transitions. Words are assumed to be generated in an order that respects the parse tree. We derive an approximate posterior inference method based on variational methods for hierarchical Dirichlet processes, and we report qualitative and quantitative results on both synthetic data and hand-parsed documents. 1
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. [sent-5, score-0.822]
2 The STM generates words that are both thematically and syntactically constrained, which combines the semantic insights of topic models with the syntactic information available from parse trees. [sent-6, score-1.259]
3 Each word of a sentence is generated by a distribution that combines document-specific topic weights and parse-tree-specific syntactic transitions. [sent-7, score-0.994]
4 1 Introduction. Probabilistic topic models provide a suite of algorithms for finding low-dimensional structure in a corpus of documents. [sent-10, score-0.528]
5 In a topic model, the words of each document are assumed to be exchangeable; their probability is invariant to permutation. [sent-14, score-0.758]
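As a hedged illustration of that assumption (LDA-style notation assumed here, not quoted from the paper: θ is the document's topic proportions and τk the k-th topic's distribution over words), the per-document likelihood is a product over words and is therefore unchanged by any permutation of them:

```latex
p(w_{1:N} \mid \theta, \tau) \;=\; \prod_{n=1}^{N} \sum_{k=1}^{K} \theta_k\, \tau_{k, w_n}
```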
6 While useful for classification or information retrieval, where a coarse statistical footprint of the themes of a document is sufficient for success, exchangeable word models are ill-equipped for problems relying on more fine-grained qualities of language. [sent-17, score-0.502]
7 For instance, although a topic model can suggest documents relevant to a query, it cannot find particularly relevant phrases for question answering. [sent-18, score-0.541]
8 Similarly, while a topic model might discover a pattern such as “eat” occurring with “cheesecake,” it lacks the representation to describe selectional preferences, the process where certain words restrict the choice of the words that follow. [sent-19, score-0.658]
9 It is in this spirit that we develop the syntactic topic model, a nonparametric Bayesian topic model that can infer both syntactically and thematically coherent topics. [sent-20, score-1.393]
10 Rather than treating words as exchangeable units within a document, the STM requires the words of each sentence to conform to the structure of a parse tree. [sent-21, score-0.484]
11 In the generative process, the words arise from a distribution that has both a documentspecific thematic component and a parse-tree-specific syntactic component. [sent-22, score-0.516]
12 Both the low-level syntactic context of a word and its document context constrain the possibilities of the word that can appear next. [sent-25, score-0.757]
13 These words’ topics are chosen by the topic of their parent (as encoded by the tree), the topic weights for a document θ, and the node’s parent’s successor weights π. [sent-27, score-1.458]
14 The structure of variables for sentences within the document plate is on the right, as demonstrated by an automatic parse of the sentence “Some phrases laid in his mind for years.” [sent-29, score-0.554]
15 The STM assumes that the tree structure and words are given, but the latent topics z are not. [sent-30, score-0.453]
16 Previous efforts to capture local syntactic context include semantic space models [6] and similarity functions derived from dependency parses [7]. [sent-34, score-0.391]
17 These methods successfully determine words that share similar contexts, but do not account for thematic consistency. [sent-35, score-0.218]
18 With a thematic representation of whether a document is about sports or animals, the meaning of such terms can be distinguished. [sent-39, score-0.203]
19 Other techniques have attempted to combine local context with document coherence using linear sequence models [8, 9]. [sent-40, score-0.26]
20 While these models are powerful, ordering words sequentially removes the important connections that are preserved in a syntactic parse. [sent-41, score-0.427]
21 Moreover, these models generate words either from the syntactic or thematic context. [sent-42, score-0.542]
22 In the syntactic topic model, words are constrained to be consistent with both. [sent-43, score-0.853]
23 We describe the syntactic topic model, and develop an approximate posterior inference technique based on variational methods. [sent-45, score-0.876]
24 2 The syntactic topic model. We describe the syntactic topic model (STM), a document model that combines observed syntactic structure and latent thematic structure. [sent-48, score-2.194]
25 The word that fills in the blank is constrained by its syntactic context and its document context. [sent-51, score-0.629]
26 The syntactic context tells us that it is an object of a preposition, and the document context tells us that it is a travel-related word. [sent-52, score-0.563]
27 It models a document corpus as exchangeable collections of sentences, each of which is associated with a tree structure such as a parse tree (Figure 1(b)). [sent-54, score-0.681]
28 The words of each sentence are assumed to be generated from a distribution influenced both by their observed role in that tree and by the latent topics inherent in the document. [sent-55, score-0.53]
29 The latent variables that comprise the model are topics, topic transition vectors, topic weights, topic assignments, and top-level weights. [sent-56, score-1.449]
30 Each topic is further associated with a topic transition vector (πk), which weights changes in topics between parent and child nodes. [sent-58, score-0.798]
31 Topic weights (θd ) are per-document vectors indicating the degree to which each document is “about” each topic. [sent-59, score-0.248]
32 Topic assignments (zn, associated with each internal node of Figure 1(b)) are per-word indicator variables that refer to the topic from which the corresponding word is assumed to be drawn. [sent-60, score-0.573]
33 The number of topics is not fixed, and indeed can grow with the observed data. [sent-62, score-0.211]
34 The STM assumes the following generative process of a document collection. [sent-63, score-0.203]
35 For each topic index k ∈ {1, 2, . . .}: (a) Choose topic τk ∼ Dir(σ) (b) Choose topic transition distribution πk ∼ DP(αT , β) [sent-69, score-0.944]
36 For each document d ∈ {1, . . . , M}: (a) Choose topic weights θd ∼ DP(αD , β) (b) For each sentence in the document: traverse its parse tree, drawing each node's topic assignment (conditioned on θd and on the parent's transition vector) and then its word from the assigned topic; a toy sketch of this process appears below. [sent-73, score-0.574]
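To make the generative process above concrete, below is a minimal Python sketch. It is a hedged illustration, not the authors' implementation: the nonparametric DP/GEM machinery is truncated to K topics, the parse tree is given as a parent-pointer array with parents listed before children, and all names (simulate_stm_sentence, parse_parents, and so on) are hypothetical.

```python
import numpy as np

def simulate_stm_sentence(parse_parents, K=10, V=1000, alpha=1.0, sigma=0.1,
                          alpha_T=1.0, alpha_D=1.0, seed=0):
    """Toy, truncated sketch of the STM generative process.

    parse_parents[n] is the index of node n's parent (-1 for the root);
    parents are assumed to appear before their children in the array.
    """
    rng = np.random.default_rng(seed)
    beta = rng.dirichlet(np.full(K, alpha))          # top-level weights (GEM truncated to K)
    tau = rng.dirichlet(np.full(V, sigma), size=K)   # topics: distributions over the vocabulary
    pi = np.stack([rng.dirichlet(alpha_T * beta) for _ in range(K)])  # topic transitions
    theta = rng.dirichlet(alpha_D * beta)            # per-document topic weights

    z = np.empty(len(parse_parents), dtype=int)
    w = np.empty(len(parse_parents), dtype=int)
    for n, p in enumerate(parse_parents):
        # The root has no parent transition; falling back to the top-level
        # weights here is an assumption of this sketch.
        parent_trans = beta if p < 0 else pi[z[p]]
        probs = theta * parent_trans                 # multiplicative combination of the two vectors
        probs /= probs.sum()
        z[n] = rng.choice(K, p=probs)                # topic assignment for node n
        w[n] = rng.choice(V, p=tau[z[n]])            # word drawn from that topic
    return z, w

# Example: a three-node chain (root -> child -> grandchild).
topics, words = simulate_stm_sentence([-1, 0, 1])
```

The key line is the one that multiplies θ by the parent's transition vector before normalizing; that multiplicative combination is what distinguishes this process from LDA- or HDP-style generative models.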
37 By merging these vectors, the STM models both the local syntactic context and corpus-level semantics of the words in the documents. [sent-80, score-0.458]
38 Because they depend on their parents, the topic assignments and words are generated by traversing the tree. [sent-81, score-0.579]
39 A natural alternative model would be to traverse the tree and choose the topic assignment from either the parental topic transition πzp(n) or document topic weights θd , based on a binary selector variable. [sent-82, score-1.792]
40 This would be an extension of [8] to parse trees, but it would not force words to be simultaneously syntactically consistent with their parent nodes and thematically consistent with a topic of the document. [sent-83, score-0.924]
41 Applied to text, the HDP is a probabilistic topic model that allows each document to exhibit multiple topics. [sent-89, score-0.655]
42 It can be thought of as the “infinite” topic version of latent Dirichlet allocation (LDA) [13]. [sent-90, score-0.505]
43 The difference between the STM and the HDP is in how the per-word topic assignment is drawn. [sent-91, score-0.514]
44 In the HDP, this topic assignment is drawn directly from the topic weights and, thus, the HDP assumes that words within a document are exchangeable. [sent-92, score-1.317]
45 In the STM, the words are generated conditioned on their parents in the parse tree. [sent-93, score-0.263]
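The contrast can be summarized schematically as follows (a hedged paraphrase using the section's notation, with p(n) denoting the parent of node n; these are not the paper's exact equations):

```latex
% HDP: the topic assignment depends only on the document's topic weights
z_n \sim \mathrm{Mult}(\theta_d)
% STM: the assignment also depends on the parent's topic through its transition vector
p(z_n = k \mid z_{p(n)}, \theta_d, \pi) \;\propto\; \theta_{d,k}\,\pi_{z_{p(n)},\,k}
```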
46 The infinite tree models syntax by basing the latent syntactic category of children on the syntactic category of the parent. [sent-96, score-0.796]
47 3 Approximate posterior inference. The central computational problem in topic modeling is to compute the posterior distribution of the latent structure conditioned on an observed collection of documents. [sent-98, score-0.557]
48 Specifically, our goal is to compute the posterior topics, topic transitions, per-document topic weights, per-word topic assignments, and top-level weights conditioned on a set of documents, each of which is a collection of parse trees. [sent-99, score-1.587]
49 In typical topic modeling applications, it is approximated with either variational inference or collapsed Gibbs sampling. [sent-101, score-0.552]
50 Fast Gibbs sampling relies on the conjugacy between the topic assignment and the prior over the distribution that generates it. [sent-102, score-0.543]
51 The syntactic topic model does not enjoy such conjugacy because the topic assignment is drawn from a multiplicative combination of two Dirichlet distributed vectors. [sent-103, score-1.293]
52 In variational inference, the posterior is approximated by positing a simpler family of distributions, indexed by free variational parameters. [sent-105, score-0.226]
53 The variational parameters γd and νz index Dirichlet distributions, and φn is a topic multinomial for the nth word. [sent-110, score-0.585]
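Given those parameters, the mean-field family presumably factorizes along the following lines (a hedged reconstruction; how the topics τ and top-level weights β are handled is not spelled out in this summary):

```latex
q(\theta, \pi, z) \;=\; \prod_{d} q(\theta_d \mid \gamma_d)\,\prod_{k} q(\pi_k \mid \nu_k)\,\prod_{n} q(z_n \mid \phi_n)
```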
54 From this distribution, Jensen's lower bound on the log probability of the corpus is L(γ, ν, φ; β, θ, π, τ ) = Eq [log p(β|α) + log p(θ|αD , β) + log p(π|αP , β) + log p(z|θ, π) + log p(w|z, τ ) + log p(τ |σ)] − Eq [log q(θ) + log q(π) + log q(z)]. [sent-111, score-0.242]
55 Per-word variational updates: the variational update for the topic assignment of the nth word is φni ∝ exp{ Ψ(γi) − Ψ(Σk γk) + Σj φp(n),j [Ψ(νj,i) − Ψ(Σk νj,k)] + Σc∈c(n) Σj φc,j [Ψ(νi,j) − Ψ(Σk νi,k)] − Σc∈c(n) ωc⁻¹ (Σj γj νi,j) / ((Σk γk)(Σk νi,k)) + log τi,wn }. (3) [sent-115, score-0.835]
56 The influences on estimating the posterior of a topic assignment are: the document’s topic γ, the topic of the node’s parent p(n), the topic of the node’s children c(n), the expected transitions between topics ν, and the probability of the word within a topic τi,wn. [sent-116, score-2.741]
57 Most terms in Equation 3 are familiar from variational inference for probabilistic topic models, as the digamma functions arise from expectations of log probabilities under the variational Dirichlet distributions. [sent-117, score-0.585]
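A minimal numerical sketch of this per-word update is given below, assuming a truncated model with K topics and omitting the correction term involving ωc for brevity; the function and argument names (update_phi_n, log_tau_w, and so on) are illustrative, not from the paper.

```python
import numpy as np
from scipy.special import digamma

def update_phi_n(gamma, nu, phi_parent, phi_children, log_tau_w):
    """Sketch of the variational update for one word's topic multinomial.

    gamma:        (K,)  variational Dirichlet over the document's topic weights.
    nu:           (K, K) variational Dirichlets over topic transitions (row = parent topic).
    phi_parent:   (K,)  topic multinomial of the node's parent.
    phi_children: list of (K,) multinomials, one per child of the node.
    log_tau_w:    (K,)  log probability of the observed word under each topic.
    """
    # Expected log transition weights under the variational Dirichlets.
    e_log_trans = digamma(nu) - digamma(nu.sum(axis=1, keepdims=True))
    log_phi = digamma(gamma) - digamma(gamma.sum())   # document-topic term
    log_phi = log_phi + phi_parent @ e_log_trans      # parent constrains this node
    for phi_c in phi_children:                        # children constrain it as well
        log_phi = log_phi + e_log_trans @ phi_c
    log_phi = log_phi + log_tau_w                     # likelihood of the word itself
    log_phi -= log_phi.max()                          # stabilize before exponentiating
    phi = np.exp(log_phi)
    return phi / phi.sum()
```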
58 Variational Dirichlet distributions and topic composition: this normalizer term also appears in the derivative of the likelihood function for γ and ν (the parameters to the variational distributions on θ and π, respectively), which cannot be solved in closed form. [sent-120, score-0.585]
59 Verbs appeared only in the head position; prepositions could appear below nouns or verbs; nouns appeared only below verbs; and prepositions, determiners, and adjectives could appear below nouns. [sent-131, score-0.791]
60 Each part of speech except prepositions and determiners was sub-grouped into themes, and a document contains a single theme for each part of speech. [sent-132, score-0.576]
61 For example, a document can only contain nouns from a single “economic,” “academic,” or “livestock” theme. [sent-133, score-0.355]
62 The infinite tree model is aware of the tree structure but not documents [14]. It is able to separate parts of speech successfully except for adjectives and determiners (Figure 2(a)). [sent-135, score-0.459]
63 The HDP is aware of document groupings and treats the words exchangeably within them [12]. [sent-137, score-0.306]
64 It is able to recover the thematic topics, but has missed the connections between the parts of speech, and has conflated multiple parts of speech (Figure 2(b)). [sent-138, score-0.301]
65 The STM is able to capture the topical themes and recover parts of speech (with the exception of prepositions, which were placed in the same topic as nouns with a self loop). [sent-139, score-1.068]
66 Nouns are dominated by verbs and prepositions, and verbs are the root (head) of sentences. [sent-141, score-0.264]
67 Qualitative description of topics learned from hand-annotated data: the same general properties, but with greater variation, are exhibited in real data. [sent-142, score-0.211]
68 We converted the Penn Treebank [10], a corpus of manually curated parse trees, into a dependency parse [18]. [sent-143, score-0.37]
69 Figure 3 shows a subset of topics learned by the STM with truncation level 32. [sent-145, score-0.239]
70 Many of the resulting topics illustrate both syntactic and thematic consistency. [sent-146, score-0.624]
71 A few nonspecific function topics emerged (pronoun, possessive pronoun, general verbs, etc.). [sent-147, score-0.211]
72 Thematically related topics are separated by both function and theme. [sent-151, score-0.211]
73 A number of topics in Figure 3(b), such as 17, 15, 10, and 3, appear to some degree in nearly every document, while other topics are used more sparingly to denote specialized content. [sent-153, score-0.48]
74 We also computed perplexity for individual parts of speech to study the differences in predictive power between content words, such as nouns and verbs, and function words, such as prepositions and determiners. [sent-160, score-0.621]
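A minimal sketch of such a per-part-of-speech perplexity computation is shown below, assuming per-word predictive log probabilities and POS tags are already available as arrays; the names are illustrative.

```python
import numpy as np

def perplexity_by_pos(log_probs, pos_tags):
    """Perplexity restricted to each part of speech.

    log_probs: per-word predictive log probabilities under the model.
    pos_tags:  part-of-speech tag for each word (same length as log_probs).
    """
    log_probs = np.asarray(log_probs, dtype=float)
    pos_tags = np.asarray(pos_tags)
    return {tag: float(np.exp(-log_probs[pos_tags == tag].mean()))
            for tag in np.unique(pos_tags)}

# Example: nouns vs. prepositions on a toy set of five words.
scores = perplexity_by_pos([-4.2, -6.1, -3.0, -5.5, -2.8],
                           ["NN", "NN", "IN", "NN", "IN"])
```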
75 We expect function words to be dominated by local context and content words to be determined more by the themes of the document. [sent-162, score-0.373]
76 This trend is seen not only in the synthetic data (Figure 4(a)), where parsing models better predict functional categories like prepositions and document-only models fail to account for patterns of verbs and determiners, but also in real data. [sent-163, score-0.738]
77 Figure 4(b) shows that HDP and STM both perform better than parsing models in capturing the patterns behind nouns, while both the STM and the infinite tree have lower perplexity for verbs. [sent-164, score-0.328]
78 Like parsing models, our model was better able to capture these functional categories. Footnote 1: In Figure 2 and Figure 3, we mark topics which represent a single part of speech and are essentially the lone representative of that part of speech in the model. [sent-165, score-0.442]
79 This is a subjective determination by the authors; it does not reflect any specialization or special treatment of topics by the model, and is done merely for didactic purposes. [sent-166, score-0.24]
80 (c) Combination of parse transition and document multinomial. Figure 2: Three models were fit to the synthetic data described in Section 4. [sent-199, score-0.511]
81 Each box illustrates the top five words of a topic; boxes that represent homogeneous parts of speech have rounded edges and are shaded. [sent-200, score-0.237]
82 Edges between topics are labeled with estimates of their transition weight π. [sent-201, score-0.251]
83 While the infinite tree model (a) is able to reconstruct the parts of speech used to generate the data, it lumps all topics into the same categories. [sent-202, score-0.431]
84 Although the HDP (b) can discover themes of recurring words, it cannot determine the interactions between topics or separate out ubiquitous words that occur in all documents. [sent-203, score-0.42]
85 5 Discussion. We have introduced and evaluated the syntactic topic model, a nonparametric Bayesian model of parsed documents. [sent-207, score-0.822]
86 The STM achieves better perplexity than the infinite tree or the hierarchical Dirichlet process and uncovers patterns in text that are both syntactically and thematically consistent. [sent-208, score-0.446]
87 For example, recent work [19, 20] in the domain of word sense disambiguation has attempted to combine syntactic similarity with topical information in an ad hoc manner to improve the predominant sense algorithm [21]. [sent-210, score-0.531]
88 The syntactic topic model offers a principled way to learn both simultaneously rather than combining two heterogeneous methods. [sent-211, score-0.531]
89 The STM is not a full parsing model, but it could be used as a means of integrating document context into parsing models. [sent-212, score-0.368]
90 The syntactic topic model offers an alternative method of finding more specific rules by grouping words together that appear in similar documents and could be extended to a full parser. [sent-215, score-0.905]
91 (a) Sinks and sources (b) Topic usage. Figure 3: Selected topics (along with strong links) after a run of the syntactic topic model with a truncation level of 32. [sent-236, score-0.989]
92 As in Figure 2, parts of speech that aren’t subdivided across themes are indicated. [sent-237, score-0.24]
93 In the Treebank corpus (left), head words (verbs) are shared, but the nouns split off into many separate specialized categories before feeding into pronoun sinks. [sent-238, score-0.426]
94 The specialization of topics is also visible in plots of the variational parameter γ, normalized for the first 300 documents of the Treebank (right), where three topic columns have been identified. [sent-239, score-0.603]
95 Many topics are used to some extent in every document, showing that they are performing a functional role, while others are used more sparingly for semantic content. [sent-240, score-0.28]
96 The infinite tree best captures prepositions, which have no cross-document variation. [sent-243, score-0.47]
97 On real data (Figure 4(b)), the syntactic topic model was able to combine the strengths of the infinite tree on functional categories like prepositions with the strengths of the HDP on content categories like nouns to attain lower overall perplexity. [sent-244, score-1.257]
98 While traditional topic models reveal groups of words that are used in similar documents, the STM uncovers groups that are used the same way in similar documents. [sent-245, score-0.607]
99 This decomposition is useful for tasks that require a more fine-grained representation of language than the bag of words can offer, or for tasks that require a broader context than parsing models provide. [sent-246, score-0.235]
100 PUTOP: Turning predominant senses into a topic model for WSD. [sent-387, score-0.51]
wordName wordTfidf (topN-words)
[('topic', 0.452), ('stm', 0.433), ('syntactic', 0.298), ('topics', 0.211), ('document', 0.203), ('hdp', 0.185), ('prepositions', 0.181), ('parse', 0.16), ('nouns', 0.152), ('verbs', 0.132), ('perplexity', 0.124), ('thematic', 0.115), ('themes', 0.106), ('words', 0.103), ('variational', 0.1), ('linguistics', 0.097), ('word', 0.097), ('thematically', 0.087), ('tree', 0.086), ('professor', 0.082), ('speech', 0.082), ('sentence', 0.077), ('syntactically', 0.072), ('exchangeable', 0.07), ('dirichlet', 0.069), ('parsing', 0.067), ('assignment', 0.062), ('association', 0.06), ('evil', 0.058), ('predominant', 0.058), ('determiners', 0.058), ('latent', 0.053), ('treebank', 0.053), ('documents', 0.052), ('parts', 0.052), ('parent', 0.05), ('corpus', 0.05), ('brochure', 0.049), ('pony', 0.049), ('stock', 0.049), ('synthetic', 0.049), ('sentences', 0.048), ('weights', 0.045), ('adjectives', 0.043), ('topical', 0.043), ('pronoun', 0.043), ('noun', 0.043), ('transition', 0.04), ('sheep', 0.04), ('parsed', 0.04), ('zn', 0.037), ('falls', 0.037), ('phrases', 0.037), ('semantic', 0.036), ('disambiguation', 0.035), ('travel', 0.035), ('children', 0.035), ('language', 0.034), ('penn', 0.033), ('multinomial', 0.033), ('chairman', 0.033), ('corp', 0.033), ('earnings', 0.033), ('infinite', 0.033), ('koeling', 0.033), ('normalizer', 0.033), ('ponders', 0.033), ('prep', 0.033), ('quarter', 0.033), ('sparingly', 0.033), ('nonparametric', 0.032), ('princeton', 0.032), ('blei', 0.031), ('context', 0.031), ('content', 0.03), ('proceedings', 0.029), ('specialization', 0.029), ('laid', 0.029), ('preposition', 0.029), ('sales', 0.029), ('climbs', 0.029), ('olden', 0.029), ('conjugacy', 0.029), ('categories', 0.029), ('truncation', 0.028), ('eq', 0.027), ('nite', 0.027), ('mult', 0.026), ('president', 0.026), ('uncovers', 0.026), ('hierarchical', 0.026), ('posterior', 0.026), ('models', 0.026), ('patterns', 0.025), ('combines', 0.025), ('cow', 0.025), ('specialized', 0.025), ('log', 0.024), ('head', 0.024), ('assignments', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000008 229 nips-2008-Syntactic Topic Models
Author: Jordan L. Boyd-graber, David M. Blei
Abstract: We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. The STM generates words that are both thematically and syntactically constrained, which combines the semantic insights of topic models with the syntactic information available from parse trees. Each word of a sentence is generated by a distribution that combines document-specific topic weights and parse-tree-specific syntactic transitions. Words are assumed to be generated in an order that respects the parse tree. We derive an approximate posterior inference method based on variational methods for hierarchical Dirichlet processes, and we report qualitative and quantitative results on both synthetic data and hand-parsed documents. 1
2 0.22346869 114 nips-2008-Large Margin Taxonomy Embedding for Document Categorization
Author: Kilian Q. Weinberger, Olivier Chapelle
Abstract: Applications of multi-class classification, such as document categorization, often appear in cost-sensitive settings. Recent work has significantly improved the state of the art by moving beyond “flat” classification through incorporation of class hierarchies [4]. We present a novel algorithm that goes beyond hierarchical classification and estimates the latent semantic space that underlies the class hierarchy. In this space, each class is represented by a prototype and classification is done with the simple nearest neighbor rule. The optimization of the semantic space incorporates large margin constraints that ensure that for each instance the correct class prototype is closer than any other. We show that our optimization is convex and can be solved efficiently for large data sets. Experiments on the OHSUMED medical journal data base yield state-of-the-art results on topic categorization. 1
3 0.22301522 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
Author: Simon Lacoste-julien, Fei Sha, Michael I. Jordan
Abstract: Probabilistic topic models have become popular as methods for dimensionality reduction in collections of text documents or images. These models are usually treated as generative models and trained using maximum likelihood or Bayesian methods. In this paper, we discuss an alternative: a discriminative framework in which we assume that supervised side information is present, and in which we wish to take that side information into account in finding a reduced dimensionality representation. Specifically, we present DiscLDA, a discriminative variation on Latent Dirichlet Allocation (LDA) in which a class-dependent linear transformation is introduced on the topic mixture proportions. This parameter is estimated by maximizing the conditional likelihood. By using the transformed topic mixture proportions as a new representation of documents, we obtain a supervised dimensionality reduction algorithm that uncovers the latent structure in a document collection while preserving predictive power for the task of classification. We compare the predictive power of the latent structure of DiscLDA with unsupervised LDA on the 20 Newsgroups document classification task and show how our model can identify shared topics across classes as well as class-dependent topics.
4 0.21088028 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
Author: Indraneel Mukherjee, David M. Blei
Abstract: Hierarchical probabilistic modeling of discrete data has emerged as a powerful tool for text analysis. Posterior inference in such models is intractable, and practitioners rely on approximate posterior inference methods such as variational inference or Gibbs sampling. There has been much research in designing better approximations, but there is yet little theoretical understanding of which of the available techniques are appropriate, and in which data analysis settings. In this paper we provide the beginnings of such understanding. We analyze the improvement that the recently proposed collapsed variational inference (CVB) provides over mean field variational inference (VB) in latent Dirichlet allocation. We prove that the difference in the tightness of the bound on the likelihood of a document decreases as O(k − 1) + log m/m, where k is the number of topics in the model and m is the number of words in a document. As a consequence, the advantage of CVB over VB is lost for long documents but increases with the number of topics. We demonstrate empirically that the theory holds, using simulated text data and two text corpora. We provide practical guidelines for choosing an approximation. 1
5 0.17620696 127 nips-2008-Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction
Author: Shay B. Cohen, Kevin Gimpel, Noah A. Smith
Abstract: We explore a new Bayesian model for probabilistic grammars, a family of distributions over discrete structures that includes hidden Markov models and probabilistic context-free grammars. Our model extends the correlated topic model framework to probabilistic grammars, exploiting the logistic normal distribution as a prior over the grammar parameters. We derive a variational EM algorithm for that model, and then experiment with the task of unsupervised grammar induction for natural language dependency parsing. We show that our model achieves superior results over previous models that use different priors. 1
6 0.17027469 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context
7 0.16877989 28 nips-2008-Asynchronous Distributed Learning of Topic Models
8 0.15666133 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
9 0.15064733 4 nips-2008-A Scalable Hierarchical Distributed Language Model
10 0.14399493 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
11 0.1271304 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
12 0.12426981 139 nips-2008-Modeling the effects of memory on human online sentence processing with particle filters
13 0.11273027 52 nips-2008-Correlated Bigram LSA for Unsupervised Language Model Adaptation
14 0.10846043 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
15 0.1057051 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
16 0.099070139 35 nips-2008-Bayesian Synchronous Grammar Induction
17 0.07147003 234 nips-2008-The Infinite Factorial Hidden Markov Model
18 0.067011312 154 nips-2008-Nonparametric Bayesian Learning of Switching Linear Dynamical Systems
19 0.063776433 216 nips-2008-Sparse probabilistic projections
20 0.063095219 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
topicId topicWeight
[(0, -0.17), (1, -0.143), (2, 0.109), (3, -0.235), (4, -0.08), (5, -0.065), (6, 0.165), (7, 0.228), (8, -0.305), (9, -0.018), (10, -0.161), (11, -0.06), (12, -0.031), (13, 0.166), (14, 0.01), (15, 0.092), (16, 0.1), (17, -0.052), (18, -0.153), (19, -0.096), (20, -0.02), (21, 0.023), (22, -0.006), (23, 0.081), (24, 0.061), (25, 0.071), (26, 0.073), (27, -0.042), (28, 0.013), (29, -0.043), (30, 0.045), (31, 0.019), (32, 0.094), (33, 0.052), (34, -0.007), (35, -0.023), (36, 0.011), (37, 0.003), (38, 0.016), (39, -0.012), (40, 0.023), (41, -0.09), (42, -0.028), (43, 0.032), (44, 0.052), (45, 0.015), (46, -0.017), (47, -0.027), (48, 0.032), (49, -0.005)]
simIndex simValue paperId paperTitle
same-paper 1 0.98115975 229 nips-2008-Syntactic Topic Models
Author: Jordan L. Boyd-graber, David M. Blei
Abstract: We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. The STM generates words that are both thematically and syntactically constrained, which combines the semantic insights of topic models with the syntactic information available from parse trees. Each word of a sentence is generated by a distribution that combines document-specific topic weights and parse-tree-specific syntactic transitions. Words are assumed to be generated in an order that respects the parse tree. We derive an approximate posterior inference method based on variational methods for hierarchical Dirichlet processes, and we report qualitative and quantitative results on both synthetic data and hand-parsed documents. 1
2 0.8402825 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
Author: Indraneel Mukherjee, David M. Blei
Abstract: Hierarchical probabilistic modeling of discrete data has emerged as a powerful tool for text analysis. Posterior inference in such models is intractable, and practitioners rely on approximate posterior inference methods such as variational inference or Gibbs sampling. There has been much research in designing better approximations, but there is yet little theoretical understanding of which of the available techniques are appropriate, and in which data analysis settings. In this paper we provide the beginnings of such understanding. We analyze the improvement that the recently proposed collapsed variational inference (CVB) provides over mean field variational inference (VB) in latent Dirichlet allocation. We prove that the difference in the tightness of the bound on the likelihood of a document decreases as O(k − 1) + log m/m, where k is the number of topics in the model and m is the number of words in a document. As a consequence, the advantage of CVB over VB is lost for long documents but increases with the number of topics. We demonstrate empirically that the theory holds, using simulated text data and two text corpora. We provide practical guidelines for choosing an approximation. 1
3 0.80250883 28 nips-2008-Asynchronous Distributed Learning of Topic Models
Author: Padhraic Smyth, Max Welling, Arthur U. Asuncion
Abstract: Distributed learning is a problem of fundamental interest in machine learning and cognitive science. In this paper, we present asynchronous distributed learning algorithms for two well-known unsupervised learning frameworks: Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Processes (HDP). In the proposed approach, the data are distributed across P processors, and processors independently perform Gibbs sampling on their local data and communicate their information in a local asynchronous manner with other processors. We demonstrate that our asynchronous algorithms are able to learn global topic models that are statistically as accurate as those learned by the standard LDA and HDP samplers, but with significant improvements in computation time and memory. We show speedup results on a 730-million-word text corpus using 32 processors, and we provide perplexity results for up to 1500 virtual processors. As a stepping stone in the development of asynchronous HDP, a parallel HDP sampler is also introduced. 1
4 0.79041654 52 nips-2008-Correlated Bigram LSA for Unsupervised Language Model Adaptation
Author: Yik-cheung Tam, Tanja Schultz
Abstract: We present a correlated bigram LSA approach for unsupervised LM adaptation for automatic speech recognition. The model is trained using efficient variational EM and smoothed using the proposed fractional Kneser-Ney smoothing which handles fractional counts. We address the scalability issue to large training corpora via bootstrapping of bigram LSA from unigram LSA. For LM adaptation, unigram and bigram LSA are integrated into the background N-gram LM via marginal adaptation and linear interpolation respectively. Experimental results on the Mandarin RT04 test set show that applying unigram and bigram LSA together yields 6%–8% relative perplexity reduction and 2.5% relative character error rate reduction which is statistically significant compared to applying only unigram LSA. On the large-scale evaluation on Arabic, 3% relative word error rate reduction is achieved which is also statistically significant. 1
5 0.76463616 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
Author: Simon Lacoste-julien, Fei Sha, Michael I. Jordan
Abstract: Probabilistic topic models have become popular as methods for dimensionality reduction in collections of text documents or images. These models are usually treated as generative models and trained using maximum likelihood or Bayesian methods. In this paper, we discuss an alternative: a discriminative framework in which we assume that supervised side information is present, and in which we wish to take that side information into account in finding a reduced dimensionality representation. Specifically, we present DiscLDA, a discriminative variation on Latent Dirichlet Allocation (LDA) in which a class-dependent linear transformation is introduced on the topic mixture proportions. This parameter is estimated by maximizing the conditional likelihood. By using the transformed topic mixture proportions as a new representation of documents, we obtain a supervised dimensionality reduction algorithm that uncovers the latent structure in a document collection while preserving predictive power for the task of classification. We compare the predictive power of the latent structure of DiscLDA with unsupervised LDA on the 20 Newsgroups document classification task and show how our model can identify shared topics across classes as well as class-dependent topics.
6 0.62181097 114 nips-2008-Large Margin Taxonomy Embedding for Document Categorization
7 0.6120047 127 nips-2008-Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction
8 0.57987738 4 nips-2008-A Scalable Hierarchical Distributed Language Model
9 0.45949519 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
10 0.43652648 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
11 0.4320721 35 nips-2008-Bayesian Synchronous Grammar Induction
12 0.41287431 139 nips-2008-Modeling the effects of memory on human online sentence processing with particle filters
13 0.40521273 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context
14 0.36997664 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
15 0.33771044 134 nips-2008-Mixed Membership Stochastic Blockmodels
16 0.31207272 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
17 0.28186113 98 nips-2008-Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data
18 0.25923797 234 nips-2008-The Infinite Factorial Hidden Markov Model
19 0.2516939 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
20 0.24650691 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
topicId topicWeight
[(4, 0.012), (6, 0.036), (7, 0.049), (12, 0.074), (15, 0.015), (28, 0.12), (32, 0.025), (52, 0.283), (57, 0.11), (59, 0.011), (63, 0.022), (77, 0.035), (78, 0.017), (83, 0.079)]
simIndex simValue paperId paperTitle
same-paper 1 0.78922445 229 nips-2008-Syntactic Topic Models
Author: Jordan L. Boyd-graber, David M. Blei
Abstract: We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. The STM generates words that are both thematically and syntactically constrained, which combines the semantic insights of topic models with the syntactic information available from parse trees. Each word of a sentence is generated by a distribution that combines document-specific topic weights and parse-tree-specific syntactic transitions. Words are assumed to be generated in an order that respects the parse tree. We derive an approximate posterior inference method based on variational methods for hierarchical Dirichlet processes, and we report qualitative and quantitative results on both synthetic data and hand-parsed documents. 1
2 0.63647491 185 nips-2008-Privacy-preserving logistic regression
Author: Kamalika Chaudhuri, Claire Monteleoni
Abstract: This paper addresses the important tradeoff between privacy and learnability, when designing algorithms for learning from private databases. We focus on privacy-preserving logistic regression. First we apply an idea of Dwork et al. [6] to design a privacy-preserving logistic regression algorithm. This involves bounding the sensitivity of regularized logistic regression, and perturbing the learned classifier with noise proportional to the sensitivity. We then provide a privacy-preserving regularized logistic regression algorithm based on a new privacy-preserving technique: solving a perturbed optimization problem. We prove that our algorithm preserves privacy in the model due to [6]. We provide learning guarantees for both algorithms, which are tighter for our new algorithm, in cases in which one would typically apply logistic regression. Experiments demonstrate improved learning performance of our method, versus the sensitivity method. Our privacy-preserving technique does not depend on the sensitivity of the function, and extends easily to a class of convex loss functions. Our work also reveals an interesting connection between regularization and privacy. 1
3 0.62697375 196 nips-2008-Relative Margin Machines
Author: Tony Jebara, Pannagadatta K. Shivaswamy
Abstract: In classification problems, Support Vector Machines maximize the margin of separation between two classes. While the paradigm has been successful, the solution obtained by SVMs is dominated by the directions with large data spread and biased to separate the classes by cutting along large spread directions. This article proposes a novel formulation to overcome such sensitivity and maximizes the margin relative to the spread of the data. The proposed formulation can be efficiently solved and experiments on digit datasets show drastic performance improvements over SVMs. 1
4 0.56255698 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
Author: Xuming He, Richard S. Zemel
Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure. 1
5 0.55689657 95 nips-2008-Grouping Contours Via a Related Image
Author: Praveen Srinivasan, Liming Wang, Jianbo Shi
Abstract: Contours have been established in the biological and computer vision literature as a compact yet descriptive representation of object shape. While individual contours provide structure, they lack the large spatial support of region segments (which lack internal structure). We present a method for further grouping of contours in an image using their relationship to the contours of a second, related image. Stereo, motion, and similarity all provide cues that can aid this task; contours that have similar transformations relating them to their matching contours in the second image likely belong to a single group. To find matches for contours, we rely only on shape, which applies directly to all three modalities without modification, in contrast to the specialized approaches developed for each independently. Visually salient contours are extracted in each image, along with a set of candidate transformations for aligning subsets of them. For each transformation, groups of contours with matching shape across the two images are identified to provide a context for evaluating matches of individual contour points across the images. The resulting contexts of contours are used to perform a final grouping on contours in the original image while simultaneously finding matches in the related image, again by shape matching. We demonstrate grouping results on image pairs consisting of stereo, motion, and similar images. Our method also produces qualitatively better results against a baseline method that does not use the inferred contexts. 1
6 0.55149615 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
7 0.5484153 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
8 0.54369378 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
9 0.54108948 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
10 0.53947651 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction
11 0.53659582 27 nips-2008-Artificial Olfactory Brain for Mixture Identification
12 0.53576678 66 nips-2008-Dynamic visual attention: searching for coding length increments
13 0.53460395 200 nips-2008-Robust Kernel Principal Component Analysis
14 0.534365 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
15 0.53286195 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
16 0.53052926 194 nips-2008-Regularized Learning with Networks of Features
17 0.53043246 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
18 0.53037322 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
19 0.52991885 119 nips-2008-Learning a discriminative hidden part model for human action recognition
20 0.52768111 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations