nips nips2007 nips2007-105 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Max Welling, Ian Porteous, Evgeniy Bart
Abstract: A general modeling framework is proposed that unifies nonparametric-Bayesian models, topic-models and Bayesian networks. This class of infinite state Bayes nets (ISBN) can be viewed as directed networks of ‘hierarchical Dirichlet processes’ (HDPs) where the domain of the variables can be structured (e.g. words in documents or features in images). We show that collapsed Gibbs sampling can be done efficiently in these models by leveraging the structure of the Bayes net and using the forward-filtering-backward-sampling algorithm for junction trees. Existing models, such as nested-DP, Pachinko allocation, mixed membership stochastic block models as well as a number of new models are described as ISBNs. Two experiments have been performed to illustrate these ideas.
Reference: text
sentIndex sentText sentNum sentScore
1 This class of infinite state Bayes nets (ISBN) can be viewed as directed networks of ‘hierarchical Dirichlet processes’ (HDPs) where the domain of the variables can be structured (e.g. words in documents or features in images). [sent-5, score-0.191]
2 We show that collapsed Gibbs sampling can be done efficiently in these models by leveraging the structure of the Bayes net and using the forward-filtering-backward-sampling algorithm for junction trees. [sent-8, score-0.278]
3 A recent development in this area is a class of Bayes nets known as topic models (e. [sent-13, score-0.198]
4 A recent statistical refinement of topic models is a nonparametric extension known as HDP [2], which adaptively infers the number of topics based on the available data. [sent-16, score-0.275]
5 We consider models where the variables may have the nested structure of documents and images, may have infinite discrete state spaces, and where the random variables are related through the intuitive causal dependencies of a Bayes net. [sent-19, score-0.35]
6 Inference in these networks is achieved through a two-stage Gibbs sampler, which combines the “forward-filtering-backward-sampling algorithm” [3] extended to junction trees and the direct assignment sampler for HDPs [2]. [sent-21, score-0.182]
7 2 Bayes Net Structure for ISBN. Consider observed random variables x_A = {x_a}, a = 1, …, A. [sent-22, score-0.093]
8 These variables can take values in an arbitrary domain. [sent-25, score-0.093]
9 In the following we will assume that x_a is sampled from a (conditional) distribution in the exponential family. [sent-26, score-0.206]
10 We will also introduce hidden (unobserved, latent) variables {z_b}, b = 1, …, B. [sent-27, score-0.14]
11 The indices a, b thus index the nodes of the Bayesian network. [sent-30, score-0.117]
12 We will, however, also be interested in more structured data, such as words in documents, where the index n can be decomposed into e.g. a document index j and a word index i_j. [sent-42, score-0.122]
13 In this notation we think of j as labelling a document and i_j as labelling a word in document j. [sent-45, score-0.23]
14 The unobserved structure is labelled with the discrete "assignment variables" z_a, which assign the object indexed by n to latent groups (a.k.a. [sent-53, score-0.331]
15 The assignment variables z together with the observed variables x are organized into a Bayes net, where dependencies are encoded by the usual "conditional probability tables" (CPTs), which we denote with φ_{x_a|℘_a} for observed variables and π_{z_b|℘_b} for latent variables. [sent-57, score-0.413]
16 Here, ℘_a denotes the joint state of all the parent variables of x_a (and similarly ℘_b for z_b). [sent-58, score-0.766]
17 When a vertical bar is present we normalize over the variables to the left of it, e. [sent-59, score-0.093]
18 Note that CPTs are considered random variables and may themselves be indexed by (a subset of) n, e.g. [sent-62, score-0.144]
19 π_{z_b|℘_b} ∼ D[α_b τ_{z_b}], independently and identically for all values of ℘_b. [sent-67, score-0.598]
20 The distribution τ itself is Dirichlet distributed, τ_{z_a} ∼ D[γ_a/K_a], where K_a is the number of states for variable z_a. [sent-68, score-0.256]
21 We can put gamma priors on α_a, γ_a and consider them as random variables as well, but to keep things simple we treat them as fixed here. [sent-69, score-0.186]
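The hierarchical Dirichlet construction described in the last few sentences (a top-level mean τ drawn from a symmetric Dirichlet, and one CPT column per joint parent state drawn around that shared mean) can be sketched as follows. This is an illustrative sketch only: the truncation level K, the hyperparameter values, and the small numerical floor are assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 10                  # finite truncation standing in for the infinite state space
gamma = 1.0             # top-level concentration (gamma_a above)
alpha = 1.0             # per-node concentration (alpha_b above)
n_parent_states = 5     # number of joint parent configurations

# Top-level mean distribution tau over the K states of this node.
tau = rng.dirichlet(np.full(K, gamma / K))

# One CPT column per joint parent state, each centred on the shared mean tau.
pi = np.stack([rng.dirichlet(np.maximum(alpha * tau, 1e-12))   # floor avoids underflow
               for _ in range(n_parent_states)])

# Given a parent configuration, an assignment variable is drawn from its column.
parent_state = 3
z = rng.choice(K, p=pi[parent_state])
```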
22 However, it is always possible to infer the number of times variables in the BN are replicated by looking at their indices. [sent-73, score-0.093]
23 Here φ is a distribution over words, one for each topic value z, and is often referred to as a "topic distribution". [sent-78, score-0.157]
24 Topic values are generated from a document-specific distribution π, which is in turn generated from a "mother distribution" over topics, τ. [sent-79, score-0.164]
25 One of the key features of the HDP is that topics are shared across all documents indexed by j. [sent-87, score-0.281]
26 We could add this node to the BN as a parent of z (since π depends on it) and reinterpret the statement of sharing topics as a fully connected transition matrix from states of ι to states of z. [sent-91, score-0.386]
27 This idea can be extended to a combination of fully observed parent variables and multiple unobserved parent variables, e. [sent-92, score-0.356]
28 Moreover, the child variables do not have to be observed either, so we can also replace x → z. [sent-95, score-0.093]
29 In this fashion we can connect together multiple vertical stacks τ → φ → z, where each such module is part of a "virtual HDP" in which the joint child states act as virtual data and the joint parent states act as virtual document labels. [sent-96, score-0.323]
30 Examples are depicted in Figs. 1a (infinite extension of a Bayes net with IID data items) and 3a (infinite extension of Pachinko Allocation). [sent-98, score-0.101]
31 In this case data is unstructured, assumed IID and indexed by a flat index n = i. [sent-101, score-0.127]
32 There is a considerable body of empirical evidence which confirms that marginalizing out the variables π, φ will result in improved inference (e. [sent-103, score-0.093]
33 In this collapsed space we sample two sets of variables alternately: {z} on the one hand and {τ} on the other. [sent-106, score-0.154]
34 We then draw N_{k|l} Bernoulli random variables with probabilities of success given by the elements v_t of v, which we call s_t^{k|l}, and compute their sum over t: S_{k|l} = Σ_t s_t^{k|l}. [sent-112, score-0.093]
35 This procedure is equivalent to running a Chinese restaurant process (CRP) with N_{k|l} customers and keeping track only of how many tables become occupied. [sent-113, score-0.168]
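A sketch of this auxiliary-variable step, using the standard CRP identity that customer t opens a new table with probability ατ_k/(ατ_k + t); the function name and the toy argument values are illustrative assumptions.

```python
import numpy as np

def sample_table_count(N_kl, alpha, tau_k, rng):
    """Sample S_{k|l}: how many tables become occupied when N_{k|l} customers
    enter a CRP with concentration alpha * tau_k."""
    if N_kl == 0:
        return 0
    t = np.arange(N_kl)                        # customers t = 0, ..., N_{k|l} - 1
    p = (alpha * tau_k) / (alpha * tau_k + t)  # success probability of s_t
    s = rng.random(N_kl) < p                   # Bernoulli draws s_t
    return int(s.sum())                        # S_{k|l} = sum_t s_t

rng = np.random.default_rng(0)
S_kl = sample_table_count(N_kl=25, alpha=1.0, tau_k=0.2, rng=rng)
```

The resulting S counts are the quantities the text then uses to resample τ (cf. sentence 38).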
36 If the state corresponding to γ is picked, a new state is created and we increment K_a ← K_a + 1. [sent-119, score-0.114]
37 E[π_{z_b|℘_b}] = (α_b τ_{z_b} + N_{z_b|℘_b}) / (α_b + N_{℘_b}), and similarly for φ. [sent-122, score-0.299]
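A small numerical check of the Rao-Blackwellised estimate in the previous sentence; the count array and τ below are made-up values for illustration.

```python
import numpy as np

# N[k, l]: number of times z_b = k was seen together with parent state l.
N = np.array([[4, 0],
              [1, 3],
              [0, 2]])
alpha_b = 1.0
tau = np.array([0.5, 0.3, 0.2])   # current top-level mean over the states of z_b

# E[pi_{z_b=k | parent=l}] = (alpha_b * tau_k + N[k, l]) / (alpha_b + sum_k N[k, l])
pi_mean = (alpha_b * tau[:, None] + N) / (alpha_b + N.sum(axis=0, keepdims=True))
assert np.allclose(pi_mean.sum(axis=0), 1.0)   # each column is a valid distribution
```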
38 These variables are so-called auxiliary variables, introduced to facilitate sampling τ. [sent-123, score-0.292]
39 Figure 3: Graphical representation for (a) Pachinko Allocation and (b) Nested DP. [sent-124, score-0.964]
40 This will allow assignment variables to add or remove states adaptively. [sent-126, score-0.228]
41 Hence, we can use the structure of the Bayes net to sample the assignment variables jointly across the BN (for data-case i). [sent-129, score-0.311]
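As a concrete illustration of forward-filtering-backward-sampling on a chain-structured ISBN (the simplest junction tree), the sketch below jointly samples z_1, …, z_D for one data case. The evidence, transition, and prior terms are assumed to be Rao-Blackwellised, count-based stand-ins; this is a minimal sketch, not the paper's implementation.

```python
import numpy as np

def ffbs_chain(evidence, transitions, prior, rng):
    """Jointly sample a chain z_1 <- ... <- z_D for one data case.

    evidence[k]          ∝ p(x | z_1 = k)                 (collapsed likelihood)
    transitions[d][k, l] ∝ p(z_{d+1} = k | z_{d+2} = l)   for d = 0, ..., D-2
    prior[m]             ∝ p(z_D = m)
    """
    D = len(transitions) + 1
    # Forward filtering: messages[d][l] ∝ p(x | z_{d+1} = l), lower z's summed out.
    messages = [np.asarray(evidence, dtype=float)]
    for T in transitions:
        messages.append(T.T @ messages[-1])
    # Sample the root, then backward-sample each child given its sampled parent.
    w = messages[-1] * prior
    z = [rng.choice(len(prior), p=w / w.sum())]
    for d in range(D - 2, -1, -1):
        w = transitions[d][:, z[-1]] * messages[d]
        z.append(rng.choice(len(w), p=w / w.sum()))
    return z[::-1]    # sampled states for z_1, ..., z_D

rng = np.random.default_rng(0)
lik = np.array([0.7, 0.2, 0.1])                                       # p(x | z_1)
T1 = np.array([[0.6, 0.1, 0.3], [0.3, 0.8, 0.3], [0.1, 0.1, 0.4]])    # p(z_1 | z_2)
T2 = np.array([[0.5, 0.2, 0.3], [0.25, 0.6, 0.3], [0.25, 0.2, 0.4]])  # p(z_2 | z_3)
print(ffbs_chain(lik, [T1, T2], prior=np.ones(3) / 3, rng=rng))
```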
42 5 ISBN for Structured Data. In Section 2 we introduced an index n to label the known structure of the data. [sent-137, score-0.076]
43 See Fig. 1c, where n = (kji) labels, for instance, words (i) in chapters (j) in books (k). [sent-143, score-0.228]
44 The first-level CPT π_{z|kj} is specific to chapters (and hence books) and is sampled from a Dirichlet distribution with mean given … (Footnote 4: We adopted the name ‘infinite state Bayesian network’ because the (K_a + 1)-th state actually represents an infinite pool of indistinguishable states.) [sent-144, score-0.189]
45 To sample from ρ, τ we compute counts N_{u|m,jk}, which is the number of times words were assigned in chapter j and book k to the joint state z = u, ℘ = m. [sent-147, score-0.103]
46 We then work our way up the stack, sampling new count arrays S, R as we go, and then down again, sampling the CPTs (τ, ρ) using these count arrays. [sent-148, score-0.132]
47 If all z variables carry the same index n, sampling z_n given the hierarchical priors is very similar to the FFBS procedure described in the previous section, except that the count arrays (with the current data case removed, i.e. the ¬ijk counts) may carry a subset of the indices from n, e.g. [sent-151, score-0.331]
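The bookkeeping implied here (remove the current data case from the counts, form the conditional from the remaining ¬ijk counts, then add the case back under its new value) looks roughly as follows. `cond_prob` stands for whatever count-based conditional applies at that node, and all names and toy values are illustrative assumptions.

```python
import numpy as np

def resample_assignment(z, counts, n, cond_prob, rng):
    """One collapsed-Gibbs update of the assignment z[n].

    counts[k] tracks how often state k is currently used; removing data case n
    first turns them into the "case-removed" (here: ¬ijk) counts used in the text.
    """
    counts[z[n]] -= 1                        # counts without the current data case
    p = cond_prob(counts)                    # unnormalised conditional over states
    new = rng.choice(len(p), p=p / p.sum())
    counts[new] += 1                         # add the case back under the new state
    z[n] = new
    return new

# Toy usage with a CRP-flavoured conditional (alpha * tau + counts).
rng = np.random.default_rng(0)
alpha, tau = 1.0, np.array([0.5, 0.3, 0.2])
z = np.array([0, 0, 1, 2, 1])
counts = np.bincount(z, minlength=3).astype(float)
resample_assignment(z, counts, n=3, cond_prob=lambda c: alpha * tau + c, rng=rng)
```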
48 If two neighboring z variables carry different subsets of n labels, e. [sent-155, score-0.093]
49 The general rule is to identify and remove all z variables that are impacted by changing the value for z under consideration, e. [sent-159, score-0.143]
50 To compute the conditional probability we set z = k and add the impacted variables (e.g. z_j, z_{jw}, z_{jf}) back into the system, one by one in an arbitrary order, assigning them their old values. [sent-163, score-0.277]
51 In taking the infinite limit, the conditional distribution for existing states of z_b becomes directly proportional to N_{z_b|℘_b} (the α_b τ_{z_b} term is missing). [sent-165, score-0.656]
52 This has the effect that a new state z_b = k that was discovered for some parent state ℘_b = l will not be available to other parent states, simply because N_{k|l′} = 0 for l′ ≠ l. [sent-166, score-0.336]
53 A DP prior is certainly appropriate for nodes zb with CPTs that do not depend on other parents or additional labels, e. [sent-172, score-0.299]
54 It consists of a single topic node and a single observation node. [sent-180, score-0.198]
55 If we make φ depend on j instead and add a prior, ψ_{x|z} → φ_{x|z,j}, we obtain an "HDP with random effects" [12], which has the benefit that topics shared across documents can vary slightly relative to each other. [sent-184, score-0.23]
56 Example: Infinite State Chains. The ‘Pachinko allocation model’ (PAM) [13] consists of a linear chain of assignment variables with document-specific transition probabilities; see Fig. 3a. [sent-185, score-0.267]
57 Here, images are modeled as mixtures over parts, and parts as mixtures over visual words. [sent-191, score-0.171]
58 Finally, a visual word is a distribution over features. [sent-192, score-0.097]
59 The model in Fig. 2a has a data variable x_{ji} and two parent topic variables z1 and z2. [sent-196, score-0.411]
60 One can think of j as the customer index and i as the product index (and there is no IID repeated index). [sent-197, score-0.783]
61 The value of x is the rating of that customer for that product. [sent-198, score-0.149]
62 (Footnote 5: Teh's code npbayes-r21, available from his web-site, does in fact implement this sampling process.) [sent-199, score-0.206]
63 The hidden variables z1 and z2 represent product groups and customer groups, respectively. [sent-200, score-0.232]
64 Every data entry is assigned to both a customer group and a product group, which together determine the factor from which we sample the rating. [sent-201, score-0.753]
65 Note that the difference between the assignment variables is that their corresponding CPTs π_{z1,j} and π_{z2,i} depend on j and i, respectively. [sent-202, score-0.17]
66 Also, single topics can be generalized to hierarchies of topics, so every branch becomes a PAM. [sent-207, score-0.161]
67 Note that for unobserved x_{ji} values (not all products have been rated by all customers) the corresponding z_a, z_b are "dangling" and can be integrated out. [sent-208, score-0.489]
68 The result is that we should skip that variable in the Gibbs sampler. [sent-209, score-0.482]
69 The main difference with HDP is that (like BiHDP) π depends on two parent states z_{i→j} and z_{j→i}, by which we mean that item i has chosen topic z_{i→j} to interact with item j, and vice versa. [sent-212, score-0.612]
70 However, (unlike BiHDP) those topic states share a common distribution π. [sent-213, score-0.215]
71 The hidden variables jointly label the type of interaction that was used to generate ‘matrix-element’ xij . [sent-216, score-0.18]
72 The model in Fig. 2c has two observed variables and an assignment variable that is not repeated over items. [sent-219, score-0.17]
73 The left branch can then model words on the web-page while the right branch can model visual features on the web-page. [sent-221, score-0.183]
74 Market Basket Data. In this experiment we investigate the performance of BiHDP on a synthetic market basket dataset. [sent-226, score-0.199]
75 The generated data consists of purchases from simulated groups of customers who have similar buying habits. [sent-229, score-0.239]
76 Similarity of buying habits refers to the fact that customers within a group buy similar groups of items. [sent-230, score-0.3]
77 For example, items like strawberries and cream are likely to be in the same item group and thus are likely to be purchased together in the same market basket. [sent-231, score-0.407]
78 The following parameters were used to generate data for our experiments: 1M transactions, 10K customers, 1K different items, 4 items per transaction on average, 4 item groups per customer group on average, 50 market basket patterns, 50 customer patterns. [sent-232, score-0.888]
79 The two assignment variables correspond to customers and items respectively. [sent-234, score-0.39]
80 For a given pair of customer and item groups, a binomial distribution was used to model the probability of a customer group making a purchase from that item group. [sent-235, score-0.645]
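A scaled-down sketch of this kind of generator (group memberships, customer-group/item-group purchase probabilities, and baskets drawn from them). The sizes and sampling details below are assumptions for illustration, far smaller than the 1M-transaction setup listed above, and not the authors' exact generator.

```python
import numpy as np

rng = np.random.default_rng(0)

n_customers, n_items = 200, 100
n_cust_groups, n_item_groups = 5, 8

# Ground-truth group memberships (the hidden structure the sampler should recover).
cust_group = rng.integers(n_cust_groups, size=n_customers)
item_group = rng.integers(n_item_groups, size=n_items)

# Purchase probability for each (customer group, item group) pair: a few
# favoured item groups per customer group plus a small background rate.
prob = np.where(rng.random((n_cust_groups, n_item_groups)) < 0.3, 0.15, 0.005)

transactions = []
for _ in range(2000):
    c = rng.integers(n_customers)
    # Bernoulli purchase decision per item, driven by the group-level probability.
    p_item = prob[cust_group[c], item_group]
    basket = np.flatnonzero(rng.random(n_items) < p_item)
    if basket.size:
        transactions.append((c, basket))
```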
81 A collapsed Gibbs sampler was used to fit the model. [sent-236, score-0.116]
82 After 1000 epochs the system converged to 278 customer groups and 39 item factors. [sent-237, score-0.375]
83 As can be seen, most item groups correspond directly to the hidden ground truth data. [sent-240, score-0.273]
84 The visual vocabulary is usually determined in a preprocessing step where k-means is run to cluster features collected from the training set. [sent-243, score-0.094]
85 In [15] a different approach was proposed in which the visual word vocabulary was learned jointly with fitting the parameters of the model. [sent-244, score-0.237]
86 This can have the benefit that the vocabulary is better adapted to suit the needs of the model. [sent-245, score-0.086]
87 The model is that of Fig. 3a, with 2 hidden variables (instead of 3) and π_{z1|z2} independent of j. [sent-247, score-0.14]
88 x is modeled as a Gaussian-distributed random variable over feature values, z1 represents the word identity and z2 is the topic index. [sent-248, score-0.203]
89 Ground truth items have no associated weight; therefore, they were ordered to facilitate comparison with the left row. [sent-254, score-0.144]
90 Figure 5: Precision Recall curves for Caltech-4 dataset (left) and turntable dataset (right). [sent-272, score-0.1]
91 LDA was fit using 500 visual words and 50 parts (which we found to give the best results). [sent-277, score-0.097]
92 The turntable database contains images of 15 toy objects. [sent-278, score-0.14]
93 The objects were placed on a turntable and photographed every 5 degrees. [sent-279, score-0.1]
94 LDA used 15 topics and 200 visual words (which again was optimal). [sent-281, score-0.215]
95 We are not interested in claiming superiority of ISBNs, but rather hope to convey that ISBNs are a convenient tool to design models and to facilitate the search for the number of latent states. [sent-289, score-0.097]
96 Not every topic model naturally fits the ISBN framework. [sent-292, score-0.2]
97 Also, in [10, 20] models were studied where a word can be emitted at any node corresponding to a topic variable z. [sent-296, score-0.244]
98 A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. [sent-349, score-0.335]
99 Hierarchical topic models and the nested Chinese restaurant process. [sent-366, score-0.299]
100 Mining associations between sets of items in massive databases. [sent-405, score-0.104]
wordName wordTfidf (topN-words)
[('zb', 0.299), ('hdp', 0.282), ('ji', 0.241), ('xa', 0.206), ('bihdp', 0.174), ('dirichlet', 0.162), ('topic', 0.157), ('pachinko', 0.152), ('nzb', 0.149), ('pom', 0.149), ('customer', 0.149), ('item', 0.143), ('isbn', 0.139), ('cpts', 0.125), ('bayes', 0.119), ('topics', 0.118), ('customers', 0.116), ('parent', 0.111), ('items', 0.104), ('net', 0.101), ('basket', 0.1), ('cpt', 0.1), ('turntable', 0.1), ('market', 0.099), ('za', 0.099), ('variables', 0.093), ('nk', 0.089), ('groups', 0.083), ('assignment', 0.077), ('index', 0.076), ('lda', 0.076), ('chapters', 0.075), ('isbns', 0.075), ('jf', 0.075), ('jkm', 0.075), ('kji', 0.075), ('gibbs', 0.07), ('su', 0.069), ('iid', 0.067), ('sampling', 0.066), ('hdps', 0.065), ('zi', 0.065), ('sk', 0.065), ('documents', 0.063), ('books', 0.061), ('collapsed', 0.061), ('group', 0.061), ('jw', 0.059), ('states', 0.058), ('nite', 0.058), ('state', 0.057), ('latent', 0.057), ('learned', 0.057), ('bayesian', 0.055), ('sampler', 0.055), ('hierarchical', 0.055), ('teh', 0.054), ('xb', 0.052), ('restaurant', 0.052), ('ru', 0.052), ('allocation', 0.051), ('indexed', 0.051), ('visual', 0.051), ('downstream', 0.05), ('ffbs', 0.05), ('impacted', 0.05), ('nxa', 0.05), ('stacks', 0.05), ('vocabularies', 0.05), ('xji', 0.05), ('junction', 0.05), ('shared', 0.049), ('hidden', 0.047), ('word', 0.046), ('document', 0.046), ('words', 0.046), ('chinese', 0.046), ('aa', 0.046), ('bn', 0.046), ('labelling', 0.046), ('membership', 0.046), ('nested', 0.044), ('suit', 0.043), ('ijk', 0.043), ('pam', 0.043), ('depicted', 0.043), ('vocabulary', 0.043), ('branch', 0.043), ('nets', 0.041), ('indices', 0.041), ('unobserved', 0.041), ('node', 0.041), ('facilitate', 0.04), ('mixtures', 0.04), ('jointly', 0.04), ('images', 0.04), ('buying', 0.04), ('bart', 0.04), ('nu', 0.04), ('modules', 0.04)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
Author: Max Welling, Ian Porteous, Evgeniy Bart
Abstract: A general modeling framework is proposed that unifies nonparametric-Bayesian models, topic-models and Bayesian networks. This class of infinite state Bayes nets (ISBN) can be viewed as directed networks of ‘hierarchical Dirichlet processes’ (HDPs) where the domain of the variables can be structured (e.g. words in documents or features in images). We show that collapsed Gibbs sampling can be done efficiently in these models by leveraging the structure of the Bayes net and using the forward-filtering-backward-sampling algorithm for junction trees. Existing models, such as nested-DP, Pachinko allocation, mixed membership stochastic block models as well as a number of new models are described as ISBNs. Two experiments have been performed to illustrate these ideas.
2 0.24094228 47 nips-2007-Collapsed Variational Inference for HDP
Author: Yee W. Teh, Kenichi Kurihara, Max Welling
Abstract: A wide variety of Dirichlet-multinomial ‘topic’ models have found interesting applications in recent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages such as easy assessment of convergence, easy optimization without the need to maintain detailed balance, a bound on the marginal likelihood, and side-stepping of issues with topic-identifiability. The most accurate variational technique thus far, namely collapsed variational latent Dirichlet allocation, did not deal with model selection nor did it include inference for hyperparameters. We address both issues by generalizing the technique, obtaining the first variational algorithm to deal with the hierarchical Dirichlet process and to deal with hyperparameters of Dirichlet variables. Experiments show a significant improvement in accuracy.
3 0.19787015 183 nips-2007-Spatial Latent Dirichlet Allocation
Author: Xiaogang Wang, Eric Grimson
Abstract: In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a “bag-of-words”. It is also critical to properly design “words” and “documents” when using a language model to solve vision problems. In this paper, we propose a topic model Spatial Latent Dirichlet Allocation (SLDA), which better encodes spatial structures among visual words that are essential for solving many vision problems. The spatial information is not encoded in the values of visual words but in the design of documents. Instead of knowing the partition of words into documents a priori, the word-document assignment becomes a random hidden variable in SLDA. There is a generative procedure, where knowledge of spatial structure can be flexibly added as a prior, grouping visual words which are close in space into the same document. We use SLDA to discover objects from a collection of images, and show it achieves better performance than LDA.
4 0.19786006 197 nips-2007-The Infinite Markov Model
Author: Daichi Mochihashi, Eiichiro Sumita
Abstract: We present a nonparametric Bayesian method of estimating variable order Markov processes up to a theoretically infinite order. By extending a stick-breaking prior, which is usually defined on a unit interval, “vertically” to the trees of infinite depth associated with a hierarchical Chinese restaurant process, our model directly infers the hidden orders of Markov dependencies from which each symbol originated. Experiments on character and word sequences in natural language showed that the model has a comparative performance with an exponentially large full-order model, while computationally much efficient in both time and space. We expect that this basic model will also extend to the variable order hierarchical clustering of general data.
5 0.16923836 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation
Author: David Newman, Padhraic Smyth, Max Welling, Arthur U. Asuncion
Abstract: We investigate the problem of learning a widely-used latent-variable model – the Latent Dirichlet Allocation (LDA) or “topic” model – using distributed computation, where each of P processors only sees 1/P of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local Gibbs sampling on each processor with periodic updates—it is simple to implement and can be viewed as an approximation to a single processor implementation of Gibbs sampling. The second scheme relies on a hierarchical Bayesian extension of the standard LDA model to directly account for the fact that data are distributed across processors—it has a theoretical guarantee of convergence but is more complex to implement than the approximate method. Using five real-world text corpora we show that distributed learning works very well for LDA models, i.e., perplexity and precision-recall scores for distributed learning are indistinguishable from those obtained with single-processor learning. Our extensive experimental results include large-scale distributed computation on 1000 virtual processors; and speedup experiments of learning topics in a 100-million word corpus using 16 processors.
6 0.15097485 189 nips-2007-Supervised Topic Models
7 0.11979784 95 nips-2007-HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation
8 0.11825617 51 nips-2007-Comparing Bayesian models for multisensory cue combination without mandatory integration
9 0.11333051 129 nips-2007-Mining Internet-Scale Software Repositories
10 0.10486672 92 nips-2007-Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations
11 0.08807876 169 nips-2007-Retrieved context and the discovery of semantic structure
12 0.084829755 97 nips-2007-Hidden Common Cause Relations in Relational Learning
13 0.080043793 143 nips-2007-Object Recognition by Scene Alignment
14 0.074632779 63 nips-2007-Convex Relaxations of Latent Variable Training
15 0.073402472 2 nips-2007-A Bayesian LDA-based model for semi-supervised part-of-speech tagging
16 0.072852843 99 nips-2007-Hierarchical Penalization
17 0.072694808 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
18 0.069284901 23 nips-2007-An Analysis of Convex Relaxations for MAP Estimation
19 0.065604456 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
20 0.062160015 181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data
topicId topicWeight
[(0, -0.228), (1, 0.074), (2, -0.089), (3, -0.329), (4, 0.059), (5, -0.112), (6, 0.054), (7, -0.081), (8, 0.026), (9, 0.024), (10, -0.049), (11, -0.046), (12, -0.009), (13, 0.105), (14, 0.027), (15, 0.04), (16, 0.091), (17, -0.079), (18, 0.031), (19, -0.051), (20, 0.022), (21, 0.042), (22, -0.064), (23, -0.005), (24, 0.087), (25, 0.057), (26, -0.085), (27, 0.071), (28, 0.138), (29, 0.041), (30, 0.072), (31, 0.03), (32, 0.017), (33, 0.008), (34, -0.018), (35, -0.107), (36, -0.032), (37, 0.079), (38, -0.008), (39, -0.019), (40, -0.03), (41, -0.102), (42, -0.059), (43, 0.13), (44, -0.108), (45, -0.019), (46, 0.147), (47, -0.093), (48, 0.086), (49, -0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.94360381 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
Author: Max Welling, Ian Porteous, Evgeniy Bart
Abstract: A general modeling framework is proposed that unifies nonparametric-Bayesian models, topic-models and Bayesian networks. This class of infinite state Bayes nets (ISBN) can be viewed as directed networks of ‘hierarchical Dirichlet processes’ (HDPs) where the domain of the variables can be structured (e.g. words in documents or features in images). We show that collapsed Gibbs sampling can be done efficiently in these models by leveraging the structure of the Bayes net and using the forward-filtering-backward-sampling algorithm for junction trees. Existing models, such as nested-DP, Pachinko allocation, mixed membership stochastic block models as well as a number of new models are described as ISBNs. Two experiments have been performed to illustrate these ideas.
2 0.75111359 47 nips-2007-Collapsed Variational Inference for HDP
Author: Yee W. Teh, Kenichi Kurihara, Max Welling
Abstract: A wide variety of Dirichlet-multinomial ‘topic’ models have found interesting applications in recent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages such as easy assessment of convergence, easy optimization without the need to maintain detailed balance, a bound on the marginal likelihood, and side-stepping of issues with topic-identifiability. The most accurate variational technique thus far, namely collapsed variational latent Dirichlet allocation, did not deal with model selection nor did it include inference for hyperparameters. We address both issues by generalizing the technique, obtaining the first variational algorithm to deal with the hierarchical Dirichlet process and to deal with hyperparameters of Dirichlet variables. Experiments show a significant improvement in accuracy.
3 0.72000581 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation
Author: David Newman, Padhraic Smyth, Max Welling, Arthur U. Asuncion
Abstract: We investigate the problem of learning a widely-used latent-variable model – the Latent Dirichlet Allocation (LDA) or “topic” model – using distributed computation, where each of P processors only sees 1/P of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local Gibbs sampling on each processor with periodic updates—it is simple to implement and can be viewed as an approximation to a single processor implementation of Gibbs sampling. The second scheme relies on a hierarchical Bayesian extension of the standard LDA model to directly account for the fact that data are distributed across processors—it has a theoretical guarantee of convergence but is more complex to implement than the approximate method. Using five real-world text corpora we show that distributed learning works very well for LDA models, i.e., perplexity and precision-recall scores for distributed learning are indistinguishable from those obtained with single-processor learning. Our extensive experimental results include large-scale distributed computation on 1000 virtual processors; and speedup experiments of learning topics in a 100-million word corpus using 16 processors.
4 0.66299069 189 nips-2007-Supervised Topic Models
Author: Jon D. Mcauliffe, David M. Blei
Abstract: We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression.
5 0.61135083 197 nips-2007-The Infinite Markov Model
Author: Daichi Mochihashi, Eiichiro Sumita
Abstract: We present a nonparametric Bayesian method of estimating variable order Markov processes up to a theoretically infinite order. By extending a stick-breaking prior, which is usually defined on a unit interval, “vertically” to the trees of infinite depth associated with a hierarchical Chinese restaurant process, our model directly infers the hidden orders of Markov dependencies from which each symbol originated. Experiments on character and word sequences in natural language showed that the model has a comparative performance with an exponentially large full-order model, while computationally much efficient in both time and space. We expect that this basic model will also extend to the variable order hierarchical clustering of general data.
6 0.59796089 183 nips-2007-Spatial Latent Dirichlet Allocation
7 0.45820314 95 nips-2007-HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation
8 0.45210963 131 nips-2007-Modeling homophily and stochastic equivalence in symmetric relational data
9 0.43814856 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
10 0.43537807 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
11 0.42165369 129 nips-2007-Mining Internet-Scale Software Repositories
12 0.39117536 196 nips-2007-The Infinite Gamma-Poisson Feature Model
13 0.37177181 97 nips-2007-Hidden Common Cause Relations in Relational Learning
14 0.36976996 51 nips-2007-Comparing Bayesian models for multisensory cue combination without mandatory integration
15 0.3650724 9 nips-2007-A Probabilistic Approach to Language Change
16 0.35245925 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process
17 0.35167 99 nips-2007-Hierarchical Penalization
18 0.34202376 158 nips-2007-Probabilistic Matrix Factorization
19 0.33959723 169 nips-2007-Retrieved context and the discovery of semantic structure
20 0.3332552 19 nips-2007-Active Preference Learning with Discrete Choice Data
topicId topicWeight
[(5, 0.043), (13, 0.036), (16, 0.015), (18, 0.012), (21, 0.06), (31, 0.033), (34, 0.019), (35, 0.027), (47, 0.109), (83, 0.096), (85, 0.071), (87, 0.082), (90, 0.056), (98, 0.258)]
simIndex simValue paperId paperTitle
same-paper 1 0.77885985 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
Author: Max Welling, Ian Porteous, Evgeniy Bart
Abstract: A general modeling framework is proposed that unifies nonparametric-Bayesian models, topic-models and Bayesian networks. This class of infinite state Bayes nets (ISBN) can be viewed as directed networks of ‘hierarchical Dirichlet processes’ (HDPs) where the domain of the variables can be structured (e.g. words in documents or features in images). We show that collapsed Gibbs sampling can be done efficiently in these models by leveraging the structure of the Bayes net and using the forward-filtering-backward-sampling algorithm for junction trees. Existing models, such as nested-DP, Pachinko allocation, mixed membership stochastic block models as well as a number of new models are described as ISBNs. Two experiments have been performed to illustrate these ideas.
2 0.67175269 82 nips-2007-Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization
Author: Xuanlong Nguyen, Martin J. Wainwright, Michael I. Jordan
Abstract: We develop and analyze an algorithm for nonparametric estimation of divergence functionals and the density ratio of two probability distributions. Our method is based on a variational characterization of f-divergences, which turns the estimation into a penalized convex risk minimization problem. We present a derivation of our kernel-based estimation algorithm and an analysis of convergence rates for the estimator. Our simulation results demonstrate the convergence behavior of the method, which compares favorably with existing methods in the literature.
3 0.64820004 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
Author: Zenglin Xu, Rong Jin, Jianke Zhu, Irwin King, Michael Lyu
Abstract: We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. Although several studies are devoted to Transductive SVM, they suffer either from the high computation complexity or from the solutions of local optimum. To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem to a semi-definite programming. Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient with the number of free parameters reduced from O(n^2) to O(n) where n is the number of examples. Empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM.
4 0.58242542 59 nips-2007-Continuous Time Particle Filtering for fMRI
Author: Lawrence Murray, Amos J. Storkey
Abstract: We construct a biologically motivated stochastic differential model of the neural and hemodynamic activity underlying the observed Blood Oxygen Level Dependent (BOLD) signal in Functional Magnetic Resonance Imaging (fMRI). The model poses a difficult parameter estimation problem, both theoretically due to the nonlinearity and divergence of the differential system, and computationally due to its time and space complexity. We adapt a particle filter and smoother to the task, and discuss some of the practical approaches used to tackle the difficulties, including use of sparse matrices and parallelisation. Results demonstrate the tractability of the approach in its application to an effective connectivity study.
5 0.57195759 189 nips-2007-Supervised Topic Models
Author: Jon D. Mcauliffe, David M. Blei
Abstract: We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression.
6 0.57156682 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation
7 0.56394988 112 nips-2007-Learning Monotonic Transformations for Classification
8 0.56342131 2 nips-2007-A Bayesian LDA-based model for semi-supervised part-of-speech tagging
9 0.56210333 50 nips-2007-Combined discriminative and generative articulated pose and non-rigid shape estimation
10 0.55919683 77 nips-2007-Efficient Inference for Distributions on Permutations
11 0.55369431 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model
12 0.55181873 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
13 0.5492748 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
14 0.54905128 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
15 0.5477041 63 nips-2007-Convex Relaxations of Latent Variable Training
16 0.54693949 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
17 0.54688394 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
18 0.54511589 47 nips-2007-Collapsed Variational Inference for HDP
19 0.54416782 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
20 0.54248983 86 nips-2007-Exponential Family Predictive Representations of State