nips nips2012 nips2012-47 knowledge-graph by maker-knowledge-mining

47 nips-2012-Augment-and-Conquer Negative Binomial Processes


Source: pdf

Author: Mingyuan Zhou, Lawrence Carin

Abstract: By developing data augmentation methods unique to the negative binomial (NB) distribution, we unite seemingly disjoint count and mixture models under the NB process framework. We develop fundamental properties of the models and derive efficient Gibbs sampling inference. We show that the gamma-NB process can be reduced to the hierarchical Dirichlet process with normalization, highlighting its unique theoretical, structural and computational advantages. A variety of NB processes with distinct sharing mechanisms are constructed and applied to topic modeling, with connections to existing algorithms, showing the importance of inferring both the NB dispersion and probability parameters. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: By developing data augmentation methods unique to the negative binomial (NB) distribution, we unite seemingly disjoint count and mixture models under the NB process framework. [sent-7, score-0.438]

2 We show that the gamma-NB process can be reduced to the hierarchical Dirichlet process with normalization, highlighting its unique theoretical, structural and computational advantages. [sent-9, score-0.2]

3 A variety of NB processes with distinct sharing mechanisms are constructed and applied to topic modeling, with connections to existing algorithms, showing the importance of inferring both the NB dispersion and probability parameters. [sent-10, score-0.396]

4 (Introduction) There has been increasing interest in count modeling using the Poisson process, geometric process [1, 2, 3, 4] and recently the negative binomial (NB) process [5, 6]. [sent-11, score-0.467]

5 Notably, it has been independently shown in [5] and [6] that the NB process, originally constructed for count analysis, can be naturally applied for mixture modeling of grouped data x1 , · · · , xJ , where each group xj = {xji }i=1,Nj . [sent-12, score-0.381]

6 For a territory long occupied by the hierarchical Dirichlet process (HDP) [7] and related models, the inference of which may require substantial bookkeeping and suffer from slow convergence [7], the discovery of the NB process for mixture modeling can be significant. [sent-13, score-0.356]

7 As the seemingly distinct problems of count and mixture modeling are united under the NB process framework, new opportunities emerge for better data fitting, more efficient inference and more flexible model constructions. [sent-14, score-0.396]

8 We perform joint count and mixture modeling under the NB process framework, using completely random measures [1, 8, 9] that are simple to construct and amenable for posterior computation. [sent-18, score-0.326]

9 (Poisson process for count and mixture modeling) Before introducing the NB process, we first illustrate how the seemingly distinct problems of count and mixture modeling can be united under the Poisson process. [sent-23, score-0.612]

10 Denote Ω as a measure space and for each Borel set A ⊂ Ω, denote Xj (A) as a count random variable describing the number of observations in xj that reside within A. [sent-24, score-0.249]

11 Given grouped data x1 , · · · , xJ , for any measurable disjoint partition A1 , · · · , AQ of Ω, we aim to jointly model the count random variables {Xj (Aq )}. [sent-25, score-0.179]

12 Thus the Poisson process provides not only a way to generate independent counts from each Aq, but also a mechanism for mixture modeling, which allocates the observations into any measurable disjoint partition {Aq}_{q=1,...,Q} of Ω, conditioning on Xj(Ω) and the normalized mean measure G. [sent-30, score-0.257]

13 To complete the model, we may place a gamma process [9] prior on the shared measure as G ∼ GaP(c, G0 ), with concentration parameter c and base measure G0 , such that G(A) ∼ Gamma(G0 (A), 1/c) for each A ⊂ Ω, where G0 can be continuous, discrete or a combination of both. [sent-31, score-0.456]

14 The normalized gamma representation of the DP is discussed in [10, 11, 9] and has been used to construct the group-level DPs for an HDP [12]. [sent-33, score-0.212]

15 The Poisson process has an equal-dispersion assumption for count modeling. [sent-34, score-0.212]

16 As shown in (2), the construction of Poisson processes with a shared gamma process mean measure implies the same mixture proportions across groups, which is essentially the same as the DP when used for mixture modeling when the total counts {Xj (Ω)}j are not treated as random variables. [sent-35, score-0.667]

17 The NB distribution can be augmented into a gamma-Poisson construction as m ∼ Pois(λ), λ ∼ Gamma (r, p/(1 − p)), where the gamma distribution is parameterized by its shape r and scale p/(1 − p). [sent-42, score-0.236]
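
The gamma-Poisson construction above is directly usable as a sampler. The following minimal sketch (not from the paper; Python/NumPy with hypothetical parameter values) draws NB(r, p) variates by first drawing the gamma rate and then the Poisson count, and checks the empirical moments against the mean r p/(1 − p) and variance r p/(1 − p)^2 implied by that construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_nb_via_gamma_poisson(r, p, size=10_000):
    """Draw m ~ NB(r, p) via m ~ Pois(lam), lam ~ Gamma(shape=r, scale=p/(1-p))."""
    lam = rng.gamma(shape=r, scale=p / (1.0 - p), size=size)
    return rng.poisson(lam)

m = sample_nb_via_gamma_poisson(r=2.0, p=0.3)
print(m.mean(), 2.0 * 0.3 / 0.7)      # empirical vs. analytic mean
print(m.var(), 2.0 * 0.3 / 0.7**2)    # empirical vs. analytic variance
```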

18 The inference of the NB dispersion parameter r has long been a challenge [13, 18, 19]. [sent-45, score-0.145]

19 In this paper, we first place a gamma prior on it as r ∼ Gamma(r1 , 1/c1 ). [sent-46, score-0.212]

20 Since l ∼ Pois(−r ln(1 − p)) by construction, we can use the gamma Poisson conjugacy to update r. [sent-49, score-0.237]

21 2 (below), we can further infer an augmented latent count l for each l, and then use these latent counts to update r1 , assuming r1 ∼ Gamma(r2 , 1/c2 ). [sent-51, score-0.234]

22 2, we can continue this process repeatedly, suggesting that we may build a NB process to model data that have subgroups within groups. [sent-54, score-0.18]

23 The conditional posterior of the latent count l was first derived by us but was not given an analytical form [20]. [sent-55, score-0.173]

24 We denote l ∼ CRT(m, r) as a Chinese restaurant table (CRT) count random variable with such a PMF and, as proved in the supplementary material, we can sample it as l = Σ_{n=1}^{m} bn, bn ∼ Bernoulli(r/(n − 1 + r)). [sent-57, score-0.147]
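
A CRT draw only needs the running Bernoulli sum described above, and, because l ∼ Pois(−r ln(1 − p)) with the prior r ∼ Gamma(r1, 1/c1), the gamma-Poisson conjugacy mentioned earlier yields a closed-form conditional for r. The sketch below is an illustration under those stated assumptions, not the paper's code; the function names and hyperparameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_crt(m, r):
    """l ~ CRT(m, r): sum of b_n ~ Bernoulli(r / (n - 1 + r)), n = 1..m."""
    if m == 0:
        return 0
    n = np.arange(1, m + 1)
    return int(rng.binomial(1, r / (n - 1 + r)).sum())

def gibbs_update_r(counts, r, p, r1=1.0, c1=1.0):
    """One Gibbs step for the NB dispersion r given counts m_i ~ NB(r, p).

    Each m_i is augmented with l_i ~ CRT(m_i, r); since sum_i l_i ~ Pois(-r * N * ln(1 - p)),
    conjugacy gives r | - ~ Gamma(r1 + sum_i l_i, 1 / (c1 - N * ln(1 - p))).
    """
    l_total = sum(sample_crt(int(m), r) for m in counts)
    shape = r1 + l_total
    scale = 1.0 / (c1 - len(counts) * np.log(1.0 - p))
    return rng.gamma(shape=shape, scale=scale)
```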

25 3 Gamma-Negative Binomial Process We explore sharing the NB dispersion across groups while the probability parameters are group dependent. [sent-84, score-0.174]

26 We define a NB process X ∼ NBP(G, p) as X(A) ∼ NB(G(A), p) for each A ⊂ Ω and construct a gamma-NB process for joint count and mixture modeling as Xj ∼ NBP(G, pj ), G ∼ GaP(c, G0 ), which can be augmented as a gamma-gamma-Poisson process as Xj ∼ PP(Λj ), Λj ∼ GaP((1 − pj )/pj , G), G ∼ GaP(c, G0 ). [sent-85, score-1.166]

27 (5) In the above PP(·) and GaP(·) represent the Poisson and gamma processes, respectively, as defined in Section 1. [sent-86, score-0.212]

28 The gamma-NB process can also be augmented as Xj ∼ Σ_{t=1}^{Lj} Log(pj), Lj ∼ PP(−G ln(1 − pj)), G ∼ GaP(c, G0); (6) L = Σ_j Lj ∼ Σ_{t=1}^{L′} Log(p′), L′ ∼ PP(−G0 ln(1 − p′)), p′ = −Σ_j ln(1 − pj) / (c − Σ_j ln(1 − pj)). (7) [sent-89, score-0.432]

29 Using the gamma Poisson conjugacy on (5), for each A ⊂ Ω, we have Λj (A)|G, Xj , pj ∼ Gamma (G(A) + Xj (A), pj ), thus the conditional posterior of Λj is Λj |G, Xj , pj ∼ GaP 1/pj , G + Xj . [sent-91, score-1.191]
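
As a concrete illustration of this conditional, the sketch below updates a truncated (K-atom) version of Λj given the group's per-atom counts; representing G by its K atom weights is an assumption of this sketch, not something spelled out in the sentence above, and the numbers are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def update_group_rates(G_atoms, X_j, p_j):
    """Atom-wise version of Lambda_j | G, X_j, p_j ~ GaP(1/p_j, G + X_j):
    Lambda_j(omega_k) ~ Gamma(G(omega_k) + X_j(omega_k), scale=p_j)."""
    return rng.gamma(shape=G_atoms + X_j, scale=p_j)

# Illustrative numbers only: K = 4 shared atom weights and one group's counts.
G_atoms = np.array([0.5, 1.2, 0.1, 0.8])
X_j = np.array([3, 0, 1, 7])
print(update_group_rates(G_atoms, X_j, p_j=0.4))
```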

30 If G0 = Σ_{k=1}^{K} (γ0/K) δωk, then L′(ωk) = CRT(L(ωk), γ0/K) ≥ 1 if L(ωk) ≥ 1; in either case, let γ0 ∼ Gamma(e0, 1/f0), and with the gamma-Poisson conjugacy on (6) and (7), we have γ0 | {L′(Ω), p′} ∼ Gamma(e0 + L′(Ω), 1/(f0 − ln(1 − p′))); (11) G | G0, {Lj, pj} ∼ GaP(c − Σ_j ln(1 − pj), G0 + Σ_j Lj). [sent-96, score-0.873]

31 Thus the normalized gamma-NB process leads to an HDP, yet we cannot return from the HDP to the gamma-NB process without modeling Xj (Ω) and Λj (Ω) as random variables. [sent-100, score-0.232]

32 Practically, the gamma-NB process can exploit conjugacy to achieve analytical conditional posteriors for all latent parameters. [sent-102, score-0.187]

33 In the HDP, pj is not explicitly modeled, and since its value becomes irrelevant when taking the normalized constructions in (14), it is usually treated as a nuisance parameter and perceived as pj = 0.5. [sent-110, score-0.663]

34 (Augment-and-conquer inference for joint count and mixture modeling) For a finite continuous base measure, the gamma process G ∼ GaP(c, G0) can also be defined with its Lévy measure on a product space R+ × Ω, expressed as ν(drdω) = r^{−1} e^{−cr} dr G0(dω) [9]. [sent-116, score-0.65]

35 Since the Poisson intensity ν+ = ν(R+ × Ω) = ∞ and ∫_{R+×Ω} r ν(drdω) is finite, a draw from this process can be expressed as G = Σ_{k=1}^{∞} rk δωk, (rk, ωk) ∼ π(drdω), π(drdω) ν+ ≡ ν(drdω) [9]. [sent-117, score-0.277]

36 Here we consider a discrete base measure G0 = Σ_{k=1}^{K} (γ0/K) δωk, ωk ∼ g0(ωk); then we have G = Σ_{k=1}^{K} rk δωk, rk ∼ Gamma(γ0/K, 1/c), ωk ∼ g0(ωk), which becomes a draw from the gamma process with a continuous base measure as K → ∞. [sent-118, score-0.836]
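
A draw from this finite approximation is straightforward; the sketch below (illustrative only, with a hypothetical base distribution g0) returns the K weights rk and atoms ωk, and its total mass has expectation γ0/c regardless of K.

```python
import numpy as np

rng = np.random.default_rng(3)

def draw_gamma_process_finite(gamma0, c, K, draw_atom):
    """Finite-K approximation of G ~ GaP(c, G0): r_k ~ Gamma(gamma0/K, 1/c), omega_k ~ g0."""
    r = rng.gamma(shape=gamma0 / K, scale=1.0 / c, size=K)
    omegas = [draw_atom() for _ in range(K)]
    return r, omegas

# Hypothetical base g0: atoms drawn uniformly from [0, 1).
r, omegas = draw_gamma_process_finite(gamma0=1.0, c=1.0, K=100, draw_atom=rng.random)
print(r.sum())  # total mass G(Omega); its expectation is gamma0 / c = 1 here
```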

37 Let xji ∼ F(ωzji) be observation i = 1, . . . , Nj in group j, linked to a mixture component ωzji ∈ Ω through a distribution F. [sent-119, score-0.204]

38 Using the equivalence between (1) and (2), we can equivalently express Nj and njk in the above model as Nj ∼ Pois(λj), [nj1, · · · , njK] ∼ Mult(Nj; λj1/λj, · · · , λjK/λj), where λj = Σ_{k=1}^{K} λjk. [sent-121, score-0.257]

39 Since the data {xji}_{i=1,...,Nj} are fully exchangeable, rather than drawing [nj1, · · · , njK] once, we may equivalently draw the index zji ∼ Discrete(λj1/λj, · · · , λjK/λj) (17) for each xji and then let njk = Σ_{i=1}^{Nj} δ(zji = k). [sent-122, score-0.535]
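
The equivalence between drawing the count vector once and drawing a per-word index is easy to see in code; the sketch below (illustrative only, with hypothetical rates) draws zji for each of the Nj words in group j and tallies njk.

```python
import numpy as np

rng = np.random.default_rng(4)

def assign_words(lambda_j, N_j):
    """Draw z_ji ~ Discrete(lambda_jk / lambda_j) for i = 1..N_j and
    return n_jk = sum_i delta(z_ji = k)."""
    probs = lambda_j / lambda_j.sum()
    z = rng.choice(len(lambda_j), size=N_j, p=probs)
    return np.bincount(z, minlength=len(lambda_j))

lambda_j = np.array([2.0, 0.5, 1.5])   # hypothetical topic rates for group j
n_jk = assign_words(lambda_j, N_j=20)
print(n_jk, n_jk.sum())                # per-topic counts; they sum to N_j
```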

40 This provides further insights on how the seemingly disjoint problems of count and mixture modeling are united under the NB process framework. [sent-123, score-0.377]

41 Note that when K → ∞, we have (lk |−) = δ(Σ_j ljk > 0) = δ(Σ_j njk > 0). [sent-127, score-0.3]

42 This also implies that by using the Dirichlet process as the foundation, traditional mixture modeling may discard useful count information from the beginning. [sent-129, score-0.326]

43 4 The Negative Binomial Process Family and Related Algorithms The gamma-NB process shares the NB dispersion across groups. [sent-130, score-0.215]

44 Since the NB distribution has two adjustable parameters, we may explore alternative ideas, with the NB probability measure shared across groups as in [6], or with both the dispersion and probability measures shared as in [5]. [sent-131, score-0.242]

45 It is natural to let the probability measure be drawn from a beta process [25, 26], which can be defined by its Lévy measure on a product space [0, 1] × Ω as ν(dpdω) = c p^{−1} (1 − p)^{c−1} dp B0(dω). [sent-133, score-0.254]

46 A draw from the beta process B ∼ BP(c, B0) with concentration parameter c and base measure B0 can be expressed as B = Σ_{k=1}^{∞} pk δωk. [sent-134, score-0.398]

47 A beta-NB process [5, 6] can be constructed by letting Xj ∼ NBP(rj, B), with a random draw expressed as Xj = Σ_{k=1}^{∞} njk δωk, njk ∼ NB(rj, pk). [sent-135, score-0.782]

48 Under this construction, the NB probability measure is shared and the NB dispersion parameters are group dependent. [sent-136, score-0.189]

49 As in [5], we may also consider a marked-beta-NB process in which both the NB probability and dispersion measures are shared, with each point of the beta process marked with an independent gamma random variable. [sent-137, score-0.597]

50 Thus a draw from the marked-beta process becomes (R, B) = Σ_{k=1}^{∞} (rk, pk) δωk, and the NB process Xj ∼ NBP(R, B) becomes Xj = Σ_{k=1}^{∞} njk δωk, njk ∼ NB(rk, pk). [sent-138, score-1.01]
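
The beta-NB and marked-beta-NB constructions can be sketched with a finite number of atoms, as below. The Beta(cγ0/K, c(1 − γ0/K)) finite truncation of the beta process weights is an assumption of this sketch (one common finite approximation), not something stated in the extracted text, and the hyperparameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)

def draw_marked_beta_nb(J, K, c=1.0, gamma0=1.0, r_shape=1.0, r_scale=1.0):
    """Finite-K sketch of the marked-beta-NB process: shared p_k and r_k,
    with group counts n_jk ~ NB(r_k, p_k) for j = 1..J."""
    p = rng.beta(c * gamma0 / K, c * (1.0 - gamma0 / K), size=K)  # assumed truncation
    r = rng.gamma(shape=r_shape, scale=r_scale, size=K)           # gamma marks
    # NumPy's negative_binomial takes the success probability, i.e. 1 - p here.
    n = rng.negative_binomial(r, 1.0 - p, size=(J, K))
    return r, p, n

r, p, n = draw_marked_beta_nb(J=5, K=50)
print(n.shape, n.sum())
```

Replacing the shared rk with group-specific rj (and keeping the shared pk) turns this into the beta-NB construction described above.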

51 This construction can be linked to the model in [27] with appropriate normalization, with the advantage that there is no need to fix pj = 0.5. [sent-141, score-0.318]

52 The zero inflated construction can also be linked to models for real valued data using the Indian buffet process (IBP) or beta-Bernoulli process spike-and-slab prior [28, 29, 30, 31]. [sent-143, score-0.2]

53 (Related Algorithms) To show how the NB processes can be diversely constructed and to make connections to previous parametric and nonparametric mixture models, we show in Table 1 a variety of NB processes, which differ on how the dispersion and probability measures are shared. [sent-145, score-0.293]

54 5 Table 1: A variety of negative binomial processes are constructed with distinct sharing mechanisms, reflected with which parameters from rk , rj , pk , pj and πk (bjk ) are inferred (indicated by a check-mark ), and the implied VMR and ODL for counts {njk }j,k . [sent-147, score-1.064]

55 They are applied for topic modeling of a document corpus, a typical example of mixture modeling of grouped data. [sent-148, score-0.361]

56 Columns: Algorithms; inferred parameters among rk, rj, pk, pj, πk; VMR; ODL; Related Algorithms. NB-LDA: VMR (1 − pj)^{−1}, ODL rj^{−1}; related: LDA [32], Dir-PFA [5]. NB-HDP: pj = 0.5, VMR 2, ODL rk^{−1}. [sent-150, score-1.199]

57 NB-FTM: uses rk and πk (bjk); related: FTM [27], SγΓ-PFA [5]. Beta-NB: VMR (1 − pk)^{−1}, ODL rj^{−1}; related: BNBP [5], BNBP [6]. Gamma-NB: VMR (1 − pj)^{−1}, ODL rk^{−1}; related: CRF-HDP [7, 24]. Marked-Beta-NB: VMR (1 − pk)^{−1}, ODL rk^{−1}; related: BNBP [5]. [sent-152, score-1.155]

58 We consider topic modeling of a document corpus, a typical example of mixture modeling of grouped data, where each bag-of-words document constitutes a group, each word is an exchangeable group member, and F(xji; ωk) is simply the probability of word xji in topic ωk. [sent-153, score-0.713]

59 We consider six differently constructed NB processes in Table 1: (i) Related to latent Dirichlet allocation (LDA) [32] and Dirichlet Poisson factor analysis (Dir-PFA) [5], the NB-LDA is also a parametric topic model that requires tuning the number of topics. [sent-154, score-0.213]

60 However, it uses a document dependent rj and pj to automatically learn the smoothing of the gamma distributed topic weights, and it lets rj ∼ Gamma(γ0 , 1/c), γ0 ∼ Gamma(e0 , 1/f0 ) to share statistical strength between documents, with closed-form Gibbs sampling inference. [sent-155, score-0.914]

61 Thus even the most basic parametric LDA topic model can be improved under the NB count modeling framework. [sent-156, score-0.289]

62 (ii) The NB-HDP model is related to the HDP [7], and since pj is an irrelevant parameter in the HDP due to normalization, we set it in the NB-HDP as 0.5. [sent-157, score-0.318]

63 The NB-HDP model is comparable to the DILN-HDP [12] that constructs the group-level DPs with normalized gamma processes, whose scale parameters are also set as one. [sent-159, score-0.212]

64 (iii) The NB-FTM model introduces an additional beta-Bernoulli process under the NB process framework to explicitly model zero counts. [sent-160, score-0.18]

65 It is the same as the sparse-gamma-gamma-PFA (SγΓ-PFA) in [5] and is comparable to the focused topic model (FTM) [27], which is constructed from the IBP compound DP. [sent-161, score-0.191]

66 The Zero-Inflated-NB process improves over them by allowing pj to be inferred, which generally yields better data fitting. [sent-163, score-0.408]

67 (iv) The Gamma-NB process explores the idea that the dispersion measure is shared across groups, and it improves over the NBHDP by allowing the learning of pj . [sent-164, score-0.597]

68 It reduces to the HDP [7] by normalizing both the group-level and the shared gamma processes. [sent-165, score-0.245]

69 (v) The Beta-NB process explores sharing the probability measure across groups, and it improves over the beta negative binomial process (BNBP) proposed in [6], allowing inference of rj . [sent-166, score-0.562]

70 (vi) The Marked-Beta-NB process is comparable to the BNBP proposed in [5], with the distinction that it allows analytical update of rk . [sent-167, score-0.307]

71 , λjk , rk and pk , are also of interest, then the NB process based joint count and mixture models would apparently be more appropriate than the HDP based mixture models. [sent-172, score-0.681]

72 5 Example Results Motivated by Table 1, we consider topic modeling using a variety of NB processes, which differ on which parameters are learned and consequently how the VMR and ODL of the latent counts {njk }j,k are modeled. [sent-173, score-0.234]

73 For fair comparison, they are all implemented with block Gibbs sampling using a discrete base measure with K atoms, and for the first fifty iterations, the Gamma-NB process with rk ≡ 50/K and pj ≡ 0.5 is used for initialization. [sent-175, score-0.685]

74 For LDA, we set the topic proportion Dirichlet smoothing parameter as 50/K, following the topic model toolbox2 provided for [35]. [sent-181, score-0.23]

75 be assigned to a topic k based on both F (xji ; ωk ) and the topic weights {λjk }k=1,K ; each topic is drawn from a Dirichlet base measure as ωk ∼ Dir(η, · · · , η) ∈ RV , where V is the number of unique terms in the vocabulary and η is a smoothing parameter. [sent-194, score-0.415]

76 Let vji denote the location of word xji in the vocabulary; then we have (ωk |−) ∼ Dir(η + Σ_j Σ_i δ(zji = k, vji = 1), · · · , η + Σ_j Σ_i δ(zji = k, vji = V)). [sent-195, score-0.26]
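
In code, this conditional for a topic's word distribution is a single Dirichlet draw with the word counts added to the smoothing parameter η; the sketch below is illustrative, assuming the topic and vocabulary indices are available as flat arrays (a layout chosen for this sketch, not dictated by the text).

```python
import numpy as np

rng = np.random.default_rng(6)

def update_topic(k, z, v, V, eta):
    """omega_k | - ~ Dir(eta + count of each vocabulary term assigned to topic k).

    z[i] is the topic index and v[i] the vocabulary index of word i, flattened
    over all groups j and positions i (hypothetical layout for this sketch).
    """
    counts = np.bincount(v[z == k], minlength=V)
    return rng.dirichlet(eta + counts)

# Tiny example: 6 words, vocabulary of V = 4 terms, 2 topics.
z = np.array([0, 0, 1, 0, 1, 1])
v = np.array([2, 2, 0, 3, 1, 0])
print(update_topic(0, z, v, V=4, eta=0.05))
```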

77 Note that the perplexity per test word is the fair metric to compare topic models. [sent-201, score-0.163]

78 However, when the actual Poisson rates or distribution parameters for counts instead of the mixture proportions are of interest, it is obvious that a NB process based joint count and mixture model would be more appropriate than an HDP based mixture model. [sent-202, score-0.466]

79 Figure 2 shows the learned model parameters by various algorithms under the NB process framework, revealing distinct sharing mechanisms and model properties. [sent-206, score-0.169]

80 When (rj, pk) is used to model the latent counts {njk}j,k, as in the Beta-NB process, the transition between active and non-active topics is so sharp that pk is either close to one or close to zero. [sent-208, score-0.416]

81 When (rk, pj) is used, as in the Gamma-NB process, the transition is much smoother, in that rk gradually decreases. [sent-210, score-0.505]

82 Therefore, we can expect that (rk, pj) would allow [sent-212, score-0.318]

83 Note that the transition between active and non-active topics is very sharp when pk is used and much smoother when rk is used. [sent-225, score-0.378]

84 more topics than (rj, pk), as confirmed in Figure 1(a), where the Gamma-NB process learns 177 active topics, significantly more than the 107 of the Beta-NB process. [sent-229, score-0.281]

85 With these analyses, we can conclude that the mean and the amount of overdispersion (measured by the VMR or ODL) for the usage of topic k are positively correlated under (rj, pk) and negatively correlated under (rk, pj). [sent-230, score-0.695]

86 When (rk, pk) is used, as in the Marked-Beta-NB process, more diverse combinations of mean and overdispersion would be allowed, as both rk and pk are now responsible for the mean E[Σ_j njk] = J rk pk/(1 − pk). [sent-231, score-1.022]
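
This trade-off can be checked numerically; the sketch below computes the mean and variance-to-mean ratio of the aggregate count Σ_j njk under two hypothetical (rk, pk) settings, showing that roughly the same mean can come with very different overdispersion.

```python
def nb_aggregate_moments(r_k, p_k, J):
    """Moments of sum_j n_jk with n_jk ~ NB(r_k, p_k) i.i.d. over J groups:
    mean = J * r_k * p_k / (1 - p_k), VMR = 1 / (1 - p_k)."""
    mean = J * r_k * p_k / (1.0 - p_k)
    vmr = 1.0 / (1.0 - p_k)
    return mean, vmr

print(nb_aggregate_moments(r_k=20.0, p_k=0.5, J=10))    # large r_k, small p_k: mean 200, VMR 2
print(nb_aggregate_moments(r_k=0.5, p_k=0.976, J=10))   # small r_k, large p_k: mean ~203, VMR ~42
```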

87 For example, there could be not only large mean and small overdispersion (large rk and small pk ), but also large mean and large overdispersion (small rk and large pk ). [sent-232, score-0.898]

88 Thus (rk , pk ) may combine the advantages of using only rk or pk to model topic k, as confirmed by the superior performance of the Marked-Beta-NB over the Beta-NB and Gamma-NB processes. [sent-233, score-0.618]

89 When (rk , πk ) is used, as in the NB-FTM model, our results show that we usually have a small πk and a large rk , indicating topic k is sparsely used across the documents but once it is used, the amount of variation on usage is small. [sent-234, score-0.302]

90 These modeling properties might be helpful when there is an excessive number of zeros which might not be well modeled by the NB process alone. [sent-235, score-0.162]

91 In our experiments, we find the more direct approaches of using pk or pj generally yield better results, but this might not be the case when an excessive number of zeros is better explained with the underlying beta-Bernoulli or IBP processes. [sent-236, score-0.496]

92 However, from a count modeling viewpoint, this would make a restrictive assumption that each count vector {njk }k=1,K has the same VMR of 2, and the experimental results in Figure 1 confirm the importance of learning pj together with rk . [sent-242, score-0.801]
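
As a short worked check of the VMR-of-2 claim, using the NB parameterization implied by the gamma-Poisson augmentation above (mean r p/(1 − p), variance r p/(1 − p)^2):

```latex
\[
\mathrm{VMR}
  = \frac{\operatorname{Var}[n_{jk}]}{\mathbb{E}[n_{jk}]}
  = \frac{r_k\,p_j/(1-p_j)^2}{r_k\,p_j/(1-p_j)}
  = \frac{1}{1-p_j},
\qquad
\text{so fixing } p_j = 0.5 \text{ forces } \mathrm{VMR} = 2 \text{ for every count vector.}
\]
```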

93 It is also interesting to examine (15), which can be viewed as the concentration parameter α in the HDP; allowing the adjustment of pj permits a more flexible model assumption on the amount of variation between the topic proportions, and thus potentially better data fitting. [sent-243, score-0.433]

94 6 Conclusions We propose a variety of negative binomial (NB) processes to jointly model counts across groups, which can be naturally applied for mixture modeling of grouped data. [sent-244, score-0.359]

95 The proposed NB processes are completely random measures in that they assign independent random variables to disjoint Borel sets of the measure space, as opposed to the hierarchical Dirichlet process (HDP) whose measures on disjoint Borel sets are negatively correlated. [sent-245, score-0.254]

96 We demonstrate that the gamma-NB process, which shares the NB dispersion measure across groups, can be normalized to produce the HDP and we show in detail its theoretical, structural and computational advantages over the HDP. [sent-247, score-0.156]

97 We examine the distinct sharing mechanisms and model properties of various NB processes, with connections to existing algorithms, with experimental results on topic modeling showing the importance of modeling both the NB dispersion and probability parameters. [sent-248, score-0.423]

98 The IBP compound Dirichlet process and its application to focused topic modeling. [sent-432, score-0.261]

99 Dependent hierarchical beta process for image interpolation and denoising. [sent-458, score-0.19]

100 On the integration of topic modeling and dictionary learning. [sent-465, score-0.167]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('nb', 0.62), ('pj', 0.318), ('njk', 0.257), ('gamma', 0.212), ('hdp', 0.187), ('rk', 0.187), ('pk', 0.158), ('xji', 0.142), ('dispersion', 0.125), ('count', 0.122), ('topic', 0.115), ('rj', 0.109), ('poisson', 0.109), ('vmr', 0.107), ('zji', 0.107), ('overdispersion', 0.104), ('pois', 0.104), ('odl', 0.097), ('xj', 0.096), ('aq', 0.094), ('process', 0.09), ('binomial', 0.088), ('ln', 0.087), ('beta', 0.08), ('crt', 0.075), ('jk', 0.067), ('mixture', 0.062), ('dirichlet', 0.058), ('processes', 0.057), ('compound', 0.056), ('lj', 0.054), ('bnbp', 0.054), ('drd', 0.054), ('modeling', 0.052), ('document', 0.051), ('lda', 0.05), ('nbp', 0.047), ('counts', 0.046), ('ljk', 0.043), ('base', 0.039), ('bjk', 0.038), ('ftm', 0.038), ('gibbs', 0.036), ('topics', 0.033), ('shared', 0.033), ('crtp', 0.032), ('vji', 0.032), ('measure', 0.031), ('gap', 0.031), ('pp', 0.03), ('analytical', 0.03), ('index', 0.029), ('grouped', 0.029), ('sharing', 0.029), ('nonparametric', 0.029), ('ibp', 0.029), ('paisley', 0.029), ('dp', 0.028), ('zhou', 0.028), ('dir', 0.028), ('disjoint', 0.028), ('distinct', 0.027), ('corpus', 0.027), ('augmenting', 0.027), ('constructions', 0.027), ('pmf', 0.027), ('perplexity', 0.026), ('borel', 0.025), ('sr', 0.025), ('conjugacy', 0.025), ('nj', 0.025), ('restaurant', 0.025), ('dunson', 0.025), ('wj', 0.025), ('negative', 0.025), ('augmented', 0.024), ('dps', 0.023), ('seemingly', 0.023), ('mechanisms', 0.023), ('occupied', 0.022), ('vy', 0.022), ('word', 0.022), ('proportions', 0.022), ('posteriors', 0.021), ('augmentations', 0.021), ('cwj', 0.021), ('fjv', 0.021), ('overdispersed', 0.021), ('perplexities', 0.021), ('pgf', 0.021), ('latent', 0.021), ('biometrics', 0.021), ('sapiro', 0.021), ('discrete', 0.02), ('hierarchical', 0.02), ('constructed', 0.02), ('groups', 0.02), ('inference', 0.02), ('excessive', 0.02), ('buffet', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

Author: Mingyuan Zhou, Lawrence Carin

Abstract: By developing data augmentation methods unique to the negative binomial (NB) distribution, we unite seemingly disjoint count and mixture models under the NB process framework. We develop fundamental properties of the models and derive efficient Gibbs sampling inference. We show that the gamma-NB process can be reduced to the hierarchical Dirichlet process with normalization, highlighting its unique theoretical, structural and computational advantages. A variety of NB processes with distinct sharing mechanisms are constructed and applied to topic modeling, with connections to existing algorithms, showing the importance of inferring both the NB dispersion and probability parameters. 1

2 0.25884008 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

Author: James Scott, Jonathan W. Pillow

Abstract: Characterizing the information carried by neural populations in the brain requires accurate statistical models of neural spike responses. The negative-binomial distribution provides a convenient model for over-dispersed spike counts, that is, responses with greater-than-Poisson variability. Here we describe a powerful data-augmentation framework for fully Bayesian inference in neural models with negative-binomial spiking. Our approach relies on a recently described latentvariable representation of the negative-binomial distribution, which equates it to a Polya-gamma mixture of normals. This framework provides a tractable, conditionally Gaussian representation of the posterior that can be used to design efficient EM and Gibbs sampling based algorithms for inference in regression and dynamic factor models. We apply the model to neural data from primate retina and show that it substantially outperforms Poisson regression on held-out data, and reveals latent structure underlying spike count correlations in simultaneously recorded spike trains. 1

3 0.12373737 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

Author: Ke Jiang, Brian Kulis, Michael I. Jordan

Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the kmeans and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that features the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis. 1

4 0.11541519 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models

Author: Michael Paul, Mark Dredze

Abstract: Latent variable models can be enriched with a multi-dimensional structure to consider the many latent factors in a text corpus, such as topic, author perspective and sentiment. We introduce factorial LDA, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured word priors and learns a sparse product of factors. Experiments on research abstracts show that our model can learn latent factors such as research topic, scientific discipline, and focus (methods vs. applications). Our modeling improvements reduce test perplexity and improve human interpretability of the discovered factors. 1

5 0.11398416 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

Author: Michael Bryant, Erik B. Sudderth

Abstract: Variational methods provide a computationally scalable alternative to Monte Carlo methods for large-scale, Bayesian nonparametric learning. In practice, however, conventional batch and online variational methods quickly become trapped in local optima. In this paper, we consider a nonparametric topic model based on the hierarchical Dirichlet process (HDP), and develop a novel online variational inference algorithm based on split-merge topic updates. We derive a simpler and faster variational approximation of the HDP, and show that by intelligently splitting and merging components of the variational posterior, we can achieve substantially better predictions of test data than conventional online and batch variational algorithms. For streaming analysis of large datasets where batch analysis is infeasible, we show that our split-merge updates better capture the nonparametric properties of the underlying model, allowing continual learning of new topics.

6 0.11113959 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes

7 0.11008543 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

8 0.10010615 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

9 0.098525852 60 nips-2012-Bayesian nonparametric models for ranked data

10 0.095984511 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

11 0.089787215 140 nips-2012-Fusion with Diffusion for Robust Visual Tracking

12 0.086026601 240 nips-2012-Newton-Like Methods for Sparse Inverse Covariance Estimation

13 0.08426322 59 nips-2012-Bayesian nonparametric models for bipartite graphs

14 0.083584219 247 nips-2012-Nonparametric Reduced Rank Regression

15 0.082168549 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

16 0.070774212 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates

17 0.07016705 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

18 0.068718016 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover

19 0.067393981 40 nips-2012-Analyzing 3D Objects in Cluttered Images

20 0.06628862 274 nips-2012-Priors for Diversity in Generative Latent Variable Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.134), (1, 0.05), (2, 0.001), (3, 0.054), (4, -0.189), (5, -0.027), (6, 0.002), (7, -0.01), (8, 0.121), (9, -0.004), (10, 0.121), (11, 0.113), (12, 0.033), (13, -0.076), (14, 0.055), (15, 0.012), (16, 0.06), (17, 0.027), (18, -0.009), (19, 0.02), (20, 0.053), (21, -0.019), (22, 0.027), (23, 0.026), (24, -0.0), (25, -0.025), (26, 0.045), (27, -0.06), (28, -0.086), (29, -0.022), (30, -0.07), (31, 0.055), (32, 0.084), (33, 0.01), (34, 0.099), (35, 0.012), (36, -0.032), (37, 0.079), (38, -0.039), (39, 0.131), (40, -0.22), (41, 0.064), (42, 0.027), (43, 0.062), (44, -0.051), (45, -0.062), (46, -0.035), (47, 0.026), (48, 0.028), (49, 0.079)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96039093 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

Author: Mingyuan Zhou, Lawrence Carin

Abstract: By developing data augmentation methods unique to the negative binomial (NB) distribution, we unite seemingly disjoint count and mixture models under the NB process framework. We develop fundamental properties of the models and derive efficient Gibbs sampling inference. We show that the gamma-NB process can be reduced to the hierarchical Dirichlet process with normalization, highlighting its unique theoretical, structural and computational advantages. A variety of NB processes with distinct sharing mechanisms are constructed and applied to topic modeling, with connections to existing algorithms, showing the importance of inferring both the NB dispersion and probability parameters. 1

2 0.62949175 59 nips-2012-Bayesian nonparametric models for bipartite graphs

Author: Francois Caron

Abstract: We develop a novel Bayesian nonparametric model for random bipartite graphs. The model is based on the theory of completely random measures and is able to handle a potentially infinite number of nodes. We show that the model has appealing properties and in particular it may exhibit a power-law behavior. We derive a posterior characterization, a generative process for network growth, and a simple Gibbs sampler for posterior simulation. Our model is shown to be well fitted to several real-world social networks. 1

3 0.61886352 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes

Author: Dahua Lin, John W. Fisher

Abstract: Mixture distributions are often used to model complex data. In this paper, we develop a new method that jointly estimates mixture models over multiple data sets by exploiting the statistical dependencies between them. Specifically, we introduce a set of latent Dirichlet processes as sources of component models (atoms), and for each data set, we construct a nonparametric mixture model by combining sub-sampled versions of the latent DPs. Each mixture model may acquire atoms from different latent DPs, while each atom may be shared by multiple mixtures. This multi-to-multi association distinguishes the proposed method from previous ones that require the model structure to be a tree or a chain, allowing more flexible designs. We also derive a sampling algorithm that jointly infers the model parameters and present experiments on both document analysis and image modeling. 1

4 0.56443012 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates

Author: Zhihua Zhang, Bojun Tu

Abstract: In this paper we study sparsity-inducing nonconvex penalty functions using L´ vy e processes. We define such a penalty as the Laplace exponent of a subordinator. Accordingly, we propose a novel approach for the construction of sparsityinducing nonconvex penalties. Particularly, we show that the nonconvex logarithmic (LOG) and exponential (EXP) penalty functions are the Laplace exponents of Gamma and compound Poisson subordinators, respectively. Additionally, we explore the concave conjugate of nonconvex penalties. We find that the LOG and EXP penalties are the concave conjugates of negative Kullback-Leiber (KL) distance functions. Furthermore, the relationship between these two penalties is due to asymmetricity of the KL distance. 1

5 0.55046266 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

Author: Xianxing Zhang, Lawrence Carin

Abstract: A new methodology is developed for joint analysis of a matrix and accompanying documents, with the documents associated with the matrix rows/columns. The documents are modeled with a focused topic model, inferring interpretable latent binary features for each document. A new matrix decomposition is developed, with latent binary features associated with the rows/columns, and with imposition of a low-rank constraint. The matrix decomposition and topic model are coupled by sharing the latent binary feature vectors associated with each. The model is applied to roll-call data, with the associated documents defined by the legislation. Advantages of the proposed model are demonstrated for prediction of votes on a new piece of legislation, based only on the observed text of legislation. The coupling of the text and legislation is also shown to yield insight into the properties of the matrix decomposition for roll-call data. 1

6 0.5360418 140 nips-2012-Fusion with Diffusion for Robust Visual Tracking

7 0.5018388 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models

8 0.48120686 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

9 0.47216028 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

10 0.46603075 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

11 0.45980036 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

12 0.44848704 12 nips-2012-A Neural Autoregressive Topic Model

13 0.44048804 332 nips-2012-Symmetric Correspondence Topic Models for Multilingual Text Analysis

14 0.43761355 60 nips-2012-Bayesian nonparametric models for ranked data

15 0.4343088 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

16 0.43373629 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes

17 0.43175411 345 nips-2012-Topic-Partitioned Multinetwork Embeddings

18 0.43146387 220 nips-2012-Monte Carlo Methods for Maximum Margin Supervised Topic Models

19 0.43024009 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

20 0.42258954 154 nips-2012-How They Vote: Issue-Adjusted Models of Legislative Behavior


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.089), (21, 0.014), (38, 0.1), (39, 0.062), (42, 0.016), (53, 0.017), (54, 0.019), (55, 0.021), (63, 0.031), (73, 0.232), (74, 0.033), (76, 0.13), (80, 0.101), (92, 0.041)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80739421 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

Author: Mingyuan Zhou, Lawrence Carin

Abstract: By developing data augmentation methods unique to the negative binomial (NB) distribution, we unite seemingly disjoint count and mixture models under the NB process framework. We develop fundamental properties of the models and derive efficient Gibbs sampling inference. We show that the gamma-NB process can be reduced to the hierarchical Dirichlet process with normalization, highlighting its unique theoretical, structural and computational advantages. A variety of NB processes with distinct sharing mechanisms are constructed and applied to topic modeling, with connections to existing algorithms, showing the importance of inferring both the NB dispersion and probability parameters. 1

2 0.73879105 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data

Author: Sejong Yoon, Vladimir Pavlovic

Abstract: Probabilistic approaches to computer vision typically assume a centralized setting, with the algorithm granted access to all observed data points. However, many problems in wide-area surveillance can benefit from distributed modeling, either because of physical or computational constraints. Most distributed models to date use algebraic approaches (such as distributed SVD) and as a result cannot explicitly deal with missing data. In this work we present an approach to estimation and learning of generative probabilistic models in a distributed context where certain sensor data can be missing. In particular, we show how traditional centralized models, such as probabilistic PCA and missing-data PPCA, can be learned when the data is distributed across a network of sensors. We demonstrate the utility of this approach on the problem of distributed affine structure from motion. Our experiments suggest that the accuracy of the learned probabilistic structure and motion models rivals that of traditional centralized factorization methods while being able to handle challenging situations such as missing or noisy observations. 1

3 0.69526798 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

Author: Vasiliy Karasev, Alessandro Chiuso, Stefano Soatto

Abstract: We describe the tradeoff between the performance in a visual recognition problem and the control authority that the agent can exercise on the sensing process. We focus on the problem of “visual search” of an object in an otherwise known and static scene, propose a measure of control authority, and relate it to the expected risk and its proxy (conditional entropy of the posterior density). We show this analytically, as well as empirically by simulation using the simplest known model that captures the phenomenology of image formation, including scaling and occlusions. We show that a “passive” agent given a training set can provide no guarantees on performance beyond what is afforded by the priors, and that an “omnipotent” agent, capable of infinite control authority, can achieve arbitrarily good performance (asymptotically). In between these limiting cases, the tradeoff can be characterized empirically. 1

4 0.67895985 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

Author: Xianxing Zhang, Lawrence Carin

Abstract: A new methodology is developed for joint analysis of a matrix and accompanying documents, with the documents associated with the matrix rows/columns. The documents are modeled with a focused topic model, inferring interpretable latent binary features for each document. A new matrix decomposition is developed, with latent binary features associated with the rows/columns, and with imposition of a low-rank constraint. The matrix decomposition and topic model are coupled by sharing the latent binary feature vectors associated with each. The model is applied to roll-call data, with the associated documents defined by the legislation. Advantages of the proposed model are demonstrated for prediction of votes on a new piece of legislation, based only on the observed text of legislation. The coupling of the text and legislation is also shown to yield insight into the properties of the matrix decomposition for roll-call data. 1

5 0.66129649 192 nips-2012-Learning the Dependency Structure of Latent Factors

Author: Yunlong He, Yanjun Qi, Koray Kavukcuoglu, Haesun Park

Abstract: In this paper, we study latent factor models with dependency structure in the latent space. We propose a general learning framework which induces sparsity on the undirected graphical model imposed on the vector of latent factors. A novel latent factor model SLFA is then proposed as a matrix factorization problem with a special regularization term that encourages collaborative reconstruction. The main benefit (novelty) of the model is that we can simultaneously learn the lowerdimensional representation for data and model the pairwise relationships between latent factors explicitly. An on-line learning algorithm is devised to make the model feasible for large-scale learning problems. Experimental results on two synthetic data and two real-world data sets demonstrate that pairwise relationships and latent factors learned by our model provide a more structured way of exploring high-dimensional data, and the learned representations achieve the state-of-the-art classification performance. 1

6 0.65818036 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

7 0.65811336 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)

8 0.65717441 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes

9 0.65550137 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

10 0.654001 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

11 0.65376103 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

12 0.65310097 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

13 0.65276122 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

14 0.65123135 233 nips-2012-Multiresolution Gaussian Processes

15 0.65052873 270 nips-2012-Phoneme Classification using Constrained Variational Gaussian Process Dynamical System

16 0.64945525 249 nips-2012-Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison

17 0.64882076 12 nips-2012-A Neural Autoregressive Topic Model

18 0.64613205 191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables

19 0.64562726 197 nips-2012-Learning with Recursive Perceptual Representations

20 0.64463264 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification