nips nips2011 nips2011-285 knowledge-graph by maker-knowledge-mining

285 nips-2011-The Kernel Beta Process


Source: pdf

Author: Lu Ren, Yingjian Wang, Lawrence Carin, David B. Dunson

Abstract: A new Lévy process prior is proposed for an uncountable collection of covariate-dependent feature-learning measures; the model is called the kernel beta process (KBP). Available covariates are handled efficiently via the kernel construction, with covariates assumed observed with each data sample (“customer”), and latent covariates learned for each feature (“dish”). Each customer selects dishes from an infinite buffet, in a manner analogous to the beta process, with the added constraint that a customer first decides probabilistically whether to “consider” a dish, based on the distance in covariate space between the customer and dish. If a customer does consider a particular dish, that dish is then selected probabilistically as in the beta process. The beta process is recovered as a limiting case of the KBP. An efficient Gibbs sampler is developed for computations, and state-of-the-art results are presented for image processing and music analysis tasks.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A new Lévy process prior is proposed for an uncountable collection of covariate-dependent feature-learning measures; the model is called the kernel beta process (KBP). [sent-9, score-0.76]

2 Available covariates are handled efficiently via the kernel construction, with covariates assumed observed with each data sample (“customer”), and latent covariates learned for each feature (“dish”). [sent-10, score-0.79]

3 Each customer selects dishes from an infinite buffet, in a manner analogous to the beta process, with the added constraint that a customer first decides probabilistically whether to “consider” a dish, based on the distance in covariate space between the customer and dish. [sent-11, score-0.672]

4 If a customer does consider a particular dish, that dish is then selected probabilistically as in the beta process. [sent-12, score-0.455]

5 The beta process is recovered as a limiting case of the KBP. [sent-13, score-0.333]

6 An efficient Gibbs sampler is developed for computations, and state-of-the-art results are presented for image processing and music analysis tasks. [sent-14, score-0.258]

7 A powerful tool for such learning is the Indian buffet process (IBP) [4], in which the data samples serve as “customers”, and the potential features serve as “dishes”. [sent-17, score-0.26]

8 It has recently been demonstrated that the IBP corresponds to a marginalization of a beta-Bernoulli process [15]. [sent-18, score-0.098]

9 The beta process was developed originally by Hjort [5] as a Lévy process prior for “hazard measures”, and was recently extended for use in feature learning [15], the interest of this paper; we therefore here refer to it as a “feature-learning measure.” [sent-20, score-0.639]

10 The beta process is an example of a Lévy process [6], another example of which is the gamma process [1]; the normalized gamma process is well known as the Dirichlet process [3, 14]. [sent-21, score-1.045]

11 A key characteristic of such models is that the data samples are assumed exchangeable, meaning that the order/indices of the data may be permuted with no change in the model. [sent-22, score-0.072]

12 1 An important line of research concerns removal of the assumption of exchangeability, allowing incorporation of covariates (e. [sent-24, score-0.245]

13 As an example, MacEachern introduced the dependent Dirichlet process [8]. [sent-27, score-0.135]

14 The form of the tree may be constituted as a result of covariates that are available with the samples, but the tree is not necessarily unique. [sent-29, score-0.274]

15 A dependent IBP (dIBP) model has been introduced recently, with a hierarchical Gaussian process (GP) used to account for covariate dependence [16]; however, the use of a GP may constitute challenges for large-scale problems. [sent-30, score-0.215]

16 Recently a dependent hierarchical beta process (dHBP) has been developed, yielding encouraging results [18]. [sent-31, score-0.37]

17 However, the dHBP has the disadvantage of assigning a kernel to each data sample, and therefore it scales unfavorably as the number of samples increases. [sent-32, score-0.133]

18 In this paper we develop a new Lévy process prior, termed the kernel beta process (KBP), which yields an uncountable number of covariate-dependent feature-learning measures, with the beta process a special case. [sent-33, score-1.056]

19 This model may be interpreted as inferring covariates x_i* for each feature (dish), indexed by i. [sent-34, score-0.245]

20 The generative process by which the nth data sample, with covariates xn , selects features may be viewed as a two-step process. [sent-35, score-0.504]

21 First, the nth customer (data sample) decides whether to “examine” dish i by drawing z_ni^(1) ∼ Bernoulli(K(x_n, x_i*; ψ_i*)), where ψ_i* are dish-dependent kernel parameters that are also inferred (the {ψ_i*} defining the meaning of proximity/locality in covariate space). [sent-36, score-0.645]

22 The kernels are designed to satisfy K(x_n, x_i*; ψ_i*) ∈ (0, 1], K(x_i*, x_i*; ψ_i*) = 1, and K(x_n, x_i*; ψ_i*) → 0 as ‖x_n − x_i*‖_2 → ∞. [sent-37, score-0.117]

23 In the second step, if z_ni^(1) = 1, customer n draws z_ni^(2) ∼ Bernoulli(π_i), and if z_ni^(2) = 1, the feature associated with dish i is employed by data sample n. [sent-38, score-1.187]
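To make the two-step construction concrete, here is a minimal sketch (not the authors' code) of drawing feature-usage indicators for one customer, assuming a radial-basis-function kernel K(x_n, x_i*; ψ_i*) = exp(−ψ_i* ‖x_n − x_i*‖²); the array shapes and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def kbp_feature_draw(x_n, x_star, psi_star, pi):
    """Two-step KBP draw for one customer with covariate x_n.

    x_star : (K, L) latent covariates of the K dishes (features)
    psi_star : (K,) kernel widths; pi : (K,) beta-process feature probabilities
    """
    # Step 1: decide whether to "examine" each dish, based on covariate proximity.
    kern = np.exp(-psi_star * np.sum((x_n - x_star) ** 2, axis=1))
    z1 = rng.binomial(1, kern)
    # Step 2: if examined, select the dish as in the ordinary beta-Bernoulli process.
    z2 = rng.binomial(1, pi)
    return z1 * z2  # b_ni = z_ni^(1) * z_ni^(2)

# Toy usage: K = 5 dishes with 2-D covariates.
b_n = kbp_feature_draw(np.array([0.3, 0.7]), rng.uniform(size=(5, 2)),
                       np.full(5, 10.0), rng.beta(1.0, 5.0, size=5))
```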

24 In addition to introducing this new Lévy process, we examine its properties, and demonstrate how it may be efficiently applied in important data analysis problems. [sent-41, score-0.208]

25 Review of beta and Bernoulli processes: A beta process B ∼ BP(c, B_0) is a distribution on positive random measures over the space (Ω, F). [sent-45, score-0.599]

26 One may marginalize out the measure B analytically, yielding conditional probabilities for the {Zn } that correspond to the Indian buffet process [15, 4]. [sent-51, score-0.266]

27 Covariate-dependent Lévy process: In the above beta-Bernoulli construction, the same measure B ∼ BP(c, B_0) is employed for generation of all {Z_n}, implying that each of the N samples has the same probabilities {π_i} for use of the respective features {ω_i}. [sent-53, score-0.444]

28 We now assume that with each of the N samples of interest there is an associated set of covariates, denoted respectively as {x_n}, with each x_n ∈ X. [sent-54, score-0.157]

29 We wish to impose that if samples n and n′ have similar covariates x_n and x_{n′}, it is probable that they will employ a similar subset of the features {ω_i}; if the covariates are distinct, it is less probable that feature sharing will be manifested. [sent-55, score-0.764]

30 Generalizing (2), consider B = ∑_{i=1}^∞ γ_i δ_{ω_i}, ω_i ∼ B_0 (4), where γ_i = {γ_i(x) : x ∈ X} is a stochastic process (random function) from X → [0, 1] (drawn independently from the {ω_i}). [sent-56, score-0.098]

31 Hence, B is a dependent collection of Lévy processes, with the measure specific to covariate x ∈ X being B_x = ∑_{i=1}^∞ γ_i(x) δ_{ω_i}. [sent-57, score-0.402]

32 For example, one might consider γi (x) = g{µi (x)}, where g : R → [0, 1] is any monotone differentiable link function and µi (x) : X → R may be modeled as a Gaussian process [10], or related kernel-based construction. [sent-59, score-0.098]

33 To choose g{µi (x)} one can potentially use models for the predictor-dependent breaks in probit, logistic or kernel stick-breaking processes [13, 11, 2]. [sent-60, score-0.086]

34 In the remainder of this paper we propose a special case for design of γi (x), termed the kernel beta process (KBP). [sent-61, score-0.388]

35 Characteristic function of the kernel beta process: Recall from Hjort [5] that B ∼ BP(c(ω), B_0) is a beta process on the measure space (Ω, F) if its characteristic function satisfies E[e^{juB(A)}] = exp{∫_{[0,1]×A} (e^{juπ} − 1) ν(dπ, dω)} (5), where j = √(−1) and A is any subset in F. [sent-63, score-0.799]

36 The beta process is a particular class of the Lévy process, with ν(dπ, dω) defined as in (1). [sent-64, score-0.541]

37 Let x∗ represent random variables drawn from probability measure H, with support on X , and ψ ∗ is also a random variable drawn from an appropriate probability measure Q with support over Ψ (e. [sent-67, score-0.158]

38 , in the context of the radial basis function, ψ ∗ are drawn from a probability measure with support over R+ ). [sent-69, score-0.079]

39 We now define a new Lévy measure ν_X = H(dx*) Q(dψ*) ν(dπ, dω) (6), where ν(dπ, dω) is the Lévy measure associated with the beta process, defined in (1). [sent-70, score-0.743]

40 Theorem 1: Assume parameters {x_i*, ψ_i*, π_i, ω_i} are drawn from the measure ν_X in (6), and that the following measure is constituted: B_x = ∑_{i=1}^∞ π_i K(x, x_i*; ψ_i*) δ_{ω_i} (7), which may be evaluated for any covariate x ∈ X. [sent-71, score-0.234]
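The following is a hedged sketch of what a truncated draw from (7) might look like: a finite number T of atoms stands in for the infinite sum, with π_i drawn from a finite beta approximation to the beta-process weights, x_i* from H, and ψ_i* from Q; the specific choices of H, Q, the kernel, and T below are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, L, c = 200, 2, 1.0                               # truncation level, covariate dim, concentration
pi = rng.beta(c / T, c * (1.0 - 1.0 / T), size=T)   # finite approximation to beta-process weights
x_star = rng.uniform(size=(T, L))                   # latent feature covariates x_i* ~ H (assumed uniform)
psi_star = rng.gamma(2.0, 5.0, size=T)              # kernel widths psi_i* ~ Q (assumed gamma)

def B_x(x, atom_mask=None):
    """Evaluate the covariate-dependent measure B_x on a subset A of atoms (all atoms by default)."""
    kern = np.exp(-psi_star * np.sum((x - x_star) ** 2, axis=1))
    weights = pi * kern
    return weights.sum() if atom_mask is None else weights[atom_mask].sum()

# Nearby covariates see similar measures; distant covariates need not.
print(B_x(np.array([0.2, 0.2])), B_x(np.array([0.22, 0.2])), B_x(np.array([0.9, 0.9])))
```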

41 For any set A ⊂ F, the measure B evaluated at the covariates S, on the set A, yields an |S|-dimensional random vector B(A) = (B_{x_1}(A), …, B_{x_{|S|}}(A)). [sent-79, score-0.245]

42 Expression (7) is a covariate-dependent Lévy process with Lévy measure (6), and its characteristic function, for an arbitrary set of covariates S, satisfies E[e^{j⟨u, B(A)⟩}] = exp{∫_{X×Ψ×[0,1]×A} (e^{jπ⟨u, K_S(x*; ψ*)⟩} − 1) ν_X(dx*, dψ*, dπ, dω)} (8), where K_S(x*; ψ*) = (K(x_1, x*; ψ*), …, K(x_{|S|}, x*; ψ*)). A proof is provided in the Supplemental Material. [sent-83, score-0.837]

43 Additionally, for notational convenience, below a draw of (7), valid for all covariates in X , is denoted B ∼ KBP(c, B0 , H, Q), with c and B0 defining ν(dπ, dω) in (1). [sent-84, score-0.245]

44 Relationship to the beta-Bernoulli process: If the covariate-dependent measure B_x in (7) is employed to define covariate-dependent feature usage, then Z_x ∼ BeP(B_x), generalizing (3). [sent-86, score-0.196]

45 Hence, given {x_i*, ψ_i*, π_i}, the feature-usage measure is Z_x = ∑_{i=1}^∞ b_{xi} δ_{ω_i}, with b_{xi} ∼ Bernoulli(π_i K(x, x_i*; ψ_i*)). [sent-87, score-0.138]

46 Note that it is equivalent in distribution to express b_{xi} = z_{xi}^(1) z_{xi}^(2), with z_{xi}^(1) ∼ Bernoulli(K(x, x_i*; ψ_i*)) and z_{xi}^(2) ∼ Bernoulli(π_i). [sent-88, score-0.426]

47 This model therefore yields the two-step generalization of the generative process of the beta-Bernoulli process discussed in the Introduction. [sent-89, score-0.196]

48 The condition z_{xi}^(1) = 1 has high probability only when the observed covariates x are near the (latent/inferred) covariates x_i*. [sent-90, score-0.585]

49 It is deemed attractive that this intuitive generative process comes as a result of a rigorous Lévy process construction, the properties of which are summarized next. [sent-91, score-0.404]

50 Properties of B: For all Borel subsets A ∈ F, if B is drawn from the KBP, then for covariates x, x′ ∈ X we have E[B_x(A)] = B_0(A) E(K_x) and Cov(B_x(A), B_{x′}(A)) = E(K_x K_{x′}) ∫_A B_0(dω)(1 − B_0(dω)) / (c(ω) + 1) + Cov(K_x, K_{x′}) ∫_A B_0(dω)^2, where E(K_x) = ∫_{X×Ψ} K(x, x*; ψ*) H(dx*) Q(dψ*). [sent-93, score-0.278]

51 Consider data y_n ∈ R^M with associated covariates x_n ∈ R^L, with n = 1, …, N. [sent-103, score-0.462]

52 The Dirichlet process [3] base measure is G_0 = N(0, (1/M) I_M), and the KBP base measure B_0 is a mixture of atoms (factor loadings). [sent-108, score-0.229]
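As a rough illustration of the sparse factor-analysis construction referenced as model (10), the sketch below generates data as y_n = D(b_n ∘ w_n) + ε_n, with the binary usage b_n drawn through the covariate-dependent kernel term; the priors, dimensions, and hyperparameter values here are assumptions for illustration, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(2)
M, T, N, L = 64, 50, 10, 2                           # data dim, dictionary size, samples, covariate dim
D = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, T))   # dictionary columns, roughly N(0, I_M / M)
pi = rng.beta(1.0 / T, 1.0, size=T)                  # feature probabilities
x_star = rng.uniform(size=(T, L))                    # latent feature covariates
psi_star = np.full(T, 20.0)                          # kernel widths
X = rng.uniform(size=(N, L))                         # observed covariates x_n
alpha_2 = 1e2                                        # noise precision (illustrative)

Y = np.empty((N, M))
for n in range(N):
    kern = np.exp(-psi_star * np.sum((X[n] - x_star) ** 2, axis=1))
    b = rng.binomial(1, pi * kern)                   # covariate-dependent feature usage b_n
    w = rng.normal(size=T)                           # feature weights w_n
    Y[n] = D @ (b * w) + rng.normal(0.0, alpha_2 ** -0.5, size=M)
```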

53 For the applications considered it is important that the same atoms be reused at different points {x_i*} in covariate space, to allow for repeated structure to be manifested as a function of space or time, within the image and music applications, respectively. [sent-109, score-0.377]

54 Note that B is drawn once from the KBP, and when drawing the Z_{x_n} we evaluate B as defined by the respective covariate x_n. [sent-117, score-0.23]

55 In (10), the ith column of D, denoted Di , is drawn from B0 , with B0 drawn from a Dirichlet process (DP). [sent-121, score-0.164]

56 D_i | D_1, …, D_{i−1}, α_0, G_0 ∼ ∑_{l=1}^{N_u} [n_l* / (i − 1 + α_0)] δ_{D_l*} + [α_0 / (i − 1 + α_0)] G_0 (11), where {D_l*}_{l=1,…,N_u} are the unique dictionary elements shared by the first i − 1 columns of D, and n_l* = ∑_{j=1}^{i−1} δ(D_j = D_l*). [sent-132, score-0.174]

57 For model inference, an indicator variable c_i is introduced for each D_i, and c_i = l with a probability proportional to n_l*, with l = 1, … [sent-133, score-0.13]

58 …, N_u, with c_i equal to N_u + 1 with a probability controlled by α_0. [sent-136, score-0.065]

59 We consider the following augmented noise model: ε_n = λ_n ◦ m_n + ε̂_n (12), with m_np ∼ Bernoulli(π̃_n), π̃_n ∼ Beta(a_0, b_0), ε̂_n ∼ N(0, α_3^{−1} I_M), λ_n ∼ N(0, α_λ^{−1} I_M), with gamma priors placed on α_λ and α_2, and with p = 1, … [sent-145, score-0.09]

60 The term λ_n ◦ m_n accounts for “spiky” noise, with potentially large amplitude, and π̃_n represents the probability of spiky noise in data sample n. [sent-149, score-0.141]
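A small sketch of how the augmented noise in (12) could be simulated under this reading: spiky noise is a large-variance Gaussian amplitude λ_n masked elementwise by Bernoulli indicators m_n, added to dense Gaussian noise; the variances and Beta parameters below are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 64
pi_tilde = rng.beta(1.0, 20.0)                 # probability of a spike, pi~_n ~ Beta(a_0, b_0)
m_n = rng.binomial(1, pi_tilde, size=M)        # spike indicators m_np
lam_n = rng.normal(0.0, 100.0, size=M)         # spike amplitudes lambda_n (large variance)
eps_hat_n = rng.normal(0.0, 1.0, size=M)       # dense Gaussian noise
eps_n = lam_n * m_n + eps_hat_n                # eps_n = lambda_n ∘ m_n + eps_hat_n, as in (12)
```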

61 Assume T is the truncation level for the number of dictionary elements, {D_i}_{i=1,…,T}; N_u is the number of unique dictionary element values in the current Gibbs iteration, {D_l*}_{l=1,…,N_u}. [sent-154, score-0.316]

62 • Update {D_l*}_{l=1,…,N_u}: D_l* ∼ N(µ_l, Σ_l), with Σ_l = [α_2 ∑_{n=1}^N ∑_{i: c_i=l} (b_ni w_ni)^2 + M]^{−1} I_M and µ_l = Σ_l [α_2 ∑_{n=1}^N ∑_{i: c_i=l} (b_ni w_ni) y_n^{−l}], where y_n^{−l} = y_n − ∑_{i: c_i≠l} D_i (b_ni w_ni). [sent-160, score-0.5]

63 • Update {b_ni}, i = 1, …, K: p(b_ni = 1) ∝ exp{−(α_2/2)(D_i^T D_i w_ni^2 − 2 w_ni D_i^T y_n^{−i})} π_i K(x_n, x_i*; ψ_i*), where y_n^{−i} denotes y_n with the contributions of all dictionary elements other than i removed. [sent-165, score-0.2]
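A hedged sketch of this Bernoulli conditional, assuming the standard form in which p(b_ni = 1 | −) is proportional to the kernel-weighted prior q_ni = π_i K(x_n, x_i*; ψ_i*) times the Gaussian likelihood term, and p(b_ni = 0 | −) ∝ 1 − q_ni; the residual r_ni and the normalization are assumptions about details not shown in the extracted sentence.

```python
import numpy as np

def sample_b_ni(d_i, w_ni, r_ni, q_ni, alpha_2, rng):
    """Sample the binary usage indicator b_ni from its conditional posterior.

    d_i : (M,) dictionary element; w_ni : scalar weight; r_ni : (M,) residual y_n^{-i};
    q_ni : prior inclusion probability pi_i * K(x_n, x_i*; psi_i*); alpha_2 : noise precision.
    """
    log_p1 = -0.5 * alpha_2 * (d_i @ d_i * w_ni ** 2 - 2.0 * w_ni * (d_i @ r_ni)) + np.log(q_ni)
    log_p0 = np.log1p(-q_ni)
    p1 = 1.0 / (1.0 + np.exp(log_p0 - log_p1))   # normalize the two unnormalized log-probabilities
    return rng.binomial(1, p1)

# Toy call with random inputs.
rng = np.random.default_rng(4)
b = sample_b_ni(rng.normal(size=64), 0.5, rng.normal(size=64), 0.3, 100.0, rng)
```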

64 • Update {π_i}_{i=1,…,T}: introduce two sets of auxiliary variables {z_ni^(1)}_{i=1,…,T} and {z_ni^(2)}_{i=1,…,T} for each data sample y_n. [sent-167, score-0.1]

65 Assume z_ni^(1) ∼ Bernoulli(π_i) and z_ni^(2) ∼ Bernoulli(K(x_n, x_i*; ψ_i*)). [sent-168, score-0.606]
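A heavily hedged sketch of how π_i could then be resampled: with z_ni^(1) ∼ Bernoulli(π_i) treated as observed, a conjugate Beta prior gives a standard Beta posterior; the prior parameters below (Beta(a_0/T, b_0(T−1)/T)) are an assumption for illustration, not necessarily the paper's choice.

```python
import numpy as np

def update_pi_i(z1_i, a0=1.0, b0=1.0, T=256, rng=np.random.default_rng(5)):
    """Resample pi_i given the auxiliary indicators z_ni^(1) across the N data samples.

    z1_i : (N,) array of z_ni^(1) values for feature i.
    """
    a_post = a0 / T + z1_i.sum()                   # "successes" among the auxiliary indicators
    b_post = b0 * (T - 1) / T + (1 - z1_i).sum()   # "failures"
    return rng.beta(a_post, b_post)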

66 Hyperparameter settings: For both α_1 and α_2 the corresponding prior was set to Gamma(10^{−6}, 10^{−6}); the concentration parameter α_0 was given a prior Gamma(1, 0. [sent-171, score-0.303]

67 For both experiments below, the number of dictionary elements T was truncated to 256, the number of unique dictionary element values was initialized to 100, and {π_i}_{i=1,…,T} were initialized to 0. [sent-173, score-0.368]

68 Music analysis: We consider the same music piece as described in [12]: “A Day in the Life” from the Beatles’ album Sgt. [sent-179, score-0.254]

69 A typical goal of music analysis is to infer interrelationships within the music piece, as a function of time [12]. [sent-183, score-0.434]

70 For the audio data, each MFCC vector yn has an associated time index, the latter used as the covariate xn . [sent-184, score-0.297]

71 Figure 1(b) shows the frequency for the number of unique dictionary elements used by the data, based on the 1600 collected samples; and Figure 1(c) shows the frequency for the number of total dictionary elements used. [sent-187, score-0.428]

72 With the model defined in (10), the sparse vector bn ◦wn indicates the importance of each dictionary element from {Di }i=1,T to data yn . [sent-188, score-0.286]

73 Based on the Gibbs collection samples: (b) the frequency of the number of unique dictionary elements, and (c) the total number of dictionary elements. [sent-190, score-0.324]

74 Finally, this matrix was averaged across the collection samples, to yield a correlation matrix relating one part of the music to all others. [sent-192, score-0.253]
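The post-processing just described can be sketched as follows (the exact quantity correlated is an assumption consistent with the surrounding sentences): for each collection sample, form the per-frame usage matrix from b_n ∘ w_n, correlate frames against frames, and average the resulting matrices over samples.

```python
import numpy as np

def average_frame_correlation(usage_samples):
    """usage_samples : list of (N_frames, T) matrices, one per Gibbs collection sample,
    each row holding b_n * w_n for one time frame."""
    corrs = [np.corrcoef(U) for U in usage_samples]   # (N_frames, N_frames) per sample
    return np.mean(corrs, axis=0)                     # averaged correlation matrix

# Toy usage with 3 collection samples of 20 frames and 50 dictionary elements.
rng = np.random.default_rng(6)
C = average_frame_correlation([rng.normal(size=(20, 50)) for _ in range(3)])
```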

75 We compared KBP performance with results based on BP-FA [17] in which covariates are not employed, and with results from the dynamic clustering model in [12], in which a dynamic HMM is employed (in [12] a dynamic HDP, or dHDP, was used in concert with an HMM). [sent-197, score-0.297]

76 The dHDP-HMM results yield a reasonably good segmentation of the music, but the model is unable to infer subtle differences in the music over time (for example, all voices in the music are clustered together, even if they are different). [sent-200, score-0.499]

77 Since the BP-FA does not capture as much localized information in the music (the probability of dictionary usage is the same for all temporal positions), it does not manifest as good a music segmentation as the dHDP-HMM. [sent-201, score-0.633]

78 By contrast, the KBP-FA model yields a good music segmentation, while also capturing subtle differences in the music over time (e. [sent-202, score-0.434]

79 Note that the use of the DP to allow repeated use of dictionary elements as a function of time (covariates) is important here, due to the repetition of structure in the piece. [sent-205, score-0.174]

80 One may listen to the music and observe the segmentation at http://www. [sent-206, score-0.244]

81 Image interpolation and denoising: We consider image interpolation and denoising as two additional potential applications. [sent-238, score-0.229]

82 In both of these examples each image is divided into N 8 × 8 overlapping patches, and each patch is stacked into a vector of length M = 64, constituting observation yn ∈ RM . [sent-239, score-0.141]

83 The covariate x_n represents the patch coordinates in the 2-D space. [sent-240, score-0.197]
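A sketch of the data preparation described above (not the authors' code): each image is cut into overlapping 8 × 8 patches, each patch is vectorized into y_n ∈ R^64, and the 2-D patch coordinates serve as the covariate x_n; the stride used below is an assumption.

```python
import numpy as np

def image_to_patches(img, patch=8, stride=2):
    """Return (Y, X): vectorized 8x8 patches and their top-left coordinates as covariates."""
    H, W = img.shape
    ys, xs = [], []
    for r in range(0, H - patch + 1, stride):
        for c in range(0, W - patch + 1, stride):
            ys.append(img[r:r + patch, c:c + patch].ravel())  # y_n in R^{64}
            xs.append([r, c])                                 # covariate x_n: patch coordinates
    return np.array(ys), np.array(xs, dtype=float)

Y, X = image_to_patches(np.random.default_rng(7).random((64, 64)))
```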

84 For image interpolation, we only observe a fraction of the image pixels, sampled uniformly at random. [sent-244, score-0.082]

85 The model infers the underlying dictionary D in the presence of this missing data, as well as the weights on the dictionary elements required for representing the observed components of {yn }; using the inferred dictionary and associated weights, one may readily impute the missing pixel values. [sent-245, score-0.564]

86 In Table 1 we present average PSNR values on the recovered pixel values, as a function of the fraction of pixels that are observed (20% in Table 1 means that 80% of the pixels are missing uniformly at random). [sent-246, score-0.141]

87 Comparisons are made between a model based on BP and one based on the proposed KBP; the latter generally performs better, particularly when a large fraction of the pixels are missing. [sent-247, score-0.057]

88 The proposed algorithm yields results that are comparable to those in [18], which also employed covariates within the BP construction. [sent-248, score-0.297]

89 Table 1: Comparison of BP and KBP for interpolating images with pixels missing uniformly at random, using standard image-processing images. [sent-250, score-0.084]

90 In the image-denoising example in Figure 3, the images were corrupted with both white Gaussian noise (WGN) and sparse spiky noise, as considered in [18]. [sent-315, score-0.168]

91 The sparse spiky noise exists in particular pixels, selected uniformly at random, with amplitude distributed uniformly between −255 and 255. [sent-316, score-0.171]

92 For the pepper image, 15% of the pixels were corrupted by spiky noise, and the standard deviation of the WGN was 15; for the house image, 10% of the pixels were corrupted by spiky noise and the standard deviation of WGN was 10. [sent-317, score-0.507]

93 We compare against the BP-FA model augmented with a term for spiky noise (BP-FA+) and the original BP-FA model. [sent-320, score-0.141]

94 Note that here the imposition of covariates and the KBP yields marked improvements in this application, relative to BP-FA alone. [sent-323, score-0.245]

95 95 dB for House), with the dictionary elements shown in column two and the reconstruction in column three; the fourth and fifth columns show results from BP-FA+ (PSNR is 23. [sent-328, score-0.174]

96 Summary: A new Lévy process, the kernel beta process, has been developed for the problem of nonparametric Bayesian feature learning, with example results presented for music analysis, image denoising, and image interpolation. [sent-334, score-0.85]

97 Infinite latent feature models and the Indian buffet process. [sent-359, score-0.122]

98 Nonparametric Bayes estimators based on beta processes in models for life history data. [sent-364, score-0.296]

99 The phylogenetic Indian buffet process: A non-exchangeable nonparametric prior for latent features. [sent-387, score-0.175]

100 Dependent hierarchical beta process for image interpolation and denoising. [sent-449, score-0.418]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('kbp', 0.379), ('bni', 0.322), ('zni', 0.303), ('covariates', 0.245), ('beta', 0.235), ('kx', 0.226), ('music', 0.217), ('vy', 0.208), ('bx', 0.169), ('dictionary', 0.142), ('buffet', 0.122), ('xn', 0.117), ('dish', 0.107), ('spiky', 0.107), ('wni', 0.1), ('bp', 0.1), ('yn', 0.1), ('process', 0.098), ('zxi', 0.095), ('zxn', 0.095), ('db', 0.088), ('bernoulli', 0.086), ('customer', 0.084), ('psnr', 0.083), ('di', 0.083), ('covariate', 0.08), ('im', 0.079), ('nu', 0.076), ('peppers', 0.076), ('ibp', 0.072), ('indian', 0.07), ('ci', 0.065), ('durham', 0.061), ('house', 0.058), ('pixels', 0.057), ('bep', 0.057), ('wgn', 0.057), ('gamma', 0.056), ('kernel', 0.055), ('dunson', 0.055), ('nonparametric', 0.053), ('employed', 0.052), ('duke', 0.052), ('gibbs', 0.051), ('dishes', 0.05), ('denoising', 0.05), ('zn', 0.048), ('bxi', 0.046), ('measure', 0.046), ('nth', 0.044), ('interpolation', 0.044), ('bn', 0.044), ('ren', 0.042), ('nc', 0.041), ('hmm', 0.041), ('wn', 0.041), ('image', 0.041), ('samples', 0.04), ('frequency', 0.04), ('atoms', 0.039), ('dirichlet', 0.038), ('bnk', 0.038), ('dhbp', 0.038), ('hjort', 0.038), ('unfavorably', 0.038), ('vni', 0.038), ('voices', 0.038), ('piece', 0.037), ('dependent', 0.037), ('correlation', 0.036), ('draws', 0.035), ('dp', 0.034), ('day', 0.034), ('noise', 0.034), ('drawn', 0.033), ('pepper', 0.033), ('elements', 0.032), ('characteristic', 0.032), ('processes', 0.031), ('truncate', 0.031), ('mfccs', 0.031), ('index', 0.03), ('life', 0.03), ('amplitude', 0.03), ('usage', 0.03), ('construction', 0.029), ('constituted', 0.029), ('uncountable', 0.029), ('probabilistically', 0.029), ('zx', 0.029), ('cov', 0.028), ('segmentation', 0.027), ('corrupted', 0.027), ('missing', 0.027), ('ni', 0.027), ('initialized', 0.026), ('inferred', 0.026), ('decides', 0.026), ('carin', 0.026), ('infers', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 285 nips-2011-The Kernel Beta Process

Author: Lu Ren, Yingjian Wang, Lawrence Carin, David B. Dunson

Abstract: A new Lévy process prior is proposed for an uncountable collection of covariate-dependent feature-learning measures; the model is called the kernel beta process (KBP). Available covariates are handled efficiently via the kernel construction, with covariates assumed observed with each data sample (“customer”), and latent covariates learned for each feature (“dish”). Each customer selects dishes from an infinite buffet, in a manner analogous to the beta process, with the added constraint that a customer first decides probabilistically whether to “consider” a dish, based on the distance in covariate space between the customer and dish. If a customer does consider a particular dish, that dish is then selected probabilistically as in the beta process. The beta process is recovered as a limiting case of the KBP. An efficient Gibbs sampler is developed for computations, and state-of-the-art results are presented for image processing and music analysis tasks.

2 0.11104591 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning

Author: Jun Zhu, Ning Chen, Eric P. Xing

Abstract: Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. While priors can indirectly affect posterior distributions through Bayes’ theorem, imposing posterior regularization is arguably more direct and in some cases can be much easier. We particularly focus on developing infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the largemargin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets. Our results appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics.

3 0.11040796 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model

Author: Ioannis A. Gkioulekas, Todd Zickler

Abstract: We propose an approach for linear unsupervised dimensionality reduction, based on the sparse linear model that has been used to probabilistically interpret sparse coding. We formulate an optimization problem for learning a linear projection from the original signal domain to a lower-dimensional one in a way that approximately preserves, in expectation, pairwise inner products in the sparse domain. We derive solutions to the problem, present nonlinear extensions, and discuss relations to compressed sensing. Our experiments using facial images, texture patches, and images of object categories suggest that the approach can improve our ability to recover meaningful structure in many classes of signals. 1

4 0.10453863 200 nips-2011-On the Analysis of Multi-Channel Neural Spike Data

Author: Bo Chen, David E. Carlson, Lawrence Carin

Abstract: Nonparametric Bayesian methods are developed for analysis of multi-channel spike-train data, with the feature learning and spike sorting performed jointly. The feature learning and sorting are performed simultaneously across all channels. Dictionary learning is implemented via the beta-Bernoulli process, with spike sorting performed via the dynamic hierarchical Dirichlet process (dHDP), with these two models coupled. The dHDP is augmented to eliminate refractoryperiod violations, it allows the “appearance” and “disappearance” of neurons over time, and it models smooth variation in the spike statistics. 1

5 0.09867724 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction

Author: Hai-son P. Le, Ziv Bar-joseph

Abstract: Determining interactions between entities and the overall organization and clustering of nodes in networks is a major challenge when analyzing biological and social network data. Here we extend the Indian Buffet Process (IBP), a nonparametric Bayesian model, to integrate noisy interaction scores with properties of individual entities for inferring interaction networks and clustering nodes within these networks. We present an application of this method to study how microRNAs regulate mRNAs in cells. Analysis of synthetic and real data indicates that the method improves upon prior methods, correctly recovers interactions and clusters, and provides accurate biological predictions. 1

6 0.095833562 104 nips-2011-Generalized Beta Mixtures of Gaussians

7 0.091638774 259 nips-2011-Sparse Estimation with Structured Dictionaries

8 0.091097705 301 nips-2011-Variational Gaussian Process Dynamical Systems

9 0.083935656 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure

10 0.083552137 139 nips-2011-Kernel Bayes' Rule

11 0.083060928 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices

12 0.081610024 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

13 0.073532999 221 nips-2011-Priors over Recurrent Continuous Time Processes

14 0.071961969 60 nips-2011-Confidence Sets for Network Structure

15 0.07004828 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

16 0.068893954 258 nips-2011-Sparse Bayesian Multi-Task Learning

17 0.057262566 26 nips-2011-Additive Gaussian Processes

18 0.05480513 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes

19 0.05432643 191 nips-2011-Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation

20 0.054319013 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.161), (1, 0.059), (2, 0.008), (3, -0.04), (4, -0.042), (5, -0.072), (6, 0.07), (7, 0.006), (8, 0.077), (9, 0.098), (10, -0.038), (11, 0.013), (12, 0.053), (13, -0.027), (14, -0.075), (15, -0.041), (16, 0.008), (17, -0.078), (18, -0.035), (19, -0.017), (20, 0.154), (21, -0.027), (22, -0.079), (23, -0.071), (24, -0.056), (25, -0.059), (26, 0.004), (27, 0.059), (28, -0.136), (29, 0.172), (30, 0.071), (31, 0.012), (32, -0.09), (33, 0.064), (34, 0.027), (35, 0.002), (36, -0.02), (37, 0.0), (38, 0.046), (39, 0.03), (40, 0.026), (41, -0.023), (42, 0.067), (43, -0.039), (44, -0.037), (45, 0.022), (46, -0.139), (47, -0.004), (48, -0.049), (49, -0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92407578 285 nips-2011-The Kernel Beta Process

Author: Lu Ren, Yingjian Wang, Lawrence Carin, David B. Dunson

Abstract: A new Lévy process prior is proposed for an uncountable collection of covariate-dependent feature-learning measures; the model is called the kernel beta process (KBP). Available covariates are handled efficiently via the kernel construction, with covariates assumed observed with each data sample (“customer”), and latent covariates learned for each feature (“dish”). Each customer selects dishes from an infinite buffet, in a manner analogous to the beta process, with the added constraint that a customer first decides probabilistically whether to “consider” a dish, based on the distance in covariate space between the customer and dish. If a customer does consider a particular dish, that dish is then selected probabilistically as in the beta process. The beta process is recovered as a limiting case of the KBP. An efficient Gibbs sampler is developed for computations, and state-of-the-art results are presented for image processing and music analysis tasks.

2 0.70715326 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction

Author: Hai-son P. Le, Ziv Bar-joseph

Abstract: Determining interactions between entities and the overall organization and clustering of nodes in networks is a major challenge when analyzing biological and social network data. Here we extend the Indian Buffet Process (IBP), a nonparametric Bayesian model, to integrate noisy interaction scores with properties of individual entities for inferring interaction networks and clustering nodes within these networks. We present an application of this method to study how microRNAs regulate mRNAs in cells. Analysis of synthetic and real data indicates that the method improves upon prior methods, correctly recovers interactions and clusters, and provides accurate biological predictions. 1

3 0.56346369 200 nips-2011-On the Analysis of Multi-Channel Neural Spike Data

Author: Bo Chen, David E. Carlson, Lawrence Carin

Abstract: Nonparametric Bayesian methods are developed for analysis of multi-channel spike-train data, with the feature learning and spike sorting performed jointly. The feature learning and sorting are performed simultaneously across all channels. Dictionary learning is implemented via the beta-Bernoulli process, with spike sorting performed via the dynamic hierarchical Dirichlet process (dHDP), with these two models coupled. The dHDP is augmented to eliminate refractoryperiod violations, it allows the “appearance” and “disappearance” of neurons over time, and it models smooth variation in the spike statistics. 1

4 0.56069297 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning

Author: Jun Zhu, Ning Chen, Eric P. Xing

Abstract: Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. While priors can indirectly affect posterior distributions through Bayes’ theorem, imposing posterior regularization is arguably more direct and in some cases can be much easier. We particularly focus on developing infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the largemargin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets. Our results appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics.

5 0.53926259 269 nips-2011-Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning

Author: Miguel Lázaro-gredilla, Michalis K. Titsias

Abstract: We introduce a variational Bayesian inference algorithm which can be widely applied to sparse linear models. The algorithm is based on the spike and slab prior which, from a Bayesian perspective, is the golden standard for sparse inference. We apply the method to a general multi-task and multiple kernel learning model in which a common set of Gaussian process functions is linearly combined with task-specific sparse weights, thus inducing relation between tasks. This model unifies several sparse linear models, such as generalized linear models, sparse factor analysis and matrix factorization with missing values, so that the variational algorithm can be applied to all these cases. We demonstrate our approach in multioutput Gaussian process regression, multi-class classification, image processing applications and collaborative filtering. 1

6 0.51711875 104 nips-2011-Generalized Beta Mixtures of Gaussians

7 0.49936819 259 nips-2011-Sparse Estimation with Structured Dictionaries

8 0.48715213 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model

9 0.47810322 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

10 0.47064447 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes

11 0.46402892 221 nips-2011-Priors over Recurrent Continuous Time Processes

12 0.45248353 191 nips-2011-Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation

13 0.44508192 13 nips-2011-A blind sparse deconvolution method for neural spike identification

14 0.43165219 139 nips-2011-Kernel Bayes' Rule

15 0.41134104 60 nips-2011-Confidence Sets for Network Structure

16 0.40817547 258 nips-2011-Sparse Bayesian Multi-Task Learning

17 0.39685893 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing

18 0.3952001 301 nips-2011-Variational Gaussian Process Dynamical Systems

19 0.39238882 127 nips-2011-Image Parsing with Stochastic Scene Grammar

20 0.37976542 225 nips-2011-Probabilistic amplitude and frequency demodulation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.024), (4, 0.041), (20, 0.027), (26, 0.028), (31, 0.088), (33, 0.033), (43, 0.075), (45, 0.08), (57, 0.021), (65, 0.015), (74, 0.1), (80, 0.289), (83, 0.028), (84, 0.043), (99, 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.72905999 285 nips-2011-The Kernel Beta Process

Author: Lu Ren, Yingjian Wang, Lawrence Carin, David B. Dunson

Abstract: A new Lévy process prior is proposed for an uncountable collection of covariate-dependent feature-learning measures; the model is called the kernel beta process (KBP). Available covariates are handled efficiently via the kernel construction, with covariates assumed observed with each data sample (“customer”), and latent covariates learned for each feature (“dish”). Each customer selects dishes from an infinite buffet, in a manner analogous to the beta process, with the added constraint that a customer first decides probabilistically whether to “consider” a dish, based on the distance in covariate space between the customer and dish. If a customer does consider a particular dish, that dish is then selected probabilistically as in the beta process. The beta process is recovered as a limiting case of the KBP. An efficient Gibbs sampler is developed for computations, and state-of-the-art results are presented for image processing and music analysis tasks.

2 0.71180868 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing

Author: Fabian L. Wauthier, Michael I. Jordan

Abstract: Biased labelers are a systemic problem in crowdsourcing, and a comprehensive toolbox for handling their responses is still being developed. A typical crowdsourcing application can be divided into three steps: data collection, data curation, and learning. At present these steps are often treated separately. We present Bayesian Bias Mitigation for Crowdsourcing (BBMC), a Bayesian model to unify all three. Most data curation methods account for the effects of labeler bias by modeling all labels as coming from a single latent truth. Our model captures the sources of bias by describing labelers as influenced by shared random effects. This approach can account for more complex bias patterns that arise in ambiguous or hard labeling tasks and allows us to merge data curation and learning into a single computation. Active learning integrates data collection with learning, but is commonly considered infeasible with Gibbs sampling inference. We propose a general approximation strategy for Markov chains to efficiently quantify the effect of a perturbation on the stationary distribution and specialize this approach to active learning. Experiments show BBMC to outperform many common heuristics. 1

3 0.5268209 68 nips-2011-Demixed Principal Component Analysis

Author: Wieland Brendel, Ranulfo Romo, Christian K. Machens

Abstract: In many experiments, the data points collected live in high-dimensional observation spaces, yet can be assigned a set of labels or parameters. In electrophysiological recordings, for instance, the responses of populations of neurons generally depend on mixtures of experimentally controlled parameters. The heterogeneity and diversity of these parameter dependencies can make visualization and interpretation of such data extremely difficult. Standard dimensionality reduction techniques such as principal component analysis (PCA) can provide a succinct and complete description of the data, but the description is constructed independent of the relevant task variables and is often hard to interpret. Here, we start with the assumption that a particularly informative description is one that reveals the dependency of the high-dimensional data on the individual parameters. We show how to modify the loss function of PCA so that the principal components seek to capture both the maximum amount of variance about the data, while also depending on a minimum number of parameters. We call this method demixed principal component analysis (dPCA) as the principal components here segregate the parameter dependencies. We phrase the problem as a probabilistic graphical model, and present a fast Expectation-Maximization (EM) algorithm. We demonstrate the use of this algorithm for electrophysiological data and show that it serves to demix the parameter-dependence of a neural population response. 1

4 0.51968294 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors

Author: Yangqing Jia, Trevor Darrell

Abstract: Many applications in computer vision measure the similarity between images or image patches based on some statistics such as oriented gradients. These are often modeled implicitly or explicitly with a Gaussian noise assumption, leading to the use of the Euclidean distance when comparing image descriptors. In this paper, we show that the statistics of gradient based image descriptors often follow a heavy-tailed distribution, which undermines any principled motivation for the use of Euclidean distances. We advocate for the use of a distance measure based on the likelihood ratio test with appropriate probabilistic models that fit the empirical data distribution. We instantiate this similarity measure with the Gammacompound-Laplace distribution, and show significant improvement over existing distance measures in the application of SIFT feature matching, at relatively low computational cost. 1

5 0.51762682 258 nips-2011-Sparse Bayesian Multi-Task Learning

Author: Shengbo Guo, Onno Zoeter, Cédric Archambeau

Abstract: We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group sparsity inducing priors based on matrix-variate Gaussian scale mixtures. We show the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model, where on both regression and classification tasks it achieves competitive predictive performance compared to previously proposed methods. 1

6 0.51760578 200 nips-2011-On the Analysis of Multi-Channel Neural Spike Data

7 0.51758116 276 nips-2011-Structured sparse coding via lateral inhibition

8 0.51587766 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

9 0.51456892 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis

10 0.50954235 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance

11 0.50943834 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

12 0.5092693 281 nips-2011-The Doubly Correlated Nonparametric Topic Model

13 0.50871491 186 nips-2011-Noise Thresholds for Spectral Clustering

14 0.50571531 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)

15 0.5037353 156 nips-2011-Learning to Learn with Compound HD Models

16 0.50359482 144 nips-2011-Learning Auto-regressive Models from Sequence and Non-sequence Data

17 0.50253421 66 nips-2011-Crowdclustering

18 0.50243336 158 nips-2011-Learning unbelievable probabilities

19 0.502428 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data

20 0.50156236 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies