
234 nips-2008-The Infinite Factorial Hidden Markov Model


Source: pdf

Author: Jurgen V. Gael, Yee W. Teh, Zoubin Ghahramani

Abstract: We introduce a new probability distribution over a potentially infinite number of binary Markov chains which we call the Markov Indian buffet process. This process extends the IBP to allow temporal dependencies in the hidden variables. We use this stochastic process to build a nonparametric extension of the factorial hidden Markov model. After constructing an inference scheme which combines slice sampling and dynamic programming we demonstrate how the infinite factorial hidden Markov model can be used for blind source separation. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 uk Abstract We introduce a new probability distribution over a potentially infinite number of binary Markov chains which we call the Markov Indian buffet process. [sent-9, score-0.336]

2 This process extends the IBP to allow temporal dependencies in the hidden variables. [sent-10, score-0.132]

3 We use this stochastic process to build a nonparametric extension of the factorial hidden Markov model. [sent-11, score-0.421]

4 After constructing an inference scheme which combines slice sampling and dynamic programming we demonstrate how the infinite factorial hidden Markov model can be used for blind source separation. [sent-12, score-0.593]

5 1 Introduction When modeling discrete time series data, the hidden Markov model [1] (HMM) is one of the most widely used and successful tools. [sent-13, score-0.166]

6 The HMM defines a probability distribution over observations $y_1, y_2, \cdots, y_T$ using the following generative model: it assumes there is a hidden Markov chain $s_1, s_2, \cdots, s_T$ with $s_t \in \{1, \cdots, K\}$ whose dynamics are governed by a $K \times K$ stochastic transition matrix $\pi$. [sent-14, score-0.437]

7 At each timestep t, the Markov chain generates an output $y_t$ using some likelihood model F parametrized by a state-dependent parameter $\theta_{s_t}$. [sent-15, score-0.306]
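To make this generative process concrete, the following is a minimal sampling sketch; the Gaussian emission model F and all parameter values are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def sample_hmm(pi0, pi, theta, T, sigma=1.0, rng=None):
    """Sample hidden states s_1..s_T and observations y_1..y_T from an HMM.

    pi0   : length-K initial state distribution
    pi    : K x K stochastic transition matrix
    theta : length-K state-dependent means (Gaussian likelihood F is an assumption)
    """
    rng = np.random.default_rng() if rng is None else rng
    K = len(pi0)
    s = np.empty(T, dtype=int)
    y = np.empty(T)
    s[0] = rng.choice(K, p=pi0)
    y[0] = rng.normal(theta[s[0]], sigma)
    for t in range(1, T):
        s[t] = rng.choice(K, p=pi[s[t - 1]])   # Markov dynamics governed by pi
        y[t] = rng.normal(theta[s[t]], sigma)  # emission from F(y_t; theta_{s_t})
    return s, y

# Example: a sticky 3-state chain with well-separated emission means.
pi = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
s, y = sample_hmm(np.ones(3) / 3, pi, theta=np.array([-2.0, 0.0, 2.0]), T=100)
```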

8 One shortcoming of the hidden Markov model is the limited representational power of the latent variables. [sent-18, score-0.239]

9 One way to look at the distribution defined by the HMM is to write down the marginal distribution of $y_t$ given the previous latent state $s_{t-1}$: $p(y_t \mid s_{t-1}) = \sum_{s_t} p(s_t \mid s_{t-1})\, p(y_t \mid s_t) = \sum_{s_t} \pi_{s_{t-1}, s_t} F(y_t; \theta_{s_t})$. (2) [sent-19, score-0.374]

10 Equation (2) illustrates that the observations are generated from a dynamic mixture model. [sent-20, score-0.207]

11 The factorial hidden Markov model (FHMM), developed in [2], addresses the limited representational power of the hidden Markov model. [sent-21, score-0.505]

12 The FHMM extends the HMM by representing the hidden state with a set of parallel Markov chains. [sent-22, score-0.164]

13 Note that for all our models, all latent chains start in a dummy state $s_0^{(m)}$ that is fixed to 0. [sent-30, score-0.232]

14 The parallel chains can be viewed as latent features which evolve over time according to Markov dynamics. [sent-36, score-0.257]

15 Unfortunately, the dimensionality M of our factorial representation, or equivalently the number of parallel Markov chains, is a new free parameter for the FHMM which we would prefer to learn from data rather than specify beforehand. [sent-40, score-0.238]

16 In this work, we derive the basic building block for nonparametric Bayesian factor models for time series which we call the Markov Indian Buffet Process (mIBP). [sent-44, score-0.115]

17 Using this distribution we build a nonparametric extension of the FHMM which we call the Infinite Factorial Hidden Markov Model (iFHMM). [sent-45, score-0.115]

18 This construction allows us to learn a factorial representation for time series. [sent-46, score-0.251]

19 Section 4 shows results of our application of the iFHMM to a blind source separation problem. [sent-50, score-0.131]

20 In this representation rows correspond to timesteps and the columns to features or Markov chains. [sent-53, score-0.159]

21 We want the distribution over matrices to satisfy the following two properties: (1) the potential number of columns (representing latent features) should be able to be arbitrarily large; (2) the rows (representing timesteps) should evolve according to a Markov process. [sent-54, score-0.246]

22 1 A finite model Let S represent a binary matrix with T rows (datapoints) and M columns (features). [sent-59, score-0.206]

23 $s_{tm}$ represents the hidden state at time t for Markov chain m. [sent-60, score-0.299]

24 Each Markov chain evolves according to the transition matrix $$W^{(m)} = \begin{pmatrix} 1 - a_m & a_m \\ 1 - b_m & b_m \end{pmatrix}, \qquad (3)$$ where $W^{(m)}_{ij} = p(s_{t+1,m} = j \mid s_{tm} = i)$. [sent-61, score-0.254]

25 We give the parameters of $W^{(m)}$ the prior distributions $a_m \sim \mathrm{Beta}(\alpha/M, 1)$ and $b_m \sim \mathrm{Beta}(\gamma, \delta)$. [sent-62, score-0.11]

26 Each chain starts with a dummy zero state s0m = 0. [sent-63, score-0.138]

27 The hidden state sequence for chain m is generated by sampling T steps from a Markov chain with transition matrix W (m) . [sent-64, score-0.37]

28 Summarizing, the generative specification for this process is: $\forall m \in \{1, 2, \cdots, M\}$, $a_m \sim \mathrm{Beta}\left(\tfrac{\alpha}{M}, 1\right)$, $b_m \sim \mathrm{Beta}(\gamma, \delta)$, $s_{0m} = 0$, $s_{tm} \sim \mathrm{Bernoulli}(a_m^{1-s_{t-1,m}} b_m^{s_{t-1,m}})$. (4) [sent-65, score-0.183]

29 Next, we evaluate the probability of the state matrix S with the transition matrix parameters $W^{(m)}$ marginalized out. [sent-66, score-0.158]
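The finite generative specification in equation (4) translates directly into a short sampling routine; the sketch below follows the Bernoulli parameterization stated above and is not code from the paper.

```python
import numpy as np

def sample_finite_mibp(T, M, alpha, gamma, delta, rng=None):
    """Sample a T x M binary state matrix S from the finite model of equation (4).

    Each chain m has an off->on probability a_m ~ Beta(alpha/M, 1) and a
    stay-on probability b_m ~ Beta(gamma, delta); every chain starts in the
    dummy state s_{0m} = 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = rng.beta(alpha / M, 1.0, size=M)
    b = rng.beta(gamma, delta, size=M)
    S = np.zeros((T, M), dtype=int)
    prev = np.zeros(M, dtype=int)              # dummy state s_{0m} = 0
    for t in range(T):
        p_on = np.where(prev == 1, b, a)       # Bernoulli(a^{1-s_{t-1}} b^{s_{t-1}})
        S[t] = (rng.random(M) < p_on).astype(int)
        prev = S[t]
    return S, a, b
```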

30 We introduce the following notation: let $c^{00}_m, c^{01}_m, c^{10}_m, c^{11}_m$ be the number of 0 → 0, 0 → 1, 1 → 0 and 1 → 1 transitions, respectively, in binary chain m (including the transition from the dummy state to the first state). [sent-67, score-0.208]

31 We can then write $$p(S \mid a, b) = \prod_{m=1}^{M} (1 - a_m)^{c^{00}_m} a_m^{c^{01}_m} (1 - b_m)^{c^{10}_m} b_m^{c^{11}_m}. \qquad (5)$$ [sent-68, score-0.22]

32 We integrate out a and b with respect to the conjugate priors defined in equation (4) and find $$p(S \mid \alpha, \gamma, \delta) = \prod_{m=1}^{M} \frac{\frac{\alpha}{M}\,\Gamma(\frac{\alpha}{M} + c^{01}_m)\,\Gamma(c^{00}_m + 1)\,\Gamma(\gamma + \delta)\,\Gamma(\delta + c^{10}_m)\,\Gamma(\gamma + c^{11}_m)}{\Gamma(\frac{\alpha}{M} + c^{00}_m + c^{01}_m + 1)\,\Gamma(\gamma)\,\Gamma(\delta)\,\Gamma(\gamma + \delta + c^{10}_m + c^{11}_m)}, \qquad (6)$$ where $\Gamma(x)$ is the Gamma function. [sent-69, score-0.521]
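Equation (6) can be evaluated numerically from each chain's transition counts; the sketch below follows the reconstruction of equation (6) given above and uses scipy.special.gammaln for numerical stability.

```python
import numpy as np
from scipy.special import gammaln

def log_marginal_S(S, alpha, gamma, delta):
    """Evaluate log p(S | alpha, gamma, delta) as in equation (6).

    The counts include the transition from the dummy 0 state into the first row.
    """
    T, M = S.shape
    padded = np.vstack([np.zeros((1, M), dtype=int), S])   # prepend dummy state
    logp = 0.0
    for m in range(M):
        prev, nxt = padded[:-1, m], padded[1:, m]
        c00 = np.sum((prev == 0) & (nxt == 0))
        c01 = np.sum((prev == 0) & (nxt == 1))
        c10 = np.sum((prev == 1) & (nxt == 0))
        c11 = np.sum((prev == 1) & (nxt == 1))
        logp += (np.log(alpha / M)
                 + gammaln(alpha / M + c01) + gammaln(c00 + 1)
                 + gammaln(gamma + delta) + gammaln(delta + c10) + gammaln(gamma + c11)
                 - gammaln(alpha / M + c00 + c01 + 1)
                 - gammaln(gamma) - gammaln(delta)
                 - gammaln(gamma + delta + c10 + c11))
    return logp
```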

33 In other words, our factorial model is exchangeable in the columns as we don’t care about the ordering of the features. [sent-74, score-0.366]

34 The left-ordered form of a binary matrix S can be defined as follows: we interpret one column of length T as encoding a binary number: column m encodes the number $2^{T-1} s_{1m} + 2^{T-2} s_{2m} + \cdots + s_{Tm}$. [sent-76, score-0.178]

35 Then, we denote with Mh the number of columns in the matrix S that have the same history. [sent-78, score-0.11]

36 We say a matrix is in left-ordered form (an lof-matrix) if its columns are sorted in decreasing order of their history values. [sent-79, score-0.11]
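Since lexicographic order on the columns (with the first row as the most significant bit) coincides with the numeric order of these binary histories, the left-ordered form can be computed with a few lines; the helper below is an illustrative sketch, not the authors' code.

```python
import numpy as np

def left_ordered_form(S):
    """Reorder the columns of a binary matrix S into left-ordered form (lof).

    Column m encodes the binary number 2^{T-1} s_{1m} + ... + s_{Tm}, so sorting
    columns as bit strings (first row = most significant bit) in decreasing
    order is the same as sorting them by decreasing history value.
    """
    order = sorted(range(S.shape[1]), key=lambda m: tuple(S[:, m]), reverse=True)
    return S[:, order]
```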

37 $$\sum_{S \in [S]} p(S \mid \alpha, \gamma, \delta) = \frac{M!}{\prod_{h=0}^{2^T - 1} M_h!} \prod_{m=1}^{M} \frac{\frac{\alpha}{M}\,\Gamma(\frac{\alpha}{M} + c^{01}_m)\,\Gamma(c^{00}_m + 1)\,\Gamma(\gamma + \delta)\,\Gamma(\delta + c^{10}_m)\,\Gamma(\gamma + c^{11}_m)}{\Gamma(\frac{\alpha}{M} + c^{00}_m + c^{01}_m + 1)\,\Gamma(\gamma)\,\Gamma(\delta)\,\Gamma(\gamma + \delta + c^{10}_m + c^{11}_m)}. \qquad (7)$$ [sent-84, score-0.48] [sent-85, score-0.16]

39 3 Properties of the distribution First of all, it is interesting to note from equation (9) that our model is exchangeable in the columns and Markov exchangeable in the rows. [sent-97, score-0.159]

40 In this stochastic process, T customers enter an Indian restaurant with an infinitely long buffet of dishes organized in a line. [sent-99, score-0.297]

41 The first customer enters the restaurant and takes a serving from each dish, starting at the left of the buffet and stopping after a Poisson(α) number of dishes as his plate becomes overburdened. [sent-100, score-0.459]

42 A waiter stands near the buffet and takes notes as to how many people have eaten which dishes. [sent-101, score-0.359]

43 The t’th customer enters the restaurant and starts at the left of the buffet. [sent-102, score-0.204]

44 At dish m, he looks at the customer in front of him to see whether he has served himself that dish. [sent-103, score-0.393]

45 If so, the customer then serves himself dish m with probability $(c^{11}_m + \gamma)/(\gamma + \delta + c^{10}_m + c^{11}_m)$. [sent-105, score-0.276]

46 Otherwise, the customer serves himself dish m with probability $c^{01}_m/(c^{00}_m + c^{01}_m)$. [sent-107, score-0.276]

47 The customer then moves on to the next dish and does exactly the same. [sent-108, score-0.276]

48 After the customer has passed all dishes people have previously served themselves from, he tries Poisson(α/t) new dishes. [sent-109, score-0.317]
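A direct simulation of this restaurant metaphor, using the serving probabilities stated in the preceding sentences, can be sketched as follows; this is an illustrative sketch of the sampling process, not the authors' code.

```python
import numpy as np

def sample_mibp(T, alpha, gamma, delta, rng=None):
    """Simulate the mIBP restaurant process and return the binary matrix S.

    Customer t serves himself the existing dishes according to the waiter's
    transition counts and then tries Poisson(alpha / t) new dishes.
    """
    rng = np.random.default_rng() if rng is None else rng
    rows, counts = [], []                        # counts[m] = [c00, c01, c10, c11]
    for t in range(1, T + 1):
        prev_row = rows[-1] if rows else []
        row = []
        for m, c in enumerate(counts):
            prev = prev_row[m] if m < len(prev_row) else 0
            if prev == 1:                        # person in front served himself dish m
                p = (c[3] + gamma) / (gamma + delta + c[2] + c[3])
            else:
                p = c[1] / (c[0] + c[1])
            s = int(rng.random() < p)
            c[2 * prev + s] += 1                 # the waiter updates his notes
            row.append(s)
        for _ in range(rng.poisson(alpha / t)):  # brand-new dishes for customer t
            counts.append([t - 1, 1, 0, 0])      # t-1 implicit 0->0 steps, then 0->1
            row.append(1)
        rows.append(row)
    M = len(counts)
    S = np.array([r + [0] * (M - len(r)) for r in rows], dtype=int)
    return S
```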

49 (t) If we denote with M1 the number of new dishes tried by the t’th customer, the probability of any particular matrix being produced by this process is p([S]) = M αM+ T t=1 (t) M1 ! [sent-110, score-0.111]

50 exp{−αHT } α α 01 00 10 11 M Γ( M + cm )Γ(cm + 1)Γ(γ + δ)Γ(δ + cm )Γ(γ + cm ) . [sent-111, score-0.48]

51 4 A stick breaking representation Although the representation above is convenient for theoretical analysis, it is not very practical for inference. [sent-119, score-0.127]

52 Interestingly, we can adapt the stick breaking construction for the IBP [8] to the mIBP. [sent-120, score-0.171]

53 The first step in the stick-breaking construction is to find the distribution of $a_{(1)} > a_{(2)} > \cdots$, the order statistics of the parameters a. [sent-122, score-0.171]

54 The variables $b_m$ are all independent draws from a Beta(γ, δ) distribution, which is independent of M. [sent-124, score-0.11]
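A minimal sketch of this representation is given below; it assumes the ordered parameters $a_{(m)}$ follow the same multiplicative recursion as the IBP stick-breaking construction of [8] ($a_{(m)} = \nu_m a_{(m-1)}$ with $\nu_m \sim \mathrm{Beta}(\alpha, 1)$), which is an assumption on our part, while the $b_m$ are the IID Beta(γ, δ) draws described above.

```python
import numpy as np

def stick_breaking_mibp(K, alpha, gamma, delta, rng=None):
    """Draw the first K chains' parameters in a stick-breaking representation.

    a: ordered off->on probabilities built as cumulative products of
       nu_m ~ Beta(alpha, 1) draws (IBP-style recursion; an assumption here).
    b: stay-on probabilities, IID Beta(gamma, delta), independent of M.
    """
    rng = np.random.default_rng() if rng is None else rng
    nu = rng.beta(alpha, 1.0, size=K)
    a = np.cumprod(nu)                 # strictly decreasing sequence in (0, 1)
    b = rng.beta(gamma, delta, size=K)
    return a, b
```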

55 Now that we have a model with a more expressive latent structure, we want to add a likelihood model F which describes the distribution over the observations conditional on the latent structure. [sent-133, score-0.243]

56 Formally, $F(y_t \mid s_t)$ describes the probability of generating $y_t$ given the model parameters and the current latent feature state $s_t$. [sent-134, score-0.569]

57 We note that there are two important conditions which the likelihood must satisfy in order for the limit M → ∞ to be valid: (1) the likelihood must be invariant to permutations of the features, (2) the likelihood cannot depend on m if $s_{tm} = 0$. [sent-135, score-0.225]

58 Figure 3 shows the graphical model for our construction which we call the Infinite Factorial Hidden Markov Model (iFHMM). [sent-136, score-0.111]

59 As we explain in detail in section 4, we are interested in ICA to solve the blind source separation problem. [sent-141, score-0.131]

60 Assume that M signals are represented through the vectors $x_m$; grouping them, we can represent the signals using the matrix $X = [x_1\ x_2 \cdots x_M]$. [sent-142, score-0.168]

61 Next, we linearly combine the signals using a mixing matrix W to generate the observed signal Y = XW . [sent-143, score-0.181]

62 [10] used the IBP to design the Infinite Independent Component Analysis (iICA) model which learns an appropriate number of signals from exchangeable data. [sent-147, score-0.155]

63 The ICA iFHMM generative model can be described as follows: we sample S ∼ mIBP and pointwise multiply it (denoted by ⊙) with a signal matrix X. [sent-149, score-0.155]

64 One could choose many other distributions for X, but since in section 4 we will model speech data, which is known to be heavy tailed, the Laplace distribution is a convenient choice. [sent-151, score-0.135]

65 Speakers will be speaking infrequently, so pointwise multiplying a heavy-tailed distribution with a sparse binary matrix achieves our goal of producing a sparse, heavy-tailed distribution. [sent-152, score-0.312]

66 Next, we introduce a mixing matrix W which has a row for each signal in S ⊙ X and a column for each observed dimension in Y. [sent-153, score-0.154]

67 Finally, we combine the signal and mixing matrices as in the finite case to form the observation matrix Y: $Y = (S \odot X)W + \epsilon$, where $\epsilon$ is $\mathrm{Normal}(0, \sigma_Y^2)$ IID noise for each element. [sent-155, score-0.159]
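Given a binary on/off matrix S, generating data from this likelihood is straightforward; the sketch below assumes a unit-scale Laplace distribution for X and a standard normal draw for the mixing matrix W, both illustrative choices rather than the paper's settings.

```python
import numpy as np

def sample_ica_ifhmm_observations(S, D, sigma_y=0.1, rng=None):
    """Generate observations from the ICA iFHMM likelihood Y = (S ⊙ X) W + ε.

    S : T x M binary on/off matrix (e.g. drawn from the mIBP)
    X : T x M Laplace-distributed signal values (heavy-tailed, as for speech)
    W : M x D mixing matrix; ε is IID Normal(0, sigma_y^2) noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    T, M = S.shape
    X = rng.laplace(loc=0.0, scale=1.0, size=(T, M))   # heavy-tailed signals
    W = rng.normal(size=(M, D))                        # illustrative mixing matrix
    Y = (S * X) @ W + sigma_y * rng.normal(size=(T, D))
    return Y, X, W
```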

68 In terms of the general iFHMM model defined in the previous section, the base distribution H is a joint distribution over columns of X and rows of W . [sent-156, score-0.13]

69 The likelihood F performs the pointwise multiplication, mixes the signals and adds the noise. [sent-157, score-0.129]

70 2 Inference Inference for nonparametric models requires special treatment as the potentially unbounded dimensionality of the model makes it hard to use exact inference schemes. [sent-160, score-0.149]

71 Traditionally, in nonparametric factor models inference is done using Gibbs sampling, sometimes augmented with Metropolis Hastings steps to improve performance. [sent-161, score-0.115]

72 In the context of the infinite hidden Markov model, a solution was recently proposed in [12], where a slice sampler adaptively truncates the infinite dimensional model, after which dynamic programming performs exact inference. [sent-163, score-0.308]

73 Since a stick breaking construction for the iFHMM is readily available, we can use a very similar approach for the iFHMM. [sent-164, score-0.171]

74 This allows us to analytically integrate out one column of X in the dynamic program and resample it from the posterior afterwards. [sent-182, score-0.166]

75 A third algorithm runs dynamic programming on multiple chains at once. [sent-184, score-0.129]
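The dynamic-programming step inside such a blocked Gibbs sweep can be sketched as a forward-filtering, backward-sampling pass over a single binary chain, conditioned on the remaining chains through a T × 2 table of per-state log-likelihoods; this generic routine illustrates the idea and is not the paper's exact sampler.

```python
import numpy as np

def ffbs_binary_chain(log_lik, a_m, b_m, rng=None):
    """Forward-filter backward-sample one binary chain of the (truncated) iFHMM.

    log_lik : T x 2 array with log p(y_t | s_{tm} = k, all other chains)
    a_m     : p(s_t = 1 | s_{t-1} = 0);  b_m : p(s_t = 1 | s_{t-1} = 1)
    The chain starts in the dummy 0 state.
    """
    rng = np.random.default_rng() if rng is None else rng
    T = log_lik.shape[0]
    log_W = np.log(np.array([[1 - a_m, a_m], [1 - b_m, b_m]]))
    log_alpha = np.empty((T, 2))
    log_alpha[0] = log_W[0] + log_lik[0]            # predecessor is the dummy 0 state
    for t in range(1, T):                           # forward filtering
        log_alpha[t] = log_lik[t] + np.logaddexp(
            log_alpha[t - 1, 0] + log_W[0], log_alpha[t - 1, 1] + log_W[1])
    s = np.empty(T, dtype=int)
    p_last = np.exp(log_alpha[-1] - np.logaddexp(*log_alpha[-1]))
    s[-1] = rng.choice(2, p=p_last)
    for t in range(T - 2, -1, -1):                  # backward sampling
        logp = log_alpha[t] + log_W[:, s[t + 1]]
        s[t] = rng.choice(2, p=np.exp(logp - np.logaddexp(*logp)))
    return s
```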

76 Figure 4: Blind speech separation experiment; panels are (a) Ground Truth, (b) ICA iFHMM, (c) iICA, (d) ICA iFHMM, (e) iICA. The figures represent which speaker is speaking at a certain point in time: columns are speakers; rows are white if the speaker is talking and black otherwise. [sent-190, score-0.381]

77 4 Experiments To test our model and inference algorithms, we address a blind speech separation task, also known as the cocktail party problem. [sent-192, score-0.262]

78 Given the mixed speech signals, the goal is to separate out the individual speech signals. [sent-194, score-0.128]

79 Key to our presentation is that we want to illustrate that using nonparametric methods, we can learn the number of speakers from a small amount of data. [sent-195, score-0.195]

80 Our first experiment learns to recover the signals in a setting with more microphones than speakers; our second experiment uses fewer microphones than speakers. [sent-196, score-0.136]

81 The experimental setup was the following: we downloaded data from 5 speakers from the Speech Separation Challenge website. [sent-197, score-0.113]

82 Each mixture is a linear combination of each of the 5 speakers using Uniform(0, 1) mixing weights. [sent-201, score-0.149]

83 Visual inspection of the recovered S matrix also shows that the model discovers who is speaking at what time. [sent-221, score-0.144]

84 Although the model discovers some structure in the data, it fails to find the right number of speakers (it finds 9) and does a poor job in discovering which speaker is active at which time. [sent-223, score-0.227]

85 We computed the average mutual information between the 5 columns of the true S matrix and the first 5 columns of the recovered S matrices. [sent-224, score-0.214]
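This evaluation can be reproduced with a small helper that estimates mutual information from the empirical joint distribution of two binary columns; the direct column-by-column pairing used below is an illustrative simplification of the matching between true and recovered columns.

```python
import numpy as np

def mutual_information_binary(u, v):
    """Empirical mutual information (in bits) between two binary sequences."""
    u, v = np.asarray(u), np.asarray(v)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((u == a) & (v == b))
            p_a, p_b = np.mean(u == a), np.mean(v == b)
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

def average_mutual_information(S_true, S_rec):
    """Average MI between column k of the true S and column k of the recovered S."""
    K = min(S_true.shape[1], S_rec.shape[1])
    return float(np.mean([mutual_information_binary(S_true[:, k], S_rec[:, k])
                          for k in range(K)]))
```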

86 In a second experiment, we chose to perform blind speech separation using only the first 3 microphones. [sent-230, score-0.195]

87 In this setting both methods fail to identify the number of speakers although the ICA iFHMM clearly performs better. [sent-239, score-0.113]

88 The ICA iFHMM finds one signal too many: the spurious signal is very similar to the third signal, which suggests that the error is a problem of the inference algorithm and not so much of the model itself. [sent-240, score-0.145]

89 [2] introduced a factorial hidden Markov model which explicitly models dynamic latent features, while in [13] a nonparametric version of the Hidden Markov Model was presented. [sent-246, score-0.607]

90 We showed how this stochastic process can be used to build a nonparametric extension of the FHMM which we call the iFHMM. [sent-249, score-0.115]

91 Although the derivation of the mIBP with this extra parameter is straightforward, we as yet lack a stick-breaking construction for this model, which is crucial for our inference scheme. [sent-253, score-0.238]

92 Rabiner, “A tutorial on hidden markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. [sent-259, score-0.384]

93 Ji, “Multi-view face tracking with factorial and switching hmm,” in Proceedings of the Seventh IEEE Workshops on Application of Computer Vision, pp. [sent-270, score-0.207]

94 Duh, “Joint labeling of multiple sequences: A factorial hmm approach,” in 43rd Annual Meeting of the Association of Computational Linguistics (ACL) - Student Research Workshop, 2005. [sent-276, score-0.299]

95 Ghahramani, “Infinite latent feature models and the indian buffet process,” Advances in Neural Information Processing Systems, vol. [sent-280, score-0.409]

96 Ghahramani, “Stick-breaking construction for the Indian buffet process,” Proceedings of the International Conference on Artificial Intelligence and Statistics, vol. [sent-290, score-0.38]

97 Scott, “Bayesian methods for hidden markov models: Recursive computing in the 21st century,” Journal of the American Statistical Association, vol. [sent-304, score-0.32]

98 Ghahramani, “Beam sampling for the infinite hidden markov model,” in The 25th International Conference on Machine Learning, vol. [sent-313, score-0.32]

99 Rasmussen, “The infinite hidden markov model,” Advances in Neural Information Processing Systems, vol. [sent-320, score-0.32]

100 Sollich, “Bayesian nonparametric latent feature models,” Bayesian Statistics, vol. [sent-327, score-0.155]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ifhmm', 0.502), ('factorial', 0.207), ('iica', 0.201), ('ica', 0.189), ('fhmm', 0.188), ('mibp', 0.188), ('markov', 0.188), ('buffet', 0.188), ('ibp', 0.184), ('st', 0.161), ('cm', 0.16), ('dish', 0.151), ('indian', 0.148), ('hidden', 0.132), ('customer', 0.125), ('speakers', 0.113), ('bm', 0.11), ('yt', 0.108), ('waiter', 0.105), ('hmm', 0.092), ('beta', 0.092), ('replies', 0.084), ('chains', 0.083), ('blind', 0.083), ('nonparametric', 0.082), ('stm', 0.073), ('latent', 0.073), ('ghahramani', 0.07), ('stick', 0.068), ('dishes', 0.067), ('people', 0.066), ('columns', 0.066), ('speech', 0.064), ('mh', 0.063), ('gael', 0.063), ('signals', 0.062), ('chain', 0.062), ('nite', 0.061), ('served', 0.059), ('exchangeable', 0.059), ('breaking', 0.059), ('front', 0.058), ('iid', 0.058), ('slice', 0.058), ('speaker', 0.05), ('person', 0.049), ('separation', 0.048), ('dynamic', 0.046), ('ht', 0.045), ('resample', 0.044), ('tailed', 0.044), ('dummy', 0.044), ('construction', 0.044), ('matrix', 0.044), ('restaurant', 0.042), ('asks', 0.042), ('minm', 0.042), ('waiters', 0.042), ('integrate', 0.041), ('timestep', 0.041), ('matrices', 0.04), ('signal', 0.039), ('zoubin', 0.039), ('sampler', 0.038), ('transition', 0.038), ('pointwise', 0.038), ('mutual', 0.038), ('gamma', 0.038), ('enters', 0.037), ('microphones', 0.037), ('unmix', 0.037), ('jurgen', 0.037), ('talking', 0.037), ('knowles', 0.037), ('evolve', 0.037), ('heavy', 0.037), ('speaking', 0.036), ('mixing', 0.036), ('column', 0.035), ('gibbs', 0.035), ('model', 0.034), ('permutations', 0.034), ('features', 0.033), ('inference', 0.033), ('call', 0.033), ('binary', 0.032), ('state', 0.032), ('microphone', 0.031), ('didn', 0.031), ('teh', 0.031), ('limit', 0.031), ('ground', 0.031), ('parallel', 0.031), ('poisson', 0.03), ('rows', 0.03), ('timesteps', 0.03), ('discovers', 0.03), ('xw', 0.03), ('gures', 0.029), ('likelihood', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 234 nips-2008-The Infinite Factorial Hidden Markov Model

Author: Jurgen V. Gael, Yee W. Teh, Zoubin Ghahramani

Abstract: We introduce a new probability distribution over a potentially infinite number of binary Markov chains which we call the Markov Indian buffet process. This process extends the IBP to allow temporal dependencies in the hidden variables. We use this stochastic process to build a nonparametric extension of the factorial hidden Markov model. After constructing an inference scheme which combines slice sampling and dynamic programming we demonstrate how the infinite factorial hidden Markov model can be used for blind source separation. 1

2 0.20394021 235 nips-2008-The Infinite Hierarchical Factor Regression Model

Author: Piyush Rai, Hal Daume

Abstract: We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman’s coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis. 1

3 0.13215248 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization

Author: Siwei Lyu, Eero P. Simoncelli

Abstract: We consider the problem of transforming a signal to a representation in which the components are statistically independent. When the signal is generated as a linear transformation of independent Gaussian or non-Gaussian sources, the solution may be computed using a linear transformation (PCA or ICA, respectively). Here, we consider a complementary case, in which the source is non-Gaussian but elliptically symmetric. Such a source cannot be decomposed into independent components using a linear transform, but we show that a simple nonlinear transformation, which we call radial Gaussianization (RG), is able to remove all dependencies. We apply this methodology to natural signals, demonstrating that the joint distributions of nearby bandpass filter responses, for both sounds and images, are closer to being elliptically symmetric than linearly transformed factorial sources. Consistent with this, we demonstrate that the reduction in dependency achieved by applying RG to either pairs or blocks of bandpass filter responses is significantly greater than that achieved by PCA or ICA.

4 0.13120715 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference

Author: Thomas L. Griffiths, Joseph L. Austerweil

Abstract: Almost all successful machine learning algorithms and cognitive models require powerful representations capturing the features that are relevant to a particular problem. We draw on recent work in nonparametric Bayesian statistics to define a rational model of human feature learning that forms a featural representation from raw sensory data without pre-specifying the number of features. By comparing how the human perceptual system and our rational model use distributional and category information to infer feature representations, we seek to identify some of the forces that govern the process by which people separate and combine sensory primitives to form features. 1

5 0.12449401 232 nips-2008-The Conjoint Effect of Divisive Normalization and Orientation Selectivity on Redundancy Reduction

Author: Fabian H. Sinz, Matthias Bethge

Abstract: Bandpass filtering, orientation selectivity, and contrast gain control are prominent features of sensory coding at the level of V1 simple cells. While the effect of bandpass filtering and orientation selectivity can be assessed within a linear model, contrast gain control is an inherently nonlinear computation. Here we employ the class of Lp elliptically contoured distributions to investigate the extent to which the two features—orientation selectivity and contrast gain control—are suited to model the statistics of natural images. Within this framework we find that contrast gain control can play a significant role for the removal of redundancies in natural images. Orientation selectivity, in contrast, has only a very limited potential for redundancy reduction. 1

6 0.11625 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models

7 0.11615456 102 nips-2008-ICA based on a Smooth Estimation of the Differential Entropy

8 0.11004444 112 nips-2008-Kernel Measures of Independence for non-iid Data

9 0.076128125 233 nips-2008-The Gaussian Process Density Sampler

10 0.07147003 229 nips-2008-Syntactic Topic Models

11 0.065964065 81 nips-2008-Extracting State Transition Dynamics from Multiple Spike Trains with Correlated Poisson HMM

12 0.06289617 1 nips-2008-A Convergent $O(n)$ Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation

13 0.058030985 242 nips-2008-Translated Learning: Transfer Learning across Different Feature Spaces

14 0.057251107 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks

15 0.056112587 237 nips-2008-The Recurrent Temporal Restricted Boltzmann Machine

16 0.055928584 123 nips-2008-Linear Classification and Selective Sampling Under Low Noise Conditions

17 0.05551026 103 nips-2008-Implicit Mixtures of Restricted Boltzmann Machines

18 0.055224463 154 nips-2008-Nonparametric Bayesian Learning of Switching Linear Dynamical Systems

19 0.05489818 177 nips-2008-Particle Filter-based Policy Gradient in POMDPs

20 0.051559672 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.175), (1, 0.02), (2, 0.081), (3, -0.024), (4, 0.045), (5, 0.004), (6, 0.004), (7, 0.181), (8, 0.047), (9, -0.044), (10, -0.064), (11, -0.041), (12, 0.128), (13, -0.004), (14, -0.136), (15, -0.076), (16, 0.074), (17, 0.009), (18, -0.002), (19, 0.118), (20, -0.132), (21, 0.096), (22, -0.022), (23, 0.136), (24, -0.137), (25, -0.062), (26, -0.049), (27, 0.034), (28, -0.021), (29, 0.08), (30, 0.084), (31, 0.19), (32, 0.018), (33, 0.011), (34, -0.13), (35, -0.039), (36, 0.012), (37, -0.087), (38, 0.011), (39, -0.046), (40, 0.082), (41, 0.026), (42, -0.139), (43, 0.0), (44, 0.008), (45, 0.149), (46, -0.076), (47, 0.054), (48, -0.006), (49, 0.064)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9317565 234 nips-2008-The Infinite Factorial Hidden Markov Model

Author: Jurgen V. Gael, Yee W. Teh, Zoubin Ghahramani

Abstract: We introduce a new probability distribution over a potentially infinite number of binary Markov chains which we call the Markov Indian buffet process. This process extends the IBP to allow temporal dependencies in the hidden variables. We use this stochastic process to build a nonparametric extension of the factorial hidden Markov model. After constructing an inference scheme which combines slice sampling and dynamic programming we demonstrate how the infinite factorial hidden Markov model can be used for blind source separation. 1

2 0.62246823 235 nips-2008-The Infinite Hierarchical Factor Regression Model

Author: Piyush Rai, Hal Daume

Abstract: We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman’s coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis. 1

3 0.54578674 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization

Author: Siwei Lyu, Eero P. Simoncelli

Abstract: We consider the problem of transforming a signal to a representation in which the components are statistically independent. When the signal is generated as a linear transformation of independent Gaussian or non-Gaussian sources, the solution may be computed using a linear transformation (PCA or ICA, respectively). Here, we consider a complementary case, in which the source is non-Gaussian but elliptically symmetric. Such a source cannot be decomposed into independent components using a linear transform, but we show that a simple nonlinear transformation, which we call radial Gaussianization (RG), is able to remove all dependencies. We apply this methodology to natural signals, demonstrating that the joint distributions of nearby bandpass filter responses, for both sounds and images, are closer to being elliptically symmetric than linearly transformed factorial sources. Consistent with this, we demonstrate that the reduction in dependency achieved by applying RG to either pairs or blocks of bandpass filter responses is significantly greater than that achieved by PCA or ICA.

4 0.52830189 232 nips-2008-The Conjoint Effect of Divisive Normalization and Orientation Selectivity on Redundancy Reduction

Author: Fabian H. Sinz, Matthias Bethge

Abstract: Bandpass filtering, orientation selectivity, and contrast gain control are prominent features of sensory coding at the level of V1 simple cells. While the effect of bandpass filtering and orientation selectivity can be assessed within a linear model, contrast gain control is an inherently nonlinear computation. Here we employ the class of Lp elliptically contoured distributions to investigate the extent to which the two features—orientation selectivity and contrast gain control—are suited to model the statistics of natural images. Within this framework we find that contrast gain control can play a significant role for the removal of redundancies in natural images. Orientation selectivity, in contrast, has only a very limited potential for redundancy reduction. 1

5 0.49032104 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference

Author: Thomas L. Griffiths, Joseph L. Austerweil

Abstract: Almost all successful machine learning algorithms and cognitive models require powerful representations capturing the features that are relevant to a particular problem. We draw on recent work in nonparametric Bayesian statistics to define a rational model of human feature learning that forms a featural representation from raw sensory data without pre-specifying the number of features. By comparing how the human perceptual system and our rational model use distributional and category information to infer feature representations, we seek to identify some of the forces that govern the process by which people separate and combine sensory primitives to form features. 1

6 0.47861069 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models

7 0.46996978 233 nips-2008-The Gaussian Process Density Sampler

8 0.4624337 236 nips-2008-The Mondrian Process

9 0.41800052 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC

10 0.41699293 18 nips-2008-An Efficient Sequential Monte Carlo Algorithm for Coalescent Clustering

11 0.40512681 134 nips-2008-Mixed Membership Stochastic Blockmodels

12 0.39452797 102 nips-2008-ICA based on a Smooth Estimation of the Differential Entropy

13 0.37713239 237 nips-2008-The Recurrent Temporal Restricted Boltzmann Machine

14 0.35778388 154 nips-2008-Nonparametric Bayesian Learning of Switching Linear Dynamical Systems

15 0.35461777 11 nips-2008-A spatially varying two-sample recombinant coalescent, with applications to HIV escape response

16 0.33727041 31 nips-2008-Bayesian Exponential Family PCA

17 0.32415906 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction

18 0.32377455 216 nips-2008-Sparse probabilistic projections

19 0.32293215 112 nips-2008-Kernel Measures of Independence for non-iid Data

20 0.32060942 211 nips-2008-Simple Local Models for Complex Dynamical Systems


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(6, 0.044), (7, 0.059), (12, 0.032), (15, 0.012), (28, 0.174), (57, 0.147), (63, 0.019), (71, 0.011), (77, 0.067), (78, 0.011), (83, 0.052), (98, 0.293)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79642612 234 nips-2008-The Infinite Factorial Hidden Markov Model

Author: Jurgen V. Gael, Yee W. Teh, Zoubin Ghahramani

Abstract: We introduce a new probability distribution over a potentially infinite number of binary Markov chains which we call the Markov Indian buffet process. This process extends the IBP to allow temporal dependencies in the hidden variables. We use this stochastic process to build a nonparametric extension of the factorial hidden Markov model. After constructing an inference scheme which combines slice sampling and dynamic programming we demonstrate how the infinite factorial hidden Markov model can be used for blind source separation. 1

2 0.7400648 75 nips-2008-Estimating vector fields using sparse basis field expansions

Author: Stefan Haufe, Vadim V. Nikulin, Andreas Ziehe, Klaus-Robert Müller, Guido Nolte

Abstract: We introduce a novel framework for estimating vector fields using sparse basis field expansions (S-FLEX). The notion of basis fields, which are an extension of scalar basis functions, arises naturally in our framework from a rotational invariance requirement. We consider a regression setting as well as inverse problems. All variants discussed lead to second-order cone programming formulations. While our framework is generally applicable to any type of vector field, we focus in this paper on applying it to solving the EEG/MEG inverse problem. It is shown that significantly more precise and neurophysiologically more plausible location and shape estimates of cerebral current sources from EEG/MEG measurements become possible with our method when comparing to the state-of-the-art. 1

3 0.71351397 87 nips-2008-Fitted Q-iteration by Advantage Weighted Regression

Author: Gerhard Neumann, Jan R. Peters

Abstract: Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection the policy improvement step used in FQI can be simplified to an inexpensive advantageweighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high dimensional action spaces. 1

4 0.64574146 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction

Author: Jing Xu, Thomas L. Griffiths

Abstract: Many human interactions involve pieces of information being passed from one person to another, raising the question of how this process of information transmission is affected by the capacities of the agents involved. In the 1930s, Sir Frederic Bartlett explored the influence of memory biases in “serial reproduction” of information, in which one person’s reconstruction of a stimulus from memory becomes the stimulus seen by the next person. These experiments were done using relatively uncontrolled stimuli such as pictures and stories, but suggested that serial reproduction would transform information in a way that reflected the biases inherent in memory. We formally analyze serial reproduction using a Bayesian model of reconstruction from memory, giving a general result characterizing the effect of memory biases on information transmission. We then test the predictions of this account in two experiments using simple one-dimensional stimuli. Our results provide theoretical and empirical justification for the idea that serial reproduction reflects memory biases. 1

5 0.64154768 235 nips-2008-The Infinite Hierarchical Factor Regression Model

Author: Piyush Rai, Hal Daume

Abstract: We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman’s coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis. 1

6 0.63407362 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes

7 0.6313746 200 nips-2008-Robust Kernel Principal Component Analysis

8 0.63092291 27 nips-2008-Artificial Olfactory Brain for Mixture Identification

9 0.62648809 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

10 0.62560368 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning

11 0.62235385 236 nips-2008-The Mondrian Process

12 0.61958718 118 nips-2008-Learning Transformational Invariants from Natural Movies

13 0.61909848 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation

14 0.61771321 158 nips-2008-Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks

15 0.61702758 138 nips-2008-Modeling human function learning with Gaussian processes

16 0.61659664 216 nips-2008-Sparse probabilistic projections

17 0.6138106 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization

18 0.61285865 66 nips-2008-Dynamic visual attention: searching for coding length increments

19 0.61258048 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words

20 0.61093283 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks