nips nips2001 nips2001-183 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matthew J. Beal, Zoubin Ghahramani, Carl E. Rasmussen
Abstract: We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite— consider, for example, symbols being possible words appearing in English text.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. [sent-10, score-0.537]
2 By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. [sent-11, score-0.529]
3 These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. [sent-12, score-0.492]
4 The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. [sent-13, score-0.664]
5 In this framework it is also natural to allow the alphabet of emitted symbols to be infinite— consider, for example, symbols being possible words appearing in English text. [sent-14, score-0.167]
6 An HMM defines a probability distribution over sequences of observations (symbols) by invoking another sequence of unobserved, or hidden, discrete state variables. [sent-16, score-0.299]
7 The basic idea in an HMM is that the sequence of hidden states has Markov dynamics, i.e. the current hidden state is conditionally independent of all earlier states given the previous one. [sent-17, score-0.42]
8 The model is defined in terms of two sets of parameters: the transition matrix, whose (i, j) element is the probability of moving from hidden state i to state j, and the emission matrix, whose (i, q) element is the probability of emitting symbol q from state i. [sent-20, score-0.621]
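To make the role of these two parameter sets concrete, the following minimal sketch (ours, not code from the paper) evaluates the joint log probability of a state and symbol sequence under an ordinary finite HMM; the names pi0, A and E are illustrative assumptions.

```python
import numpy as np

def hmm_joint_log_prob(states, symbols, pi0, A, E):
    """Log P(states, symbols) under a finite HMM.

    pi0[i]  : probability of starting in hidden state i
    A[i, j] : probability of a transition from hidden state i to state j
    E[i, q] : probability of emitting symbol q from hidden state i
    """
    logp = np.log(pi0[states[0]]) + np.log(E[states[0], symbols[0]])
    for t in range(1, len(states)):
        logp += np.log(A[states[t - 1], states[t]])    # Markov transition
        logp += np.log(E[states[t], symbols[t]])       # emission given current state
    return logp
```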
9 It has been proposed to approximate such Bayesian integration both using variational methods [3] and by conditioning on a single most likely hidden state sequence [8]. [sent-28, score-0.511]
10 In this paper we start from the point of view that the basic modelling assumption of HMMs—that the data was generated by some discrete state variable which can take on one of several values—is unreasonable for most real-world problems. [sent-29, score-0.207]
11 Instead we formulate the idea of HMMs with a countably infinite number of hidden states. [sent-30, score-0.29]
12 In principle, such models have infinitely many parameters in the state transition matrix. [sent-31, score-0.4]
13 Obviously it would not be sensible to optimise these parameters; instead we use the theory of Dirichlet processes (DPs) [2, 1] to implicitly integrate them out, leaving just three hyperparameters defining the prior over transition dynamics. [sent-32, score-0.565]
14 Because of this we have extended the notion of a DP to a two-stage hierarchical process which couples transitions between different states. [sent-35, score-0.204]
15 It should be stressed that Dirichlet distributions have been used extensively both as priors for mixing proportions and to smooth n-gram models over finite alphabets [4], which differs considerably from the model presented here. [sent-36, score-0.142]
16 We explore properties of the HDP prior, showing that it can generate interesting hidden state sequences and that it can also be used as an emission model for an infinite alphabet of symbols. [sent-39, score-0.895]
17 This infinite emission model is controlled by two additional hyperparameters. [sent-40, score-0.389]
18 In section 4 we describe the procedures for inference (Gibbs sampling the hidden states), learning (optimising the hyperparameters), and likelihood evaluation (infinite-state particle filtering). [sent-41, score-0.421]
19 2 Properties of the Dirichlet Process. Let us examine in detail the statistics of hidden state transitions from a particular state i to each state j, with the number of hidden states finite and equal to k. [sent-43, score-1.13]
20 The transition probabilities given in a particular row of the transition matrix can be interpreted as mixing proportions for the next state. [sent-44, score-0.672]
21 Consider drawing samples from a discrete indicator variable which can take on one of k values with proportions given by that row. [sent-46, score-0.139]
22 Let us see what happens to the distribution of these indicators when we integrate out the mixing proportions under a conjugate prior. [sent-49, score-0.25]
23 The conditional probability of an indicator given the settings of all other indicators is given by (4), where the counts are those of (1) with that indicator's contribution removed. [sent-53, score-0.286]
24 A key property of DPs, which is at the very heart of the model in this paper, is the expression for (4) when we take the limit as the number of hidden states tends to infinity. [sent-55, score-0.42]
25 In this limit an existing value is chosen with probability proportional to its count, while all the unrepresented values combined are chosen with probability proportional to the concentration parameter; the number of represented states (those with non-zero counts) stays finite. [sent-58, score-0.295]
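The following sketch (ours, using the generic name beta for the concentration parameter of this DP) simulates the infinite-limit conditional just described: an indicator joins an existing value with probability proportional to its count and an unrepresented value with probability proportional to beta, which produces the familiar rich-get-richer behaviour.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_indicator(counts, beta):
    """Draw one indicator from the k -> infinity conditional (4):
    existing value j with weight counts[j], a new value with weight beta."""
    weights = np.append(counts, beta).astype(float)
    j = rng.choice(len(weights), p=weights / weights.sum())
    return j                           # j == len(counts) signals a new value

counts = np.array([], dtype=int)
for _ in range(30):
    j = sample_indicator(counts, beta=1.5)
    if j == len(counts):
        counts = np.append(counts, 1)  # instantiate a previously unrepresented value
    else:
        counts[j] += 1
print(counts)                          # a few values accumulate most of the draws
```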
26 3 Hierarchical Dirichlet Process (HDP) We now consider modelling each row of the transition and emission matrices of an HMM as a DP. [sent-65, score-0.767]
27 Doing so has two important consequences; the first is that we can integrate out the infinite number of transition parameters, and represent the process with a finite number of indicator variables. [sent-67, score-0.37]
28 The second is that under a DP there is a natural tendency to use existing transitions in proportion to their previous usage, which gives rise to typical trajectories. [sent-68, score-0.224]
29 In sections 3.1 and 3.2 we describe in detail the HDP model for transitions and emissions for an infinite-state HMM. [sent-71, score-0.212]
30 3.1 Hidden state transition mechanism. Imagine we have generated a hidden state sequence up to and including time t, building a table of counts n_ij of transitions that have occurred so far from state i to state j. [sent-73, score-1.355]
31 Given that we are in state i, we impose on the next state a DP (5) with parameter β, whose counts are the entries in the i-th row of the count matrix. [sent-76, score-0.491]
32 Given that we have defaulted to the oracle DP, the probabilities of transitioning now become those given in (7): proportional to the oracle count for each represented state, and proportional to γ for an entirely new state. [sent-82, score-0.286]
33 Under the infinite model, at any time, there are an infinite number of (indistinguishable) unrepresented states available, each of which has only infinitesimal mass. [sent-86, score-0.327]
34 Figure 1: (left) State transition generative mechanism, drawn as a decision tree: from state i, a self-transition (j = i) has probability (n_ii + α) / (Σ_j n_ij + β + α), an existing transition to state j has probability n_ij / (Σ_j n_ij + β + α), and with probability β / (Σ_j n_ij + β + α) the oracle is consulted; the oracle then chooses an existing state j with probability n^o_j / (Σ_j n^o_j + γ) or a new state with probability γ / (Σ_j n^o_j + γ). [sent-87, score-1.625]
35 (right a-d) Sampled state trajectories (time along the horizontal axis) from the HDP prior, for four different hyperparameter settings: one setting explores many states with a sparse transition matrix. [sent-88, score-0.672]
36 Other settings linger in a few states, switch between a few different states, or follow strict left-to-right transitions. [sent-90, score-0.232]
37 Under the oracle, an entirely new state is transitioned to with probability proportional to γ. [sent-94, score-0.275]
38 This is the only mechanism for visiting new states from the infinitely many available to us. [sent-95, score-0.208]
39 After each transition from state i to state j we increment the count n_ij and, if we transitioned to state j via the oracle DP just described, then in addition we increment the oracle count for j. [sent-96, score-0.507]
40 If we transitioned to a new state then the sizes of the count matrices will increase. [sent-97, score-0.48]
41 Self-transitions are special because their probability defines a time scale over which the dynamics of the hidden state evolves. [sent-98, score-0.446]
42 We assign a finite prior mass to self transitions for each state; this is the third hyperparameter in our model. [sent-99, score-0.347]
43 Therefore, when a state is first visited (via the oracle in the HDP), its self-transition count is effectively initialised to α. [sent-100, score-0.148]
44 The full hidden state transition mechanism is a two-level DP hierarchy shown in decision tree form in Figure 1. [sent-101, score-0.682]
45 Alongside are shown typical state trajectories under the prior with different hyperparameters. [sent-102, score-0.276]
46 Note that γ controls the expected number of represented hidden states, and β influences the tendency to explore new transitions, corresponding to the size and density respectively of the resulting transition count matrix. [sent-104, score-0.718]
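As a concrete reading of the decision tree in Figure 1, here is a sketch (ours, not the authors' code) of one draw of the next hidden state given the current state i, transition counts n, oracle counts n_oracle, and the three hyperparameters alpha, beta and gamma; the dictionary bookkeeping and the seeding of the oracle counts with the initial state are implementation choices of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def next_state(i, n, n_oracle, alpha, beta, gamma):
    """One draw following Figure 1: self-transition mass n_ii + alpha, existing
    transitions mass n_ij, oracle mass beta; the oracle reuses state j with mass
    n_oracle[j] or creates a brand-new state with mass gamma."""
    row = n.get(i, {})
    options = list(row.keys())
    weights = [row[j] + (alpha if j == i else 0.0) for j in options]
    if i not in row:                        # self-transition keeps mass alpha even if unseen
        options.append(i)
        weights.append(alpha)
    options.append("oracle")                # defer to the oracle with mass beta
    weights.append(beta)
    w = np.asarray(weights, dtype=float)
    pick = options[rng.choice(len(options), p=w / w.sum())]
    if pick != "oracle":
        return pick, False
    o_states = list(n_oracle.keys())
    ow = np.asarray([n_oracle[j] for j in o_states] + [gamma], dtype=float)
    k = rng.choice(len(ow), p=ow / ow.sum())
    if k < len(o_states):
        return o_states[k], True
    return max(o_states + [-1]) + 1, True   # brand-new state gets the next free index

def record(i, j, used_oracle, n, n_oracle):
    n.setdefault(i, {})
    n[i][j] = n[i].get(j, 0) + 1
    if used_oracle:
        n_oracle[j] = n_oracle.get(j, 0) + 1

# Sample a short trajectory from the prior, starting in state 0.
n, n_oracle, s = {}, {0: 1}, 0
for _ in range(50):
    s_next, used = next_state(s, n, n_oracle, alpha=2.0, beta=1.0, gamma=1.0)
    record(s, s_next, used, n, n_oracle)
    s = s_next
print(sorted({j for row in n.values() for j in row}))   # represented states so far
```

Varying alpha, beta and gamma in this sketch reproduces the qualitative regimes shown in the right panels of Figure 1.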
47 The oracle plays two roles: first, it serves to couple the transition DPs from different hidden states. [sent-107, score-0.479]
48 Since a newly visited state has no previous transitions to existing states, without an oracle (which necessarily has knowledge of all represented states as it created them) it would transition to itself or yet another new state with probability 1. [sent-108, score-1.174]
49 By consulting the oracle, new states can have finite probability of transitioning to represented states. [sent-109, score-0.287]
50 The second role of the oracle is to allow some states to be more influential (more commonly transitioned to) than others. [sent-110, score-0.485]
51 3.2 Emission mechanism. The emission process is identical to the transition process in every respect except that there is no concept analogous to a self-transition. [sent-112, score-0.712]
52 Therefore we need only introduce two further hyperparameters, the emission analogues of β and γ, for the emission HDP. [sent-113, score-0.572]
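A companion sketch (ours) of the emission draw; it mirrors the transition mechanism above with no self-transition term, and the names beta_e and gamma_e for the two emission hyperparameters are ours. The caller updates the emission counts m and emission-oracle counts m_oracle afterwards, exactly as record does for transitions.

```python
import numpy as np

rng = np.random.default_rng(2)

def next_symbol(i, m, m_oracle, beta_e, gamma_e):
    """Emit one symbol from state i: existing symbols in proportion to m[i][q];
    otherwise consult the emission oracle, which reuses symbol q with mass
    m_oracle[q] or emits a brand-new symbol with mass gamma_e."""
    row = m.get(i, {})
    options = list(row.keys()) + ["oracle"]
    w = np.asarray([row[q] for q in row] + [beta_e], dtype=float)
    pick = options[rng.choice(len(options), p=w / w.sum())]
    if pick != "oracle":
        return pick
    o_syms = list(m_oracle.keys())
    ow = np.asarray([m_oracle[q] for q in o_syms] + [gamma_e], dtype=float)
    k = rng.choice(len(ow), p=ow / ow.sum())
    return o_syms[k] if k < len(o_syms) else max(o_syms + [-1]) + 1
```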
53 Figure 2: (left) State emission generative mechanism. [sent-117, score-0.389]
54 (right) (Exp 1) Evolution of the number of represented states (vertical axis), plotted against iterations of Gibbs sweeps (horizontal axis), during learning of the ascending-descending sequence, which requires exactly 10 states to model the data perfectly. [sent-120, score-0.593]
55 Each line represents a run initialised with a random hidden state sequence containing a different number of distinct represented states. [sent-121, score-0.63]
56 An entirely new symbol, one that has never previously been emitted, is generated using the emission oracle. [sent-123, score-0.44]
57 The combination of an HDP for both hidden states and emissions may well be able to capture the somewhat super-logarithmic word generation found in Alice. [sent-132, score-0.542]
58 4 Inference, learning and likelihoods. Given a sequence of observations, there are two sets of unknowns in the infinite HMM: the hidden state sequence, and the five hyperparameters defining the transition and emission HDPs. [sent-133, score-1.411]
59 Note that by using HDPs for both states and observations, we have implicitly integrated out the infinitely many transition and emission parameters. [sent-134, score-0.794]
60 Making an analogy with non-parametric models such as Gaussian Processes, we define a learned model as a set of counts and optimised hyperparameters. [sent-135, score-0.297]
61 We first describe an approximate Gibbs sampling procedure for inferring the posterior over the hidden state sequence. [sent-136, score-0.473]
62 - Gibbs sample each hidden state in turn, given the hyperparameter settings, count matrices, and observations. [sent-142, score-0.226]
63 - Update the count matrices to reflect the newly sampled state; this may change the number of represented hidden states. [sent-143, score-0.472]
64 4.1 Gibbs sampling the hidden state sequence. Define reduced count matrices as the result of removing, from the transition and emission count matrices, the counts contributed by the hidden state currently being resampled. [sent-150, score-1.276]
65 Similar reduced quantities are defined for the transition and emission oracle counts. [sent-151, score-0.621]
66 An exact Gibbs sweep of the hidden state sequence is expensive, since under the HDP generative process changing one hidden state affects the probability of all subsequent hidden state transitions and emissions. [sent-154, score-1.015]
67 However this computation can be reasonably approximated in linear time, by basing the Gibbs update for each hidden state only on the states of its neighbours and the total counts. [sent-155, score-0.282]
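A sketch (ours) of the linear-time approximation: the update for one hidden state uses only its two neighbours and the total counts. The callables trans_prob(i, j) and emit_prob(j, y) stand for the predictive probabilities implied by the count matrices and hyperparameters (for example the mechanisms sketched in section 3) and are assumptions of this sketch, as is treating index K as the single "new state" candidate.

```python
import numpy as np

rng = np.random.default_rng(3)

def gibbs_update_state(t, states, obs, K, trans_prob, emit_prob):
    """Resample states[t] from its approximate conditional
    P(s_t = j | rest) ~ P(s_t = j | s_{t-1}) * P(y_t | s_t = j) * P(s_{t+1} | s_t = j),
    where j ranges over the K represented states plus one fresh state (index K)."""
    candidates = list(range(K + 1))
    w = np.ones(len(candidates))
    for idx, j in enumerate(candidates):
        if t > 0:
            w[idx] *= trans_prob(states[t - 1], j)   # link to the left neighbour
        w[idx] *= emit_prob(j, obs[t])               # explain the observation
        if t < len(states) - 1:
            w[idx] *= trans_prob(j, states[t + 1])   # link to the right neighbour
    states[t] = rng.choice(candidates, p=w / w.sum())
    return states[t]
```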
68 4.2 Hyperparameter optimisation. We place vague Gamma priors on the hyperparameters. [sent-158, score-0.183]
69 The relevant quantities are the number of represented states that are transitioned to from each state (including itself) and the number of possible emissions from each state. [sent-162, score-0.759]
70 Similarly, we need the number of times the oracle has been used for the transition and emission processes, calculated from the indicator variables. [sent-163, score-0.884]
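The paper's exact update equations are not reproduced here, but the following sketch shows one standard way (our assumption) to treat a single DP concentration parameter: given that K distinct values were produced by n draws, its posterior under a Gamma(a, b) prior is proportional to gamma^(a-1) e^(-b gamma) gamma^K Gamma(gamma) / Gamma(gamma + n), which can be maximised or sampled on a grid. The numbers in the usage lines are illustrative only.

```python
import numpy as np
from scipy.special import gammaln

def log_post_concentration(gamma, K, n, a=1.0, b=1.0):
    """Unnormalised log posterior of a DP concentration parameter gamma,
    given K distinct values among n draws and a Gamma(a, b) prior."""
    log_prior = (a - 1.0) * np.log(gamma) - b * gamma
    log_lik = K * np.log(gamma) + gammaln(gamma) - gammaln(gamma + n)
    return log_prior + log_lik

# Simple grid maximisation, e.g. for a transition-oracle concentration.
grid = np.linspace(1e-3, 20.0, 2000)
K_used, n_draws = 10, 300          # illustrative counts, not taken from the paper
best = grid[np.argmax([log_post_concentration(g, K_used, n_draws) for g in grid])]
print(best)
```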
71 4.3 Infinite-state particle filter. The likelihood for a particular observable sequence of symbols involves intractable sums over the possible hidden state trajectories. [sent-165, score-0.669]
72 In particular, in the DP, making a particular transition makes that same transition more likely later on in the sequence, so we cannot use standard tricks like dynamic programming. [sent-167, score-0.464]
73 Furthermore, the number of distinct states can grow with the sequence length as new states are generated. [sent-168, score-0.53]
74 If the chain starts with some number of distinct states, then at each time step another distinct state could be created, making the total number of possible trajectories over the entire length of the sequence grow combinatorially. [sent-169, score-0.49]
75 Although the hidden states in an HMM satisfy the Markov condition, integrating out the parameters induces these long-range dependencies. [sent-170, score-0.42]
76 Consider sampling parameters from the posterior distribution of parameter matrices, which will depend on the count matrices. [sent-172, score-0.159]
77 We propose estimating the likelihood of a test sequence given a learned model using particle filtering. [sent-175, score-0.211]
78 The idea is to start with some number of particles distributed on the represented hidden states according to the final-state marginal from the training sequence (some of the particles may fall onto new states). [sent-176, score-0.801]
79 Update the transition and emission tables for each particle. [sent-180, score-0.621]
80 Since it is a discrete state space, with much of the probability mass concentrated on the represented states, it is feasible to use a modest number of particles. [sent-194, score-0.258]
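A sketch (ours) of the resulting likelihood estimate. Each particle carries a copy of the learned counts plus its current hidden state; predictive(p, y) stands for the probability of symbol y under particle p's counts with the next hidden state marginalised out, and propagate(p, y) returns a fresh particle with that state sampled and the count tables updated. Both helpers are assumptions of this sketch rather than functions defined in the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

def log_likelihood_pf(test_seq, particles, predictive, propagate):
    """Estimate log P(test sequence | learned model) with a simple particle filter."""
    logL = 0.0
    for y in test_seq:
        w = np.array([predictive(p, y) for p in particles])
        logL += np.log(w.mean())                  # likelihood increment for this symbol
        # Resample in proportion to the weights, then move the particles forward.
        idx = rng.choice(len(particles), size=len(particles), p=w / w.sum())
        particles = [propagate(particles[i], y) for i in idx]
    return logL
```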
81 5 Synthetic experiments. Exp 1: Discovering the number of hidden states. We applied the infinite HMM inference algorithm to the ascending-descending observation sequence consisting of 30 concatenated copies of an ascending-then-descending pattern of symbols. [sent-195, score-0.545]
82 The most parsimonious HMM which models this data perfectly has exactly 10 hidden states. [sent-196, score-0.247]
83 The infinite HMM was initialised with a random hidden state sequence containing some number of distinct represented states. [sent-197, score-0.581]
84 In Figure 2 (right) we show how the number of represented states evolves with successive Gibbs sweeps, starting from a variety of initialisations. [sent-198, score-0.231]
85 Exp 2: Expansive. A sequence was generated from a 4-state 8-symbol HMM with the transition and emission probabilities as shown in Figure 3 (top left). [sent-200, score-0.769]
86 Exp 3: Compressive. A sequence was generated from a 4-state 3-symbol HMM with the transition and emission probabilities as shown in Figure 3 (bottom left). [sent-201, score-0.769]
87 In both Exp 2 and Exp 3 the infinite HMM was initialised with a hidden state sequence containing a number of distinct states. [sent-202, score-0.619]
88 Figure 3 shows that, over successive Gibbs sweeps and hyperparameter learning, the count matrices for the infinite HMM converge to resemble the true probability matrices as shown on the far left. [sent-203, score-0.481]
89 The HDP implicitly integrates out the transition and emission parameters of the HMM. [sent-206, score-0.651]
90 An advantage of this is that it is no longer necessary to constrain the HMM to have finitely many states and observation symbols. [sent-207, score-0.173]
91 The prior over hidden state transitions defined by the HDP is capable of producing a wealth of interesting trajectories by varying the three hyperparameters that control it. [sent-208, score-0.867]
92 We have presented the necessary tools for using the infinite HMM, namely a linear-time approximate Gibbs sampler for inference, equations for hyperparameter learning, and a particle filter for likelihood evaluation. [sent-209, score-0.27]
93 Different particle initialisations apply if we do not assume that the test sequence immediately follows the training sequence. [sent-210, score-0.183]
94 True transition and emission probability matrices used for Exp 2. [sent-211, score-0.687]
95 True transition and emission probability matrices used for Exp 3. [sent-215, score-0.687]
96 Figure 3: The far left pair of Hinton diagrams represent the true transition and emission probabilities used to generate the data for each experiment 2 and 3 (up to a permutation of the hidden states; lighter boxes correspond to higher values). [sent-219, score-0.893]
97 Similar to the top row, displaying count matrices after Gibbs sampling. [sent-223, score-0.208]
98 On synthetic data we have shown that the infinite HMM discovers both the appropriate number of states required to model the data and the structure of the emission and transition matrices. [sent-226, score-0.821]
99 It is important to emphasise that although the count matrices found by the infinite HMM resemble point estimates of HMM parameters (e.g. the matrices in Figure 3), the model itself remains nonparametric. [sent-227, score-0.197]
100 We believe that for many problems the infinite HMM's flexible nature and its ability to automatically determine the required number of hidden states make it superior to the conventional treatment of HMMs with its associated difficult model selection problem. [sent-230, score-0.42]
wordName wordTfidf (topN-words)
[('emission', 0.389), ('hidden', 0.247), ('pbi', 0.235), ('transition', 0.232), ('hdp', 0.222), ('oracle', 0.205), ('hmm', 0.185), ('hyperparameters', 0.183), ('nite', 0.179), ('dp', 0.175), ('states', 0.173), ('hf', 0.169), ('state', 0.168), ('gibbs', 0.163), ('dirichlet', 0.151), ('transitions', 0.127), ('hyperparameter', 0.125), ('counts', 0.114), ('transitioned', 0.107), ('count', 0.101), ('sequence', 0.096), ('sweeps', 0.093), ('particle', 0.087), ('dps', 0.085), ('emissions', 0.085), ('proportions', 0.081), ('nitely', 0.081), ('trajectories', 0.072), ('hmms', 0.068), ('matrices', 0.066), ('miq', 0.064), ('unrepresented', 0.064), ('nij', 0.063), ('mixing', 0.061), ('distinct', 0.061), ('qc', 0.059), ('particles', 0.059), ('indicator', 0.058), ('represented', 0.058), ('indicators', 0.056), ('transitioning', 0.056), ('exp', 0.055), ('tendency', 0.054), ('integrate', 0.052), ('emitted', 0.051), ('hierarchical', 0.049), ('markov', 0.049), ('initialised', 0.047), ('sa', 0.047), ('symbols', 0.043), ('bbg', 0.043), ('compressive', 0.043), ('countably', 0.043), ('expansive', 0.043), ('hdps', 0.043), ('jo', 0.043), ('linger', 0.043), ('mqo', 0.043), ('occurence', 0.043), ('existing', 0.043), ('row', 0.041), ('symbol', 0.041), ('map', 0.039), ('modelling', 0.039), ('nonparametric', 0.037), ('alongside', 0.037), ('word', 0.037), ('prior', 0.036), ('mechanism', 0.035), ('sequences', 0.035), ('processes', 0.035), ('de', 0.035), ('wealth', 0.034), ('goto', 0.034), ('alice', 0.034), ('mass', 0.032), ('vu', 0.031), ('dynamics', 0.031), ('sampling', 0.03), ('alphabet', 0.03), ('uf', 0.03), ('resemble', 0.03), ('integrates', 0.03), ('sampler', 0.03), ('sweep', 0.03), ('inference', 0.029), ('likelihood', 0.028), ('au', 0.028), ('hh', 0.028), ('posterior', 0.028), ('bayesian', 0.028), ('process', 0.028), ('ne', 0.027), ('self', 0.027), ('leaving', 0.027), ('synthetic', 0.027), ('horizontal', 0.027), ('length', 0.027), ('explore', 0.026), ('probabilities', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 183 nips-2001-The Infinite Hidden Markov Model
Author: Matthew J. Beal, Zoubin Ghahramani, Carl E. Rasmussen
Abstract: We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite— consider, for example, symbols being possible words appearing in English text.
2 0.19998382 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
Author: Andrew D. Brown, Geoffrey E. Hinton
Abstract: Logistic units in the first hidden layer of a feedforward neural network compute the relative probability of a data point under two Gaussians. This leads us to consider substituting other density models. We present an architecture for performing discriminative learning of Hidden Markov Models using a network of many small HMM's. Experiments on speech data show it to be superior to the standard method of discriminatively training HMM's. 1
3 0.18770547 115 nips-2001-Linear-time inference in Hierarchical HMMs
Author: Kevin P. Murphy, Mark A. Paskin
Abstract: The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98]. Unfortunately, the original inference algorithm is rather complicated, and takes time cubic in the length of the sequence, making it impractical for many domains. In this paper, we show how HHMMs are a special kind of dynamic Bayesian network (DBN), and thereby derive a much simpler inference algorithm, which only takes linear time. Furthermore, by drawing the connection between HHMMs and DBNs, we enable the application of many standard approximation techniques to further speed up inference.
4 0.17367113 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
Author: Carl E. Rasmussen, Zoubin Ghahramani
Abstract: We present an extension to the Mixture of Experts (ME) model, where the individual experts are Gaussian Process (GP) regression models. Using an input-dependent adaptation of the Dirichlet Process, we implement a gating network for an infinite number of Experts. Inference in this model may be done efficiently using a Markov Chain relying on Gibbs sampling. The model allows the effective covariance function to vary with the inputs, and may handle large datasets – thus potentially overcoming two of the biggest hurdles with GP models. Simulations show the viability of this approach.
5 0.15708108 43 nips-2001-Bayesian time series classification
Author: Peter Sykacek, Stephen J. Roberts
Abstract: This paper proposes an approach to classification of adjacent segments of a time series as being either of classes. We use a hierarchical model that consists of a feature extraction stage and a generative classifier which is built on top of these features. Such two stage approaches are often used in signal and image processing. The novel part of our work is that we link these stages probabilistically by using a latent feature space. To use one joint model is a Bayesian requirement, which has the advantage to fuse information according to its certainty. The classifier is implemented as hidden Markov model with Gaussian and Multinomial observation distributions defined on a suitably chosen representation of autoregressive models. The Markov dependency is motivated by the assumption that successive classifications will be correlated. Inference is done with Markov chain Monte Carlo (MCMC) techniques. We apply the proposed approach to synthetic data and to classification of EEG that was recorded while the subjects performed different cognitive tasks. All experiments show that using a latent feature space results in a significant improvement in generalization accuracy. Hence we expect that this idea generalizes well to other hierarchical models.
6 0.14145924 123 nips-2001-Modeling Temporal Structure in Classical Conditioning
7 0.12886883 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data
8 0.11896734 163 nips-2001-Risk Sensitive Particle Filters
9 0.11511179 172 nips-2001-Speech Recognition using SVMs
10 0.10163124 35 nips-2001-Analysis of Sparse Bayesian Learning
11 0.10066237 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models
12 0.084706292 3 nips-2001-ACh, Uncertainty, and Cortical Inference
13 0.084241971 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
14 0.081149928 102 nips-2001-KLD-Sampling: Adaptive Particle Filters
15 0.081046738 61 nips-2001-Distribution of Mutual Information
16 0.075152799 40 nips-2001-Batch Value Function Approximation via Support Vectors
17 0.073777974 67 nips-2001-Efficient Resources Allocation for Markov Decision Processes
18 0.072903253 153 nips-2001-Product Analysis: Learning to Model Observations as Products of Hidden Variables
19 0.071912259 76 nips-2001-Fast Parameter Estimation Using Green's Functions
20 0.069561593 59 nips-2001-Direct value-approximation for factored MDPs
topicId topicWeight
[(0, -0.216), (1, -0.055), (2, 0.087), (3, -0.115), (4, -0.263), (5, -0.03), (6, 0.183), (7, 0.073), (8, -0.061), (9, -0.004), (10, 0.01), (11, 0.042), (12, 0.001), (13, -0.014), (14, -0.021), (15, -0.164), (16, 0.012), (17, -0.009), (18, 0.255), (19, -0.016), (20, 0.138), (21, 0.065), (22, -0.025), (23, -0.025), (24, 0.013), (25, 0.124), (26, -0.199), (27, -0.055), (28, -0.046), (29, 0.016), (30, 0.016), (31, -0.005), (32, 0.052), (33, 0.098), (34, -0.085), (35, -0.077), (36, -0.07), (37, -0.063), (38, 0.145), (39, 0.111), (40, 0.035), (41, -0.032), (42, 0.073), (43, -0.046), (44, -0.016), (45, -0.014), (46, 0.003), (47, -0.001), (48, -0.03), (49, -0.009)]
simIndex simValue paperId paperTitle
same-paper 1 0.97343302 183 nips-2001-The Infinite Hidden Markov Model
Author: Matthew J. Beal, Zoubin Ghahramani, Carl E. Rasmussen
Abstract: We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite— consider, for example, symbols being possible words appearing in English text.
2 0.81058365 115 nips-2001-Linear-time inference in Hierarchical HMMs
Author: Kevin P. Murphy, Mark A. Paskin
Abstract: The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98]. Unfortunately, the original inference algorithm is rather complicated, and takes time cubic in the length of the sequence, making it impractical for many domains. In this paper, we show how HHMMs are a special kind of dynamic Bayesian network (DBN), and thereby derive a much simpler inference algorithm, which only takes linear time. Furthermore, by drawing the connection between HHMMs and DBNs, we enable the application of many standard approximation techniques to further speed up inference.
3 0.69778639 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
Author: Andrew D. Brown, Geoffrey E. Hinton
Abstract: Logistic units in the first hidden layer of a feedforward neural network compute the relative probability of a data point under two Gaussians. This leads us to consider substituting other density models. We present an architecture for performing discriminative learning of Hidden Markov Models using a network of many small HMM's. Experiments on speech data show it to be superior to the standard method of discriminatively training HMM's. 1
4 0.62593251 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data
Author: Jens Kohlmorgen, Steven Lemm
Abstract: We propose a novel method for the analysis of sequential data that exhibits an inherent mode switching. In particular, the data might be a non-stationary time series from a dynamical system that switches between multiple operating modes. Unlike other approaches, our method processes the data incrementally and without any training of internal parameters. We use an HMM with a dynamically changing number of states and an on-line variant of the Viterbi algorithm that performs an unsupervised segmentation and classification of the data on-the-fly, i.e. the method is able to process incoming data in real-time. The main idea of the approach is to track and segment changes of the probability density of the data in a sliding window on the incoming data stream. The usefulness of the algorithm is demonstrated by an application to a switching dynamical system. 1
5 0.5886901 43 nips-2001-Bayesian time series classification
Author: Peter Sykacek, Stephen J. Roberts
Abstract: This paper proposes an approach to classification of adjacent segments of a time series as being either of classes. We use a hierarchical model that consists of a feature extraction stage and a generative classifier which is built on top of these features. Such two stage approaches are often used in signal and image processing. The novel part of our work is that we link these stages probabilistically by using a latent feature space. To use one joint model is a Bayesian requirement, which has the advantage to fuse information according to its certainty. The classifier is implemented as hidden Markov model with Gaussian and Multinomial observation distributions defined on a suitably chosen representation of autoregressive models. The Markov dependency is motivated by the assumption that successive classifications will be correlated. Inference is done with Markov chain Monte Carlo (MCMC) techniques. We apply the proposed approach to synthetic data and to classification of EEG that was recorded while the subjects performed different cognitive tasks. All experiments show that using a latent feature space results in a significant improvement in generalization accuracy. Hence we expect that this idea generalizes well to other hierarchical models.
6 0.58580548 3 nips-2001-ACh, Uncertainty, and Cortical Inference
7 0.58481759 123 nips-2001-Modeling Temporal Structure in Classical Conditioning
8 0.51499957 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
9 0.48270813 172 nips-2001-Speech Recognition using SVMs
10 0.42717171 148 nips-2001-Predictive Representations of State
11 0.3764115 67 nips-2001-Efficient Resources Allocation for Markov Decision Processes
12 0.35916361 163 nips-2001-Risk Sensitive Particle Filters
13 0.34893665 179 nips-2001-Tempo tracking and rhythm quantization by sequential Monte Carlo
14 0.34616986 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
15 0.34466752 194 nips-2001-Using Vocabulary Knowledge in Bayesian Multinomial Estimation
16 0.34342393 68 nips-2001-Entropy and Inference, Revisited
17 0.34187078 61 nips-2001-Distribution of Mutual Information
18 0.33691692 85 nips-2001-Grammar Transfer in a Second Order Recurrent Neural Network
19 0.31617042 83 nips-2001-Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons
20 0.30526516 55 nips-2001-Convergence of Optimistic and Incremental Q-Learning
topicId topicWeight
[(14, 0.027), (17, 0.017), (19, 0.014), (27, 0.098), (30, 0.082), (38, 0.02), (51, 0.214), (59, 0.054), (72, 0.069), (79, 0.124), (83, 0.04), (91, 0.154)]
simIndex simValue paperId paperTitle
same-paper 1 0.8484748 183 nips-2001-The Infinite Hidden Markov Model
Author: Matthew J. Beal, Zoubin Ghahramani, Carl E. Rasmussen
Abstract: We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite— consider, for example, symbols being possible words appearing in English text.
2 0.72145188 78 nips-2001-Fragment Completion in Humans and Machines
Author: David Jacobs, Bas Rokers, Archisman Rudra, Zili Liu
Abstract: Partial information can trigger a complete memory. At the same time, human memory is not perfect. A cue can contain enough information to specify an item in memory, but fail to trigger that item. In the context of word memory, we present experiments that demonstrate some basic patterns in human memory errors. We use cues that consist of word fragments. We show that short and long cues are completed more accurately than medium length ones and study some of the factors that lead to this behavior. We then present a novel computational model that shows some of the flexibility and patterns of errors that occur in human memory. This model iterates between bottom-up and top-down computations. These are tied together using a Markov model of words that allows memory to be accessed with a simple feature set, and enables a bottom-up process to compute a probability distribution of possible completions of word fragments, in a manner similar to models of visual perceptual completion.
3 0.71697676 115 nips-2001-Linear-time inference in Hierarchical HMMs
Author: Kevin P. Murphy, Mark A. Paskin
Abstract: The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98]. Unfortunately, the original inference algorithm is rather complicated, and takes time cubic in the length of the sequence, making it impractical for many domains. In this paper, we show how HHMMs are a special kind of dynamic Bayesian network (DBN), and thereby derive a much simpler inference algorithm, which only takes linear time. Furthermore, by drawing the connection between HHMMs and DBNs, we enable the application of many standard approximation techniques to further speed up inference.
4 0.70331073 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
Author: Andrew D. Brown, Geoffrey E. Hinton
Abstract: Logistic units in the first hidden layer of a feedforward neural network compute the relative probability of a data point under two Gaussians. This leads us to consider substituting other density models. We present an architecture for performing discriminative learning of Hidden Markov Models using a network of many small HMM's. Experiments on speech data show it to be superior to the standard method of discriminatively training HMM's. 1
5 0.70107329 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
Author: Carl E. Rasmussen, Zoubin Ghahramani
Abstract: We present an extension to the Mixture of Experts (ME) model, where the individual experts are Gaussian Process (GP) regression models. Using an input-dependent adaptation of the Dirichlet Process, we implement a gating network for an infinite number of Experts. Inference in this model may be done efficiently using a Markov Chain relying on Gibbs sampling. The model allows the effective covariance function to vary with the inputs, and may handle large datasets – thus potentially overcoming two of the biggest hurdles with GP models. Simulations show the viability of this approach.
6 0.69616741 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
7 0.69364309 3 nips-2001-ACh, Uncertainty, and Cortical Inference
8 0.69353724 123 nips-2001-Modeling Temporal Structure in Classical Conditioning
9 0.69340014 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data
10 0.69328451 169 nips-2001-Small-World Phenomena and the Dynamics of Information
11 0.69258392 161 nips-2001-Reinforcement Learning with Long Short-Term Memory
12 0.69143844 56 nips-2001-Convolution Kernels for Natural Language
13 0.68994379 180 nips-2001-The Concave-Convex Procedure (CCCP)
14 0.68673968 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
15 0.68664217 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments
16 0.68577534 2 nips-2001-3 state neurons for contextual processing
17 0.68479311 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning
18 0.6832754 194 nips-2001-Using Vocabulary Knowledge in Bayesian Multinomial Estimation
19 0.68276823 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
20 0.68271899 182 nips-2001-The Fidelity of Local Ordinal Encoding