nips nips2001 nips2001-115 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kevin P. Murphy, Mark A. Paskin
Abstract: The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98]. Unfortunately, the original inference algorithm is rather complicated, and takes time cubic in the length of the sequence, making it impractical for many domains. In this paper, we show how HHMMs are a special kind of dynamic Bayesian network (DBN), and thereby derive a much simpler inference algorithm, which only takes time linear in the length of the sequence. Furthermore, by drawing the connection between HHMMs and DBNs, we enable the application of many standard approximation techniques to further speed up inference.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98]. [sent-5, score-0.292]
2 Unfortunately, the original inference algorithm is rather complicated, and takes time cubic in the length of the sequence, making it impractical for many domains. [sent-6, score-0.11]
3 In this paper, we show how HHMMs are a special kind of dynamic Bayesian network (DBN), and thereby derive a much simpler inference algorithm, which only takes time linear in the length of the sequence. [sent-7, score-0.257]
4 1 Introduction The Hierarchical HMM [FST98] is an extension of the HMM that is designed to model domains with hierarchical structure, e. [sent-9, score-0.134]
5 HHMMs are less expressive than stochastic context-free grammars (SCFGs), since they only allow hierarchies of bounded depth, but they are more efficient and easier to learn. [sent-12, score-0.187]
6 Unfortunately, the original inference algorithm described in [FST98] is somewhat complicated, and takes time cubic in the length of the sequence, with additional dependence on the depth of the hierarchy and the (maximum) number of states at each level of the hierarchy. [sent-13, score-0.416]
7 In this paper, we show how to represent an HHMM as a dynamic Bayesian network (DBN), and thereby derive a much simpler and faster inference algorithm, which takes time linear in the length of the sequence; empirically, the junction tree algorithm runs even faster than the worst-case bound suggests. [sent-14, score-0.483]
8 Furthermore, by drawing the connection between HHMMs and DBNs, we enable the application of approximate inference techniques such as belief propagation, which can perform inference with an even smaller cost per time step. [sent-15, score-0.335]
9 Once the model has been learned, it will typically be used for online inference (filtering). [sent-21, score-0.188]
10 [Figure 1 diagram: numbered states 0–9, production-state emissions a, b, x, y, c, d, and end states at each level.] [sent-22, score-0.605]
11 Figure 1: A 3-level hierarchical automaton representing a regular expression. Solid lines represent horizontal transitions, dotted lines represent vertical transitions. [sent-23, score-0.583]
12 Letters below a production state represent the symbol that is emitted. [sent-24, score-0.406]
13 The unnumbered root node is considered level 0, and could be omitted if we fully interconnected states 0 and 1. [sent-25, score-0.233]
14 We will describe HHMMs in Section 2, and the original inference algorithm in Section 3. [sent-26, score-0.195]
15 In Section 5, we discuss how to do efficient inference in this DBN, and in Section 6, we discuss related work. [sent-28, score-0.22]
16 2 Hierarchical HMMs HHMMs are like HMMs except the states of the stochastic automaton can emit single observations or strings of observations. [sent-31, score-0.492]
17 Those that emit single symbols are called “production states”, and those that emit strings are termed “abstract states”. [sent-34, score-0.371]
18 The strings emitted by abstract states are themselves governed by sub-HHMMs, which can be called recursively. [sent-35, score-0.118]
19 When the sub-HHMM is finished, control is returned to wherever it was called from; the calling context is memorized using a depth-limited stack. [sent-36, score-0.108]
20 We illustrate the generative process with Figure 1, which shows the state transition diagram of an example HHMM that models a regular expression. [sent-37, score-0.368]
21 We start in the root state, and make a “vertical transition” to one of its children, say state 0. [sent-38, score-0.262]
22 From here, we make another vertical transition to state 2. [sent-39, score-0.344]
23 Since state 2 is a production state, it emits “a” and then makes a “horizontal transition” to state 3. [sent-40, score-0.513]
24 State 3 calls its sub-HMM, which emits x’s and y’s until it enters its end state; then control is returned to the calling state, in this case state 3. [sent-41, score-0.523]
25 We then make a horizontal transition to state 4, emit “b”, and enter the end state, thereby returning control to state 0. [sent-42, score-0.881]
26 Finally, from state 0, we return control to the root, and optionally start again. [sent-43, score-0.212]
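To make the call-return semantics concrete, here is a minimal generative sampler for a toy HHMM, written as a sketch in Python. The dictionary representation, the state names (s0, s2, ...) and all probabilities are invented for illustration and only loosely follow Figure 1; they are not the paper's notation.

```python
import random

# Toy HHMM: an abstract state owns a sub-HMM over its children; "end" returns
# control to the caller. Structure and probabilities are made up.
HHMM = {
    "root": {"init": {"s0": 1.0}, "trans": {"s0": {"end": 1.0}}},
    "s0":   {"init": {"s2": 1.0},
             "trans": {"s2": {"s3": 1.0}, "s3": {"s4": 1.0}, "s4": {"end": 1.0}}},
    "s3":   {"init": {"s8": 1.0},
             "trans": {"s8": {"s9": 1.0}, "s9": {"s8": 0.5, "end": 0.5}}},
}
EMIT = {"s2": "a", "s4": "b", "s8": "x", "s9": "y"}   # production states

def draw(dist):
    r, acc = random.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r < acc:
            return value
    return value  # guard against floating-point rounding

def sample(state="root"):
    """Recursively generate a string from the sub-HHMM rooted at `state`."""
    if state in EMIT:                       # production state: emit one symbol
        return EMIT[state]
    spec, out = HHMM[state], []
    child = draw(spec["init"])              # vertical transition into the sub-HMM
    while child != "end":
        out.append(sample(child))           # the recursion plays the role of the stack
        child = draw(spec["trans"][child])  # horizontal transition, possibly to "end"
    return "".join(out)

print(sample())   # e.g. "axyxyb"
```

The recursion depth is bounded by the depth of the hierarchy, which corresponds to the depth-limited stack mentioned above.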
27 Any HHMM can be converted to an HMM by creating a state for every possible legal stack configuration. [sent-46, score-0.237]
28 If the HHMM transition diagram is a tree, there will be one HMM state for every HHMM production state. [sent-47, score-0.507]
29 If the HHMM transition diagram has shared substructure (such as a shared sub-expression), this structure must be duplicated in the HMM, generally resulting in a larger model, as the sketch below illustrates. [sent-48, score-0.232]
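As an illustration of this flattening argument, the sketch below (reusing the toy HHMM and EMIT dictionaries from the previous snippet, so all names remain invented) enumerates the legal stack configurations; for a tree-structured HHMM, each configuration that bottoms out in a production state becomes exactly one state of the flat HMM.

```python
def stack_configs(state="root", prefix=()):
    """Enumerate root-to-production-state paths; each one is a flat-HMM state."""
    path = prefix + (state,)
    if state in EMIT:                       # production state: the stack bottoms out
        return [path]
    configs = []
    children = set(HHMM[state]["init"]) | set(HHMM[state]["trans"])
    for child in sorted(children - {"end"}):
        configs.extend(stack_configs(child, path))
    return configs

for cfg in stack_configs():
    print(" / ".join(cfg), "emits", EMIT[cfg[-1]])
```

With shared substructure, the same abstract state would be reachable from several calling contexts, and this enumeration would duplicate its sub-paths once per context, which is exactly why the flattened HMM can be much larger.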
30 3 Cubic-time inference The inference algorithm for HHMMs presented in [FST98] runs in time cubic in the length of the sequence, and is based on the Inside-Outside algorithm [LY90], an exact inference algorithm for stochastic context-free grammars (SCFGs), which we now describe. [sent-53, score-0.735]
31 In an SCFG, sequences of observations are generated by a set of stochastic production rules. [sent-54, score-0.238]
32 Each production rule stochastically rewrites a non-terminal symbol into either a symbol of the alphabet or a pair of non-terminal symbols. [sent-55, score-0.361]
33 Observation strings are generated by starting with the distinguished “start” non-terminal, and continually re-writing all non-terminals using stochastic production rules until, finally, only symbols of the alphabet remain. [sent-56, score-0.394]
34 To see why the Inside-Outside algorithm requires cubic time, note that we must compute, for each non-terminal, a quantity for every pair of start and end points of the subsequence it generates, and, for every such pair, sum over all midpoints at which the part generated by one child non-terminal ends and the part generated by the other begins — these three degrees of freedom give rise to the cubic term. [sent-60, score-0.234]
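For readers unfamiliar with the Inside-Outside algorithm, the standard inside recursion for a grammar in Chomsky normal form makes these three degrees of freedom explicit (this is generic textbook notation, not the paper's):

```latex
\beta(A, i, j) \;=\; \Pr\!\left(A \Rightarrow^{*} o_i \dots o_j\right)
            \;=\; \sum_{A \rightarrow B\,C} \;\sum_{k=i}^{j-1}
                  \Pr(A \rightarrow B\,C)\,\beta(B, i, k)\,\beta(C, k+1, j).
```

Filling this table over all pairs (i, j) and all split points k is what gives the cubic dependence on the sequence length.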
35 The inference algorithm for HHMMs presented in [FST98] is based upon a straightforward adaptation of the Inside-Outside algorithm. [sent-66, score-0.195]
36 The algorithm computes, for each state and each pair of time points, the probability that the state generates the corresponding subsequence, by summing over the sub-state that generates the first part, the transition to the next sub-state, and the sub-state that generates the remainder. [sent-67, score-0.616]
37 Hence the algorithm takes time cubic in the length of the sequence, with a factor that depends on the total number of states. [sent-68, score-0.079]
38 We can always “flatten” an HHMM into a regular HMM and hence do inference in time linear in the length of the sequence. [sent-73, score-0.195]
39 Unfortunately, this flat model cannot represent the hierarchical structure, let alone learn it. [sent-74, score-0.191]
40 In the next section, we show how to represent the HHMM as a DBN, and thereby get the best of both worlds: low time complexity without losing hierarchical structure. [sent-75, score-0.292]
41 4 Representing the HHMM as a DBN We can represent the HHMM as a dynamic Bayesian network (DBN) as shown in Figure 2. [sent-76, score-0.086]
42 (We assume for simplicity that all production states are at the bottom of the hierarchy; this restriction is lifted in the full version of this paper. [sent-77, score-0.217]
43 The state of the HMM at level d and time t is represented by the variable Q^d_t. [sent-78, score-0.274]
44 The state of the whole HHMM is encoded by the vector (Q^1_t, ..., Q^D_t); intuitively, this encodes the contents of the stack that specifies the complete “path” to take from the root to the leaf state. [sent-79, score-0.221]
45 F^d_t is an indicator variable that is “on” if the HMM at level d and time t has just “finished” (i. [sent-80, score-0.112]
46 e., is about to enter an end state), otherwise it is off. [sent-82, score-0.18]
47 Shaded nodes are observed; the remaining nodes are hidden. F^d_t = 1 if the HMM at level d has finished (entered its exit state), otherwise it is 0. [sent-85, score-0.362]
48 We may optionally clamp the F nodes in the final slice to “on” (at time T, where T is the length of the observation sequence), to ensure all models have finished by the end of the sequence. [sent-86, score-0.202]
49 Hence the number of F nodes that are “off” represents the effective height of the “context stack”, i. [sent-89, score-0.143]
50 e., which level of the hierarchy we are currently on. [sent-91, score-0.126]
51 The downward-going arcs between the Q variables represent the fact that a state “calls” a sub-state. [sent-92, score-0.312]
52 The upward-going arcs from the F variables enforce the fact that a higher-level HMM can only change state when the lower-level one is finished. [sent-93, score-0.298]
53 This ensures proper nesting of the parse trees, and is the key difference between an HHMM and a hidden Markov decision tree [JGS96]. [sent-94, score-0.194]
54 We consider the bottom, middle and top layers of the hierarchy separately (since they have different local topology), as well as the first, middle and last time slices. [sent-96, score-0.086]
55 4.1 Definition of the CPDs Consider the bottom level of the hierarchy: Q^D_t follows a Markov chain with parameters [sent-98, score-0.139]
56 determined by its position in the automaton, which is encoded by the vector of higher-up state variables (Q^1_t, ..., Q^{D-1}_t), which we will represent by the integer k. [sent-99, score-0.219]
57 When the bottom-level HMM enters its end state, it will “turn on” F^D_t, to mean it is finished; this will be a signal that higher-level HMMs can now change state. [sent-103, score-0.191]
58 In addition, it will be a signal that the next value of Q^D should be drawn from its prior distribution (representing a vertical transition), instead of its transition matrix (representing a horizontal transition). [sent-104, score-0.255]
59 Ã^D_k is the transition matrix for level D given that the parent variables are in state k, and is just a rescaled version of the original transition matrix A^D_k. [sent-106, score-0.469]
60 Similarly, π^D_k is the initial distribution for level D given that the parent variables are in state k. [sent-107, score-0.326]
61 The equation for the bottom-level state is simply P(Q^D_{t+1} = j | Q^D_t = i, F^D_t = f, Q^{1:D-1}_{t+1} = k) = Ã^D_k(i, j) if f = 0, and π^D_k(j) if f = 1. Now consider the intermediate levels. [sent-108, score-0.15]
62 As before, Q^d_t follows a Markov chain with parameters determined by the higher-up state variables, and the finish variables specify whether we should use the transition matrix or the prior. [sent-109, score-0.176]
63 F^d_t should “turn on” only if Q^d_t is “allowed” to enter a final state, the probability of which depends on the current context k. [sent-112, score-0.092]
64 Formally, we can write this as follows: P(F^d_t = 1 | Q^d_t = i, Q^{1:d-1}_t = k, F^{d+1}_t = b) = 0 if b = 0, and A^d_k(i, end) if b = 1. The top level differs from the intermediate levels in that the Q^1 node has no parent to specify which distribution to use. [sent-113, score-0.364]
65 (Equivalently, we can imagine a dummy top-layer HMM, which is always in state 1. [sent-115, score-0.162]
66 This is often how HHMMs are represented, so that this top-level state is the root of the overall parse tree, as in Figure 1. [sent-116, score-0.253]
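To make these CPDs concrete, here is a minimal sketch (in Python) of one step of the bottom-level chain and its finish bit. The argument names (A, A_tilde, pi), the dictionary-of-rows representation, and the use of a single parent context k for both time steps are simplifying assumptions for illustration, not the paper's data structures.

```python
import random

def step_bottom_level(q, k, A, A_tilde, pi):
    """Sample (Q_next, F) for the bottom level given parent context k.

    A[k][q]       maps successor states (including "end") to probabilities;
    A_tilde[k][q] is the same row with "end" removed and renormalised;
    pi[k]         is the initial (vertical-entry) distribution for context k.
    """
    # F turns on with the probability that q would move to the end state.
    f = random.random() < A[k][q].get("end", 0.0)
    row = pi[k] if f else A_tilde[k][q]     # vertical entry vs. horizontal move
    r, acc = random.random(), 0.0
    for q_next, p in row.items():           # draw Q_next from the chosen row
        acc += p
        if r < acc:
            break
    return q_next, f
```

Sampling the intermediate levels works the same way, except that the level below must also have finished before the state is allowed to change.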
67 The CPDs for the Q nodes in the first slice are as follows: each Q^d_1 is drawn from the initial distribution for its level, conditioned on the states above it, and the top-level node is drawn from its unconditional initial distribution. [sent-117, score-0.328]
68 If the observations are discrete symbols, we may represent the observation CPD as a multinomial. [sent-118, score-0.123]
69 Unlike the automaton representation, the DBN never actually enters an end state (i. [sent-123, score-0.085]
70 e., Q^d_t can never take on the value “end”), because if it did, it would not be able to emit a symbol. [sent-125, score-0.173]
71 Instead, finishing causes F^d_t to turn on, and Q^d then enters a new (non-terminal) state at time t + 1. [sent-126, score-0.268]
72 The equation holds because the probability of each horizontal transition in the DBN gets multiplied by the probability that F^d_t is off, which is one minus the probability of moving to the end state; this product should match the original transition probability. [sent-128, score-0.216]
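Spelling this consistency requirement out, in the (assumed) notation used above, with i and j ranging over non-end states and k the parent context:

```latex
\underbrace{\bigl(1 - A^{d}_{k}(i,\mathrm{end})\bigr)}_{\Pr(F^{d}_t = 0)}\;
\tilde{A}^{d}_{k}(i,j) \;=\; A^{d}_{k}(i,j)
\qquad\Longrightarrow\qquad
\tilde{A}^{d}_{k}(i,j) \;=\; \frac{A^{d}_{k}(i,j)}{1 - A^{d}_{k}(i,\mathrm{end})}.
```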
73 4.2 Parsimonious representations of the CPDs The number of parameters needed to represent the transition CPD as a full multinomial table can be very large, since the CPD is conditioned on all the higher-up state variables. [sent-131, score-0.083]
74 If the state-transition diagram of the hierarchical automaton is sparse, many of the entries in this table will be 0. [sent-132, score-0.347]
75 There are at least three possibilities: decision trees [BFGK96], softmax nodes, or representing the transition matrix as a mixture of smaller transition matrices at different depths. [sent-134, score-0.264]
76 5 Linear-time inference We define inference to be computing the posterior marginal of each node together with its parents in the DBN. [sent-138, score-0.481]
77 The simplest way to do this is to merge all the hidden nodes in each slice into a single “mega node”, whose number of possible values grows exponentially with the depth of the hierarchy. [sent-140, score-0.331]
78 We can then apply the forwards-backwards algorithm for HMMs, which takes time linear in the length of the sequence but quadratic in the number of mega-node values. [sent-142, score-0.079]
79 (Even storing the transition matrix is likely to consume too much space. [sent-144, score-0.143]
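To see how quickly the naive “mega node” approach blows up, the following sketch counts the flattened state space for a D-level model with K states per level; the numbers, and the decision to ignore the binary finish nodes (which would add a further factor of 2^D), are purely illustrative.

```python
import itertools

def mega_node_sizes(K, D):
    """Return (#mega-states, #entries of a dense mega-node transition matrix)."""
    states = list(itertools.product(range(K), repeat=D))  # one tuple per stack content
    S = len(states)                                        # K**D mega-states
    return S, S * S

for K, D in [(2, 3), (5, 3), (5, 5)]:
    S, entries = mega_node_sizes(K, D)
    # forwards-backwards over T slices would touch roughly S * S * T entries
    print(f"K={K}, D={D}: {S:>5} mega-states, {entries:>9} transition entries")
```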
80 In [Mur01], we present a way of applying the junction tree (jtree) algorithm to variable-length DBNs; we give a brief sketch here. [sent-146, score-0.136]
81 Each jtree is formed from a reduced DBN that contains all the nodes in slice 1 but only the interface nodes from slice 2. [sent-148, score-0.674]
82 The interface nodes are those nodes in slice 2 that have an incoming temporal arc, plus parents of nodes that have incoming temporal arcs. [sent-149, score-0.613]
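This definition is easy to turn into code. The sketch below takes the temporal (slice-1 to slice-2) arcs and the within-slice-2 arcs as edge lists and returns the interface; the node names in the example are hypothetical and do not reproduce Figure 2 exactly.

```python
def interface_nodes(temporal_arcs, slice2_arcs):
    """Slice-2 nodes with an incoming temporal arc, plus their slice-2 parents."""
    targets = {child for (_parent, child) in temporal_arcs}
    extra_parents = {parent for (parent, child) in slice2_arcs if child in targets}
    return targets | extra_parents

# Hypothetical 2-level fragment: Q1, Q2 persist across slices and F2 gates Q1, Q2.
temporal = [("Q1[t]", "Q1[t+1]"), ("Q2[t]", "Q2[t+1]"),
            ("F2[t]", "Q1[t+1]"), ("F2[t]", "Q2[t+1]")]
slice2   = [("Q1[t+1]", "Q2[t+1]"), ("Q1[t+1]", "F2[t+1]"), ("Q2[t+1]", "F2[t+1]")]
print(sorted(interface_nodes(temporal, slice2)))   # ['Q1[t+1]', 'Q2[t+1]']
```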
83 The cost of doing inference in each jtree depends on the sizes of the cliques. [sent-151, score-0.279]
84 Minimizing the maximal clique size is NP-hard, so we used a standard one-step look-ahead (greedy) algorithm [Kja90]. [sent-152, score-0.076]
85 Let s_c be the number of nodes in clique c, let n be the total number of nodes, and let C be the number of cliques. [sent-154, score-0.178]
86 Then the cost of inference in a jtree is proportional to the sum over cliques of the sizes of their state spaces (see the expression below). [sent-155, score-0.279]
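In assumed notation (K values per node, treating the binary finish nodes as K-ary for simplicity, s_c nodes in clique c, and C cliques, as in the sentence above), the per-jtree cost is on the order of

```latex
\text{cost} \;\propto\; \sum_{c=1}^{C} K^{\,s_c} \;\le\; C\,K^{\max_c s_c},
```

which is why the triangulation step tries to keep the maximal clique small.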
87 Hence a crude upper bound on the cost of inference in each jtree follows from bounding the size of the largest clique, yielding an overall time and space complexity that is linear in the length of the sequence. [sent-163, score-0.344]
88 We remind readers that the original algorithm has a much higher time complexity, since there can be up to exponentially many (in the depth of the hierarchy) states in the HHMM. [sent-164, score-0.125]
89 Figure 3: Running time (in seconds) vs. sequence length, comparing the linear-time and cubic-time algorithms. [sent-173, score-0.13]
90 6 Related work Hidden Markov decision trees (HMDTs) [JGS96] are DBNs with a structure similar to Figure 2, but they lack the F nodes and the upward-going arcs; hence they are not able to represent the call-return semantics of the HHMM. [sent-182, score-0.34]
91 A variable-duration HMM [Rab89] is a special case of a 2-level HHMM, where the bottom level counts how long we have been in a certain state; when the counter expires, the finish node turns on, and the parent can change state. [sent-187, score-0.244]
92 An AHMM is equivalent to an HHMM if we consider Q^d_t to represent the (abstract) policy being followed at level d and time t; the bottom-level state represents the concrete action, which causes the observation. [sent-190, score-0.169]
93 We also need to add a hidden global state variable, which is a parent of all the Q nodes and all the F nodes. [sent-191, score-0.222]
94 (This global state is hidden to us as observers, but not to the agent performing the actions. [sent-192, score-0.329]
95 ) In addition, they assume that each Q^d_t only depends on its immediate parent, Q^{d-1}_t, and not on its whole context, Q^{1:d-1}_t, so the Q nodes within a slice become connected in a chain. [sent-197, score-0.143]
96 This enables them to use Rao-Blackwellized particle filtering for approximate online inference: conditioned on the sampled nodes, the distribution over the remaining nodes can be represented as a product of marginals, so they can be efficiently marginalized out. [sent-198, score-0.177]
97 Recognition of visual activities and interactions by stochastic parsing. [sent-249, score-0.085]
98 Triangulation of graphs – algorithms giving small total state space. [sent-262, score-0.162]
99 The estimation of stochastic context-free grammars using the Inside-Outside algorithm. [sent-273, score-0.114]
100 Mixed memory Markov models: Decomposing complex stochastic processes as mixtures of simpler ones. [sent-322, score-0.175]
wordName wordTfidf (topN-words)
[('hhmm', 0.525), ('hhmms', 0.3), ('dbn', 0.297), ('hmm', 0.184), ('state', 0.162), ('inference', 0.154), ('automaton', 0.15), ('nodes', 0.143), ('transition', 0.143), ('production', 0.139), ('hierarchical', 0.134), ('cpds', 0.125), ('emit', 0.125), ('jtree', 0.125), ('end', 0.121), ('nished', 0.119), ('slice', 0.109), ('parent', 0.088), ('dbns', 0.087), ('markov', 0.086), ('hmms', 0.079), ('hidden', 0.079), ('level', 0.076), ('stack', 0.075), ('horizontal', 0.073), ('enters', 0.07), ('strings', 0.07), ('diagram', 0.063), ('enter', 0.059), ('root', 0.059), ('stochastic', 0.059), ('represent', 0.057), ('arcs', 0.055), ('grammars', 0.055), ('tree', 0.054), ('berkeley', 0.051), ('symbols', 0.051), ('hierarchy', 0.05), ('ahmm', 0.05), ('bui', 0.05), ('emits', 0.05), ('optionally', 0.05), ('scfgs', 0.05), ('node', 0.05), ('states', 0.048), ('symbol', 0.048), ('interface', 0.045), ('calls', 0.045), ('upward', 0.043), ('venkatesh', 0.043), ('algorithm', 0.041), ('regular', 0.041), ('junction', 0.041), ('observations', 0.04), ('nonterminal', 0.04), ('frontier', 0.04), ('calling', 0.04), ('hierarchies', 0.04), ('vertical', 0.039), ('takes', 0.038), ('going', 0.038), ('generates', 0.036), ('thereby', 0.036), ('time', 0.036), ('unfortunately', 0.035), ('alphabet', 0.035), ('returned', 0.035), ('clique', 0.035), ('ef', 0.034), ('online', 0.034), ('chain', 0.033), ('murphy', 0.033), ('context', 0.033), ('discuss', 0.033), ('representing', 0.032), ('parse', 0.032), ('parsimonious', 0.032), ('uai', 0.032), ('recognition', 0.032), ('length', 0.031), ('trees', 0.03), ('parents', 0.03), ('mixture', 0.03), ('bottom', 0.03), ('complexity', 0.029), ('intermediate', 0.029), ('topology', 0.029), ('factored', 0.029), ('marginals', 0.029), ('dynamic', 0.029), ('decision', 0.029), ('depth', 0.028), ('bayesian', 0.027), ('drawing', 0.027), ('ltering', 0.027), ('sequence', 0.027), ('activities', 0.026), ('conditioning', 0.026), ('shared', 0.026), ('multinomial', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 115 nips-2001-Linear-time inference in Hierarchical HMMs
Author: Kevin P. Murphy, Mark A. Paskin
Abstract: The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98]. Unfortunately, the original inference algorithm is rather complicated, and takes time cubic in the length of the sequence, making it impractical for many domains. In this paper, we show how HHMMs are a special kind of dynamic Bayesian network (DBN), and thereby derive a much simpler inference algorithm, which only takes time linear in the length of the sequence. Furthermore, by drawing the connection between HHMMs and DBNs, we enable the application of many standard approximation techniques to further speed up inference.
2 0.18770547 183 nips-2001-The Infinite Hidden Markov Model
Author: Matthew J. Beal, Zoubin Ghahramani, Carl E. Rasmussen
Abstract: We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite— consider, for example, symbols being possible words appearing in English text.
3 0.17025758 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
Author: Andrew D. Brown, Geoffrey E. Hinton
Abstract: Logistic units in the first hidden layer of a feedforward neural network compute the relative probability of a data point under two Gaussians. This leads us to consider substituting other density models. We present an architecture for performing discriminative learning of Hidden Markov Models using a network of many small HMM's. Experiments on speech data show it to be superior to the standard method of discriminatively training HMM's. 1
4 0.11219033 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data
Author: Jens Kohlmorgen, Steven Lemm
Abstract: We propose a novel method for the analysis of sequential data that exhibits an inherent mode switching. In particular, the data might be a non-stationary time series from a dynamical system that switches between multiple operating modes. Unlike other approaches, our method processes the data incrementally and without any training of internal parameters. We use an HMM with a dynamically changing number of states and an on-line variant of the Viterbi algorithm that performs an unsupervised segmentation and classification of the data on-the-fly, i.e. the method is able to process incoming data in real-time. The main idea of the approach is to track and segment changes of the probability density of the data in a sliding window on the incoming data stream. The usefulness of the algorithm is demonstrated by an application to a switching dynamical system. 1
5 0.11093906 59 nips-2001-Direct value-approximation for factored MDPs
Author: Dale Schuurmans, Relu Patrascu
Abstract: We present a simple approach for computing reasonable policies for factored Markov decision processes (MDPs), when the optimal value function can be approximated by a compact linear form. Our method is based on solving a single linear program that approximates the best linear fit to the optimal value function. By applying an efficient constraint generation procedure we obtain an iterative solution method that tackles concise linear programs. This direct linear programming approach experimentally yields a significant reduction in computation time over approximate value- and policy-iteration methods (sometimes reducing several hours to a few seconds). However, the quality of the solutions produced by linear programming is weaker-usually about twice the approximation error for the same approximating class. Nevertheless, the speed advantage allows one to use larger approximation classes to achieve similar error in reasonable time. 1
6 0.10954519 172 nips-2001-Speech Recognition using SVMs
7 0.10572093 123 nips-2001-Modeling Temporal Structure in Classical Conditioning
8 0.10518356 43 nips-2001-Bayesian time series classification
9 0.10025846 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models
10 0.099640474 128 nips-2001-Multiagent Planning with Factored MDPs
11 0.096927106 190 nips-2001-Thin Junction Trees
12 0.094692074 3 nips-2001-ACh, Uncertainty, and Cortical Inference
13 0.087386675 149 nips-2001-Probabilistic Abstraction Hierarchies
14 0.085341811 56 nips-2001-Convolution Kernels for Natural Language
15 0.084876053 188 nips-2001-The Unified Propagation and Scaling Algorithm
16 0.077192448 169 nips-2001-Small-World Phenomena and the Dynamics of Information
17 0.076098517 118 nips-2001-Matching Free Trees with Replicator Equations
18 0.074426949 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
19 0.069051005 85 nips-2001-Grammar Transfer in a Second Order Recurrent Neural Network
20 0.067351207 157 nips-2001-Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning
topicId topicWeight
[(0, -0.208), (1, -0.063), (2, 0.111), (3, -0.065), (4, -0.226), (5, -0.047), (6, 0.053), (7, -0.037), (8, -0.137), (9, 0.008), (10, -0.078), (11, -0.007), (12, 0.05), (13, 0.136), (14, -0.026), (15, -0.133), (16, -0.117), (17, 0.051), (18, 0.182), (19, -0.039), (20, 0.133), (21, 0.084), (22, -0.048), (23, -0.027), (24, -0.079), (25, 0.157), (26, -0.102), (27, -0.003), (28, -0.014), (29, -0.004), (30, 0.007), (31, -0.009), (32, 0.03), (33, 0.016), (34, -0.048), (35, -0.129), (36, -0.087), (37, -0.112), (38, 0.064), (39, 0.004), (40, 0.013), (41, -0.008), (42, 0.046), (43, -0.054), (44, -0.043), (45, -0.021), (46, 0.016), (47, 0.002), (48, 0.02), (49, -0.067)]
simIndex simValue paperId paperTitle
same-paper 1 0.96244907 115 nips-2001-Linear-time inference in Hierarchical HMMs
Author: Kevin P. Murphy, Mark A. Paskin
Abstract: The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98]. Unfortunately, the original inference algorithm is rather complicated, and takes time cubic in the length of the sequence, making it impractical for many domains. In this paper, we show how HHMMs are a special kind of dynamic Bayesian network (DBN), and thereby derive a much simpler inference algorithm, which only takes time linear in the length of the sequence. Furthermore, by drawing the connection between HHMMs and DBNs, we enable the application of many standard approximation techniques to further speed up inference.
2 0.8257795 183 nips-2001-The Infinite Hidden Markov Model
Author: Matthew J. Beal, Zoubin Ghahramani, Carl E. Rasmussen
Abstract: We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite— consider, for example, symbols being possible words appearing in English text.
3 0.71633548 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
Author: Andrew D. Brown, Geoffrey E. Hinton
Abstract: Logistic units in the first hidden layer of a feedforward neural network compute the relative probability of a data point under two Gaussians. This leads us to consider substituting other density models. We present an architecture for performing discriminative learning of Hidden Markov Models using a network of many small HMM's. Experiments on speech data show it to be superior to the standard method of discriminatively training HMM's. 1
4 0.62405896 3 nips-2001-ACh, Uncertainty, and Cortical Inference
Author: Peter Dayan, Angela J. Yu
Abstract: Acetylcholine (ACh) has been implicated in a wide variety of tasks involving attentional processes and plasticity. Following extensive animal studies, it has previously been suggested that ACh reports on uncertainty and controls hippocampal, cortical and cortico-amygdalar plasticity. We extend this view and consider its effects on cortical representational inference, arguing that ACh controls the balance between bottom-up inference, influenced by input stimuli, and top-down inference, influenced by contextual information. We illustrate our proposal using a hierarchical hidden Markov model.
5 0.60084969 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data
Author: Jens Kohlmorgen, Steven Lemm
Abstract: We propose a novel method for the analysis of sequential data that exhibits an inherent mode switching. In particular, the data might be a non-stationary time series from a dynamical system that switches between multiple operating modes. Unlike other approaches, our method processes the data incrementally and without any training of internal parameters. We use an HMM with a dynamically changing number of states and an on-line variant of the Viterbi algorithm that performs an unsupervised segmentation and classification of the data on-the-fly, i.e. the method is able to process incoming data in real-time. The main idea of the approach is to track and segment changes of the probability density of the data in a sliding window on the incoming data stream. The usefulness of the algorithm is demonstrated by an application to a switching dynamical system. 1
6 0.55209076 123 nips-2001-Modeling Temporal Structure in Classical Conditioning
7 0.49033174 172 nips-2001-Speech Recognition using SVMs
8 0.46675372 43 nips-2001-Bayesian time series classification
9 0.44878307 148 nips-2001-Predictive Representations of State
10 0.44783923 149 nips-2001-Probabilistic Abstraction Hierarchies
11 0.4430193 169 nips-2001-Small-World Phenomena and the Dynamics of Information
12 0.40946913 128 nips-2001-Multiagent Planning with Factored MDPs
13 0.38606921 59 nips-2001-Direct value-approximation for factored MDPs
14 0.38164654 118 nips-2001-Matching Free Trees with Replicator Equations
15 0.37959316 85 nips-2001-Grammar Transfer in a Second Order Recurrent Neural Network
16 0.36489302 56 nips-2001-Convolution Kernels for Natural Language
17 0.36400989 190 nips-2001-Thin Junction Trees
18 0.35207805 36 nips-2001-Approximate Dynamic Programming via Linear Programming
19 0.34358588 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models
20 0.33924949 67 nips-2001-Efficient Resources Allocation for Markov Decision Processes
topicId topicWeight
[(14, 0.021), (17, 0.013), (19, 0.018), (27, 0.082), (30, 0.08), (38, 0.017), (59, 0.029), (72, 0.038), (79, 0.463), (83, 0.029), (88, 0.014), (91, 0.106)]
simIndex simValue paperId paperTitle
1 0.94670343 35 nips-2001-Analysis of Sparse Bayesian Learning
Author: Anita C. Faul, Michael E. Tipping
Abstract: The recent introduction of the 'relevance vector machine' has effectively demonstrated how sparsity may be obtained in generalised linear models within a Bayesian framework. Using a particular form of Gaussian parameter prior, 'learning' is the maximisation, with respect to hyperparameters, of the marginal likelihood of the data. This paper studies the properties of that objective function, and demonstrates that conditioned on an individual hyperparameter, the marginal likelihood has a unique maximum which is computable in closed form. It is further shown that if a derived 'sparsity criterion' is satisfied, this maximum is exactly equivalent to 'pruning' the corresponding parameter from the model. 1
2 0.91774273 2 nips-2001-3 state neurons for contextual processing
Author: Ádám Kepecs, S. Raghavachari
Abstract: Neurons receive excitatory inputs via both fast AMPA and slow NMDA type receptors. We find that neurons receiving input via NMDA receptors can have two stable membrane states which are input dependent. Action potentials can only be initiated from the higher voltage state. Similar observations have been made in several brain areas which might be explained by our model. The interactions between the two kinds of inputs lead us to suggest that some neurons may operate in 3 states: disabled, enabled and firing. Such enabled, but non-firing modes can be used to introduce context-dependent processing in neural networks. We provide a simple example and discuss possible implications for neuronal processing and response variability. 1
same-paper 3 0.91643029 115 nips-2001-Linear-time inference in Hierarchical HMMs
Author: Kevin P. Murphy, Mark A. Paskin
Abstract: The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98]. Unfortunately, the original inference algorithm is rather complicated, and takes time cubic in the length of the sequence, making it impractical for many domains. In this paper, we show how HHMMs are a special kind of dynamic Bayesian network (DBN), and thereby derive a much simpler inference algorithm, which only takes time linear in the length of the sequence. Furthermore, by drawing the connection between HHMMs and DBNs, we enable the application of many standard approximation techniques to further speed up inference.
4 0.88879269 78 nips-2001-Fragment Completion in Humans and Machines
Author: David Jacobs, Bas Rokers, Archisman Rudra, Zili Liu
Abstract: Partial information can trigger a complete memory. At the same time, human memory is not perfect. A cue can contain enough information to specify an item in memory, but fail to trigger that item. In the context of word memory, we present experiments that demonstrate some basic patterns in human memory errors. We use cues that consist of word fragments. We show that short and long cues are completed more accurately than medium length ones and study some of the factors that lead to this behavior. We then present a novel computational model that shows some of the flexibility and patterns of errors that occur in human memory. This model iterates between bottom-up and top-down computations. These are tied together using a Markov model of words that allows memory to be accessed with a simple feature set, and enables a bottom-up process to compute a probability distribution of possible completions of word fragments, in a manner similar to models of visual perceptual completion.
5 0.88401234 180 nips-2001-The Concave-Convex Procedure (CCCP)
Author: Alan L. Yuille, Anand Rangarajan
Abstract: We introduce the Concave-Convex procedure (CCCP) which constructs discrete time iterative dynamical systems which are guaranteed to monotonically decrease global optimization/energy functions. It can be applied to (almost) any optimization problem and many existing algorithms can be interpreted in terms of CCCP. In particular, we prove relationships to some applications of Legendre transform techniques. We then illustrate CCCP by applications to Potts models, linear assignment, EM algorithms, and Generalized Iterative Scaling (GIS). CCCP can be used both as a new way to understand existing optimization algorithms and as a procedure for generating new algorithms. 1
6 0.59714007 183 nips-2001-The Infinite Hidden Markov Model
7 0.54269969 118 nips-2001-Matching Free Trees with Replicator Equations
8 0.52778131 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
9 0.52634853 184 nips-2001-The Intelligent surfer: Probabilistic Combination of Link and Content Information in PageRank
10 0.52328229 86 nips-2001-Grammatical Bigrams
11 0.5179581 169 nips-2001-Small-World Phenomena and the Dynamics of Information
12 0.51469469 3 nips-2001-ACh, Uncertainty, and Cortical Inference
13 0.51416057 172 nips-2001-Speech Recognition using SVMs
14 0.49976051 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs
15 0.4954626 123 nips-2001-Modeling Temporal Structure in Classical Conditioning
16 0.49075335 194 nips-2001-Using Vocabulary Knowledge in Bayesian Multinomial Estimation
17 0.48978192 171 nips-2001-Spectral Relaxation for K-means Clustering
18 0.48945916 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
19 0.48742518 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
20 0.48487151 188 nips-2001-The Unified Propagation and Scaling Algorithm